lilio: Calendar generator for machine learning with timeseries data
A Python package for generating calendars to resample timeseries into training and target data for machine learning. Named after the inventor of the Gregorian Calendar.
Lilio was originally designed for use in s2spy, a high-level Python package integrating expert knowledge and artificial intelligence to boost (sub)seasonal forecasting.
Installation
To install the latest release of lilio, do:
python3 -m pip install lilio
Lilio is also available on conda-forge. If you use conda, do:
conda install -c conda-forge lilio
To install the in-development version from the GitHub repository, do:
python3 -m pip install git+https://github.com/AI4S2S/lilio.git
Configure the package for development and testing
A more extensive developer guide can be found here.
The testing framework used here is pytest. Before running the tests, get a local copy of the source code and install lilio with the following commands:
git clone https://github.com/AI4S2S/lilio.git
cd lilio
python3 -m pip install -e .[dev]
Then, run tests:
hatch run test
How the lilio calendars work
In Lilio, calendars are 2-dimensional. Each row (year) represents a unique observation, whereas each column corresponds to a precursor period with a certain lag. This is how we like to structure our data for ML applications.
We define the "anchor date" to be between the target and precursor periods (strictly speaking, it is the start of the first target interval). All other intervals are expressed as offsets to this anchor date. Conveniently, this eliminates any ambiguity related to leap years.
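The offset idea can be sketched with the standard library alone. The snippet below is an illustration under simple assumptions (one precursor and one target interval of equal length), not Lilio's implementation; `intervals_for_year` is a hypothetical helper introduced only for this example.

```python
from datetime import date, timedelta


def intervals_for_year(year, anchor_month=11, anchor_day=30, length_days=180):
    """Return (precursor, target) half-open intervals [start, end) for one anchor year.

    Hypothetical helper for illustration; not part of Lilio.
    """
    anchor = date(year, anchor_month, anchor_day)
    length = timedelta(days=length_days)
    precursor = (anchor - length, anchor)  # interval -1 ends at the anchor date
    target = (anchor, anchor + length)     # interval 1 starts at the anchor date
    return precursor, target


for year in (2021, 2020):
    precursor, target = intervals_for_year(year)
    print(year, precursor[0], precursor[1], target[0], target[1])
```

Because every interval is derived from the anchor date by a fixed number of days, a leap day falling inside an interval shifts its calendar end date but never its length.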
Here's a calendar generated with Lilio:
>>> calendar = lilio.daily_calendar(anchor="11-30", length='180d')
>>> calendar = calendar.map_years(2020, 2021)
>>> calendar.show()
i_interval                         -1                         1
anchor_year
2021         [2021-06-03, 2021-11-30)  [2021-11-30, 2022-05-29)
2020         [2020-06-03, 2020-11-30)  [2020-11-30, 2021-05-29)
Now, the user can load the data input_data (e.g. a pandas DataFrame) and resample it to the desired timescales configured in the calendar:
>>> calendar = calendar.map_to_data(input_data)
>>> bins = lilio.resample(calendar, input_data)
>>> bins
   anchor_year  i_interval                  interval  mean_data  target
0         2020          -1  [2020-06-03, 2020-11-30)      275.5    True
1         2020           1  [2020-11-30, 2021-05-29)       95.5   False
2         2021          -1  [2021-06-03, 2021-11-30)      640.5    True
3         2021           1  [2021-11-30, 2022-05-29)      460.5   False
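To make the resampling step concrete, here is a minimal standard-library sketch of aggregating daily values into calendar intervals. It assumes a per-interval mean (as the `mean_data` column name suggests) and uses made-up input data; `interval_mean` is a hypothetical helper, and this is not how `lilio.resample` is implemented.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical input: one value per day, equal to the day's index since 2020-01-01.
start = date(2020, 1, 1)
input_data = {start + timedelta(days=i): float(i) for i in range(900)}


def interval_mean(data, left, right):
    """Mean of all values whose date falls in the half-open interval [left, right)."""
    return mean(value for day, value in data.items() if left <= day < right)


# Intervals taken from the 2020 row of the calendar shown earlier.
precursor = (date(2020, 6, 3), date(2020, 11, 30))
target = (date(2020, 11, 30), date(2021, 5, 29))

print(interval_mean(input_data, *precursor))
print(interval_mean(input_data, *target))
```

The half-open convention means the anchor date itself belongs to the target interval only, so no sample is counted twice.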
For convenience, Lilio offers a few shorthands for standard calendars, e.g. monthly_calendar and weekly_calendar.
However, you can also create custom calendars by calling Calendar directly. For a nice walkthrough, see this example notebook.
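As a rough picture of what a custom layout can look like, the sketch below places several intervals of different lengths, separated by gaps, around one anchor date. It deliberately does not use Lilio's `Calendar` API: `build_intervals` and its `(length_days, gap_days)` spec format are assumptions made for this example only.

```python
from datetime import date, timedelta


def build_intervals(anchor, targets, precursors):
    """Lay out half-open intervals [start, end) around an anchor date.

    targets and precursors are lists of (length_days, gap_days) tuples, a
    hypothetical spec format chosen for this sketch. Targets are stacked
    forward from the anchor and precursors backward; each gap sits between
    an interval and the previous one (or the anchor).
    """
    layout = {}
    cursor = anchor
    for i, (length, gap) in enumerate(targets, start=1):
        left = cursor + timedelta(days=gap)
        cursor = left + timedelta(days=length)
        layout[i] = (left, cursor)
    cursor = anchor
    for i, (length, gap) in enumerate(precursors, start=1):
        right = cursor - timedelta(days=gap)
        cursor = right - timedelta(days=length)
        layout[-i] = (cursor, right)
    return layout


layout = build_intervals(
    anchor=date(2021, 12, 25),
    targets=[(7, 0)],               # one 7-day target starting at the anchor
    precursors=[(7, 14), (30, 0)],  # a 7-day precursor after a 14-day gap, then a 30-day one
)
for index in sorted(layout):
    print(index, layout[index][0], layout[index][1])
```

Positive indices denote target intervals and negative indices denote precursors, mirroring the `i_interval` numbering in the calendar output above.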
Documentation
For detailed information on using the lilio package, visit the documentation page hosted at Readthedocs.
Contributing
If you want to contribute to the development of lilio, have a look at the contribution guidelines.
How to cite us
Please use the Zenodo DOI to cite this package if you use it in your research.
Acknowledgements
This package was developed by the Netherlands eScience Center and Vrije Universiteit Amsterdam under Netherlands eScience Center grant NLESC.OEC.2021.005.
The package was created with Cookiecutter and the NLeSC/python-template.