lilio
lilio: Calendar generator for machine learning with timeseries data.
Time indices are anchored to the target period of interest. By keeping observations from the same cycle (typically 1 year) together and paying close attention to the treatment of adjacent cycles, we avoid information leakage between train and test sets.
Example
Count down to New Year’s Eve with a 1-day target interval and four 4-week precursor intervals.
>>> import lilio
>>> calendar = lilio.Calendar(anchor="12-31")
>>> calendar.add_intervals("target", "1d")
>>> calendar.add_intervals("precursor", "4W", n=4)
>>> calendar
Calendar(
    anchor='12-31',
    allow_overlap=False,
    mapping=None,
    intervals=[
        Interval(role='target', length='1d', gap='0d'),
        Interval(role='precursor', length='4W', gap='0d'),
        Interval(role='precursor', length='4W', gap='0d'),
        Interval(role='precursor', length='4W', gap='0d'),
        Interval(role='precursor', length='4W', gap='0d')
    ]
)
Get the 180-day periods before and after New Year’s Eve for the year 2020.
>>> calendar = lilio.daily_calendar(anchor="12-31", length="180d")
>>> calendar = calendar.map_years(2020, 2020)
>>> calendar.show()
i_interval                         -1                         1
anchor_year
2020         [2020-07-04, 2020-12-31)  [2020-12-31, 2021-06-29)
Get the 180-day periods before and after New Year’s Eve for 2020 to 2022, inclusive.
>>> calendar = lilio.daily_calendar(anchor="12-31", length="180d")
>>> calendar = calendar.map_years(2020, 2022)
>>> # 2020 is a leap year, but no interval below contains 29 February:
>>> calendar.show()
i_interval                         -1                         1
anchor_year
2022         [2022-07-04, 2022-12-31)  [2022-12-31, 2023-06-29)
2021         [2021-07-04, 2021-12-31)  [2021-12-31, 2022-06-29)
2020         [2020-07-04, 2020-12-31)  [2020-12-31, 2021-06-29)
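The dates in these tables follow from plain date arithmetic: each anchor year contributes a precursor interval (-1) that ends at its anchor date and a target interval (1) that starts at it, both half-open. The standard-library sketch below reproduces the table under that assumption; daily_periods is a hypothetical helper, not part of lilio.

```python
from datetime import date, timedelta


def daily_periods(years, anchor_month=12, anchor_day=31, length_days=180):
    """Hypothetical helper: one precursor (-1) and one target (1) period per anchor year.

    Each period is a half-open (start, end) pair of dates.
    """
    length = timedelta(days=length_days)
    table = {}
    # Most recent anchor year first, matching the row order of show().
    for year in sorted(years, reverse=True):
        anchor = date(year, anchor_month, anchor_day)
        table[year] = {
            -1: (anchor - length, anchor),  # precursor ends at the anchor date
            1: (anchor, anchor + length),   # target starts at the anchor date
        }
    return table


# Prints the same intervals as the show() table for 2020 to 2022.
for year, row in daily_periods(range(2020, 2023)).items():
    print(year, *(f"[{start}, {end})" for start, end in row.values()))
```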
To get a stacked representation:
>>> calendar.map_years(2020, 2022).flat
anchor_year  i_interval
2022         -1            [2022-07-04 00:00:00, 2022-12-31 00:00:00)
              1            [2022-12-31 00:00:00, 2023-06-29 00:00:00)
2021         -1            [2021-07-04 00:00:00, 2021-12-31 00:00:00)
              1            [2021-12-31 00:00:00, 2022-06-29 00:00:00)
2020         -1            [2020-07-04 00:00:00, 2020-12-31 00:00:00)
              1            [2020-12-31 00:00:00, 2021-06-29 00:00:00)
dtype: interval
Package Contents
- class lilio.Calendar(anchor: str, allow_overlap: bool = False, mapping: None | _MappingYears | _MappingData = None, intervals: None | list[Interval] = None)
Build a calendar from scratch with basic construction elements.
Instantiate a basic container for building a calendar from basic blocks.
This is a highly flexible calendar which allows the user to build their own calendar with the basic building blocks of target and precursor periods.
Users are free to create calendars with customized intervals, gaps between intervals, and even overlapping intervals, but are then responsible for managing the calendar themselves.
Some shorthand calendars, such as daily_calendar, weekly_calendar and monthly_calendar, are available in lilio.calendar_shorthands. These make it easy to construct basic calendars with only a few parameters, but lack the flexibility that this calendar builder provides.
- Parameters:
anchor –
String denoting the anchor date. The following inputs are valid:
- “MM-DD” for a month and day, e.g. “12-31”.
- “MM” for only a month, e.g. “4” for April.
- English names and abbreviations of months, e.g. “December” or “jan”.
- “Www” for a week number, e.g. “W05” for the fifth week of the year.
- “Www-D” for a week number plus day of week, e.g. “W01-4” for the first Thursday of the year.
allow_overlap – Whether intervals of different anchor years may overlap. The default, False, means that anchor years are skipped where needed to avoid data being shared between anchor years.
mapping – Calendar mapping. Input in the form: (“years”, 2000, 2020) or (“data”, pd.Timestamp(“2000-01-01”), pd.Timestamp(“2020-01-01”)). The calendar mapping is usually set with the map_years or map_to_data methods.
intervals – A list of Interval objects that should be appended to the calendar when it is initialized.
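To illustrate the anchor formats, the hypothetical helper below resolves an anchor string to a concrete date in a given year using only the standard library. It assumes that month-only anchors mean the first day of the month and that a bare “Www” means the Monday of that ISO week (consistent with the weekly_calendar example, where “W40” becomes 'W40-1'); it is not lilio's own parser.

```python
import re
from datetime import date

# English month names; abbreviations are taken as the first three letters.
_MONTHS = ["january", "february", "march", "april", "may", "june", "july",
           "august", "september", "october", "november", "december"]


def anchor_to_date(anchor: str, year: int) -> date:
    """Hypothetical helper: resolve an anchor string to a date within `year`."""
    text = anchor.strip()
    # "Www" or "Www-D": ISO week number with an optional ISO weekday (1 = Monday).
    match = re.fullmatch(r"W(\d{1,2})(?:-([1-7]))?", text)
    if match:
        weekday = int(match.group(2) or 1)  # assumption: bare "Www" means Monday
        return date.fromisocalendar(year, int(match.group(1)), weekday)
    # "MM-DD": month and day.
    match = re.fullmatch(r"(\d{1,2})-(\d{1,2})", text)
    if match:
        return date(year, int(match.group(1)), int(match.group(2)))
    # "MM": month only; assumption: the first day of that month.
    if re.fullmatch(r"\d{1,2}", text):
        return date(year, int(text), 1)
    # Month names and abbreviations, case-insensitive; assumption: first day of the month.
    lowered = text.lower()
    for number, name in enumerate(_MONTHS, start=1):
        if lowered in (name, name[:3]):
            return date(year, number, 1)
    raise ValueError(f"Unrecognized anchor: {anchor!r}")


print(anchor_to_date("W01-4", 2021))  # 2021-01-07, the first Thursday of 2021
```

ISO week 1 is, by definition, the week containing the year's first Thursday, which is why “W01-4” always lands on that Thursday.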
Example
Instantiate a custom calendar, to which target/precursor periods can then be appended with add_intervals.
>>> import lilio
>>> calendar = lilio.Calendar(anchor="12-31")
>>> calendar
Calendar(
    anchor='12-31',
    allow_overlap=False,
    mapping=None,
    intervals=None
)
- property anchor
Return the anchor.
- property allow_overlap
Return whether overlapping intervals between anchor years are allowed.
- property mapping: None | Literal['years', 'data']
Return the mapping of the calendar. Either None, “years”, or “data”.
- property flat: pandas.DataFrame
Returns the flattened intervals.
- add_intervals(role: Literal['target', 'precursor'], length: str, gap: str = '0d', n: int = 1) -> None
Add one or more intervals to the calendar.
Each interval is either a target or a precursor, and is defined by its length and an optional gap between it and the preceding interval of the same type.
- Parameters:
role – Either a ‘target’ or ‘precursor’ interval(s).
length – The length of the interval(s), in a format such as ‘5d’ for five days, ‘2W’ for two weeks, or ‘1M’ for one month.
gap – The gap between this interval and the preceding target/precursor interval. Same format as the length argument.
n – The number of intervals which should be added to the calendar. Defaults to 1.
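To make the role of gap and n concrete, here is a minimal day-based sketch of how a list of intervals might be turned into absolute dates: targets are laid out forward from the anchor date and precursors backward, each offset from the previous interval of its own type by its gap. SketchInterval, add_intervals_sketch and realize are hypothetical names; lilio itself also supports week- and month-based lengths.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SketchInterval:
    """Hypothetical, day-based stand-in for lilio.Interval."""
    role: str  # "target" or "precursor"
    length_days: int
    gap_days: int = 0


def add_intervals_sketch(intervals, role, length_days, gap_days=0, n=1):
    """Append n identical intervals, mirroring the n argument of add_intervals."""
    for _ in range(n):
        intervals.append(SketchInterval(role, length_days, gap_days))


def realize(intervals, anchor):
    """Return (role, start, end) tuples with half-open [start, end) bounds."""
    after_target = anchor      # targets are laid out forward from the anchor
    before_precursor = anchor  # precursors are laid out backward from the anchor
    bounds = []
    for iv in intervals:
        gap = timedelta(days=iv.gap_days)
        length = timedelta(days=iv.length_days)
        if iv.role == "target":
            start = after_target + gap
            end = start + length
            after_target = end
        else:
            end = before_precursor - gap
            start = end - length
            before_precursor = start
        bounds.append((iv.role, start, end))
    return bounds


intervals = []
add_intervals_sketch(intervals, "target", 1)
add_intervals_sketch(intervals, "precursor", 28, n=2)
add_intervals_sketch(intervals, "precursor", 28, gap_days=7)  # leaves a 7-day hole
for role, start, end in realize(intervals, date(2020, 12, 31)):
    print(role, start, end)
```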
- map_years(start: int, end: int)
Add a start and end year mapping to the calendar.
If the start and end years are the same, the intervals for only that single year are returned by calendar.get_intervals().
- Parameters:
start – The first year for which the calendar will be realized
end – The last year for which the calendar will be realized
- Returns:
The calendar mapped to the input start and end year.
- map_to_data(input_data: pandas.Series | pandas.DataFrame | xarray.Dataset | xarray.DataArray)
Map the calendar to input data period.
Stores the first and last timestamps of the input data in the calendar, so that the calendar intervals cover as much of the data as possible.
- Parameters:
input_data – Input data for datetime mapping. For pandas objects, the index must be a pandas.DatetimeIndex; for xarray objects, a time coordinate with datetime data is required.
- Returns:
The calendar mapped to the input data period.
- get_intervals() -> pandas.DataFrame
Retrieve updated intervals from the Calendar object.
- show() -> pandas.DataFrame
Display the intervals the Calendar will generate for the current setup.
- Returns:
Dataframe containing the calendar intervals.
- Return type:
pd.DataFrame
- visualize(n_years: int = 3, interactive: bool = False, relative_dates: bool = False, show_length: bool = False, add_legend: bool = True, ax=None, **bokeh_kwargs) -> None
Plot a visualization of the current calendar setup, to aid in user setup.
Note: The interactive visualization requires the bokeh package to be installed in the active Python environment.
- Parameters:
n_years – Sets the maximum number of anchor years that should be shown. By default only the most recent 3 are visualized, to ensure that they fit within the plot.
interactive – If False, matplotlib will be used for the visualization. If True, bokeh will be used.
relative_dates – Toggles if the intervals should be displayed relative to the anchor date, or as absolute dates.
show_length – Toggles if the frequency of the intervals should be displayed. Defaults to False (Matplotlib plotter only).
add_legend – Toggles if a legend should be added to the plot (Matplotlib only)
ax – Matplotlib axis object to plot the visualization into.
**bokeh_kwargs – Keyword arguments to pass to Bokeh’s plotting.figure. See https://docs.bokeh.org/en/latest/docs/reference/plotting/figure.html for a list of possible keyword arguments.
- class lilio.Interval(role: Literal['target', 'precursor'], length: str | dict, gap: str | dict = '0d')
Basic construction element of calendar for defining precursors and targets.
Construct the basic element of the calendar.
The Interval is characterised by its type (either target or precursor), its length and the gap between it and the previous interval of its type (or the anchor date, if the interval is the first target/first precursor).
- Parameters:
role – The type of interval. Either “target” or “precursor”.
length – The length of the interval. This can either be a pandas-like frequency string (e.g. “10d”, “2W”, or “3M”), or a pandas.DateOffset-compatible dictionary such as {"days": 10}, {"weeks": 2}, or {"months": 1, "weeks": 2}.
gap – The gap between the previous interval and this interval. Valid inputs are the same as the length keyword argument. Defaults to “0d”.
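The two accepted forms of length and gap describe the same thing. The sketch below is a hypothetical helper that only handles the “d”, “W” and “M” suffixes used in this documentation; it converts a frequency string into the equivalent DateOffset-style dictionary and passes dictionaries through unchanged.

```python
import re

# Frequency-string suffixes used in this documentation and the matching DateOffset keywords.
_UNITS = {"d": "days", "W": "weeks", "M": "months"}


def length_to_dict(length):
    """Hypothetical helper: normalise a length/gap value to a DateOffset-style dict."""
    if isinstance(length, dict):
        return dict(length)  # already in dictionary form, e.g. {"months": 1, "weeks": 2}
    match = re.fullmatch(r"(\d+)([dWM])", length)
    if match is None:
        raise ValueError(f"Unsupported length string: {length!r}")
    amount, unit = match.groups()
    return {_UNITS[unit]: int(amount)}


print(length_to_dict("10d"), length_to_dict("2W"), length_to_dict("3M"))
```

With pandas installed, such a dictionary can be unpacked into pandas.DateOffset(**d), which accepts days, weeks and months keywords.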
Example
>>> from lilio import Interval
>>> iv = Interval("target", length="7d")
>>> iv
Interval(role='target', length='7d', gap='0d')
You can modify the interval’s properties in-place:
>>> iv.gap = "1W"
>>> iv
Interval(role='target', length='7d', gap='1W')
- property is_target
Return whether this Interval is a target interval.
- property role
Return the type of interval.
- property length
Return the length of the interval as it was specified (a frequency string or a dictionary); use length_dateoffset for the pandas.DateOffset.
- property length_dateoffset
Return the length property as a dateoffset.
- property gap
Return the gap of the interval as it was specified (a frequency string or a dictionary); use gap_dateoffset for the pandas.DateOffset.
- property gap_dateoffset
Get the gap property as a dateoffset.
- lilio.daily_calendar(anchor: str, length: str = '1d', n_targets: int = 1, n_precursors: int = 0, allow_overlap: bool = False) -> lilio.calendar.Calendar
Instantiate a basic daily calendar with minimal configuration.
Set up a quick calendar revolving around intervals with day-based lengths. The intervals will extend back in time with as many intervals as fit within the cycle time of one year.
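How many precursors fit in one cycle can be estimated with integer division. This hypothetical helper assumes a cycle of 365 days (or 52 weeks, or 12 months) and subtracts the target intervals; it agrees with the examples in this documentation (one 180-day precursor, three 3-month precursors, 52 - n_targets weekly precursors) but is not taken from lilio's source.

```python
def n_precursors_that_fit(length: int, n_targets: int = 1, cycle: int = 365) -> int:
    """Hypothetical estimate of how many precursors fit in one cycle.

    `length` and `cycle` must use the same unit: days (cycle=365),
    weeks (cycle=52) or months (cycle=12).
    """
    n_total = cycle // length  # whole intervals that fit in one cycle
    return max(n_total - n_targets, 0)


print(n_precursors_that_fit(180))          # 1 (daily, 180-day intervals)
print(n_precursors_that_fit(3, cycle=12))  # 3 (monthly, 3-month intervals)
print(n_precursors_that_fit(1, cycle=52))  # 51 (weekly, 1-week intervals)
```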
- Parameters:
anchor – String in the form “12-31” for December 31st. The first target interval will contain the anchor, while the precursor intervals are built back in time starting at this date.
length – The length of every target and precursor period.
n_targets – Integer specifying the number of target intervals in a period.
n_precursors – Sets the maximum number of precursors of the Calendar. If 0, the number is determined by how many fit in each anchor year. If a value is provided, the intervals can either cover only part of the year or extend over multiple years. If n_precursors is large enough that the intervals extend over multiple years, anchor years will be skipped to avoid overlapping intervals. To allow overlapping intervals, use the allow_overlap kwarg.
allow_overlap – Allows intervals to overlap between anchor years, if n_precursors is set high enough that the intervals extend over multiple years. False by default, to avoid train/test information leakage.
- Returns:
An instantiated Calendar built according to the input kwarg specifications
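When the intervals of one anchor year span more than a year, some anchor years must be dropped if overlap is not allowed. A simple greedy rule is sketched below under that assumption (the helper name is hypothetical and lilio's internal rule may differ): walk from the most recent anchor year backwards and keep a year only if its whole span ends before the span of the last year kept.

```python
from datetime import date, timedelta


def anchor_years_kept(years, days_before, days_after, allow_overlap=False,
                      anchor_month=12, anchor_day=31):
    """Hypothetical greedy rule for dropping anchor years whose spans would overlap.

    Each anchor year spans [anchor - days_before, anchor + days_after).
    """
    ordered = sorted(years, reverse=True)  # most recent anchor year first
    if allow_overlap:
        return ordered
    kept = []
    earliest_start = None  # start of the span of the last kept year
    for year in ordered:
        anchor = date(year, anchor_month, anchor_day)
        start = anchor - timedelta(days=days_before)
        end = anchor + timedelta(days=days_after)
        if earliest_start is None or end <= earliest_start:
            kept.append(year)
            earliest_start = start
    return kept


# A 1-day target and 500 days of precursors: each span exceeds a year.
print(anchor_years_kept(range(2018, 2023), days_before=500, days_after=1))  # [2022, 2020, 2018]
```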
Example
Instantiate a calendar counting towards Christmas in 3-day steps.
>>> import lilio
>>> calendar = lilio.daily_calendar(anchor='12-25', length="3d", n_precursors=3)
>>> calendar
Calendar(
    anchor='12-25',
    allow_overlap=False,
    mapping=None,
    intervals=[
        Interval(role='target', length='3d', gap='0d'),
        Interval(role='precursor', length='3d', gap='0d'),
        Interval(role='precursor', length='3d', gap='0d'),
        Interval(role='precursor', length='3d', gap='0d')
    ]
)
- lilio.monthly_calendar(anchor: str, length: str = '1M', n_targets: int = 1, n_precursors: int = 0, allow_overlap: bool = False) -> lilio.calendar.Calendar
Instantiate a basic monthly calendar with minimal configuration.
Set up a quick calendar revolving around intervals with month-based lengths. The intervals will extend back in time with as many intervals as fit within the cycle time of one year.
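Month-based intervals cannot be expressed as a fixed number of days, so they need calendar-month arithmetic. The sketch below assumes that a month anchor such as “Dec” refers to the first day of that month and builds 3-month intervals like those in the example for this function; monthly_intervals and add_months are hypothetical helpers.

```python
from datetime import date


def add_months(day: date, months: int) -> date:
    """Shift a first-of-month date by a whole number of months (may be negative)."""
    index = day.year * 12 + (day.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def monthly_intervals(anchor_month, year, length_months, n_targets=1, n_precursors=3):
    """Hypothetical helper returning half-open (start, end) targets and precursors."""
    anchor = date(year, anchor_month, 1)  # assumption: month anchors mean the 1st
    targets, start = [], anchor
    for _ in range(n_targets):  # targets run forward from the anchor month
        end = add_months(start, length_months)
        targets.append((start, end))
        start = end
    precursors, end = [], anchor
    for _ in range(n_precursors):  # precursors run backward from the anchor month
        start = add_months(end, -length_months)
        precursors.append((start, end))
        end = start
    return targets, precursors


targets, precursors = monthly_intervals(12, 2020, 3)
print(targets)
print(precursors)
```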
- Parameters:
anchor – Str in the form ‘January’ or ‘Jan’. The first target interval will contain the anchor, while the precursor intervals are built back in time starting at this month.
length – The length of every target and precursor period, in the form ‘1M’, ‘2M’, etc.
n_targets – Integer specifying the number of target intervals in a period.
n_precursors – Sets the maximum number of precursors of the Calendar. If 0, the number is determined by how many fit in each anchor year. If a value is provided, the intervals can either cover only part of the year or extend over multiple years. If n_precursors is large enough that the intervals extend over multiple years, anchor years will be skipped to avoid overlapping intervals. To allow overlapping intervals, use the allow_overlap kwarg.
allow_overlap – Allows intervals to overlap between anchor years, if n_precursors is set high enough that the intervals extend over multiple years. False by default, to avoid train/test information leakage.
- Returns:
An instantiated Calendar built according to the input kwarg specifications
Example
Instantiate a calendar counting down the quarters (3-month periods) from December.
>>> import lilio
>>> calendar = lilio.monthly_calendar(anchor='Dec', length="3M")
>>> calendar
Calendar(
    anchor='12',
    allow_overlap=False,
    mapping=None,
    intervals=[
        Interval(role='target', length='3M', gap='0d'),
        Interval(role='precursor', length='3M', gap='0d'),
        Interval(role='precursor', length='3M', gap='0d'),
        Interval(role='precursor', length='3M', gap='0d')
    ]
)
- lilio.weekly_calendar(anchor: str, length: str = '1W', n_targets: int = 1, n_precursors: int = 0, allow_overlap: bool = False) -> lilio.calendar.Calendar
Instantiate a basic weekly calendar with minimal configuration.
Set up a quick calendar revolving around intervals with week-based lengths. The precursor intervals will extend back in time with as many intervals as fit within the cycle time of one year (i.e. 52 - n_targets).
Note that the difference between this calendar and the daily_calendar revolves around the use of calendar weeks (Monday - Sunday), instead of 7-day periods.
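Aligning to calendar weeks means every interval starts on a Monday, whatever the length. A sketch using datetime.date.fromisocalendar (Python 3.8+), assuming a “W40” anchor means the Monday of ISO week 40 (as the 'W40-1' in the example for this function suggests); weekly_intervals is a hypothetical helper.

```python
from datetime import date, timedelta


def weekly_intervals(week, year, length_weeks=1, n_targets=1, n_precursors=2):
    """Hypothetical helper: half-open intervals aligned to ISO calendar weeks."""
    anchor = date.fromisocalendar(year, week, 1)  # Monday of the anchor week
    step = timedelta(weeks=length_weeks)
    targets = [(anchor + i * step, anchor + (i + 1) * step) for i in range(n_targets)]
    precursors = [(anchor - (i + 1) * step, anchor - i * step) for i in range(n_precursors)]
    return targets, precursors


targets, precursors = weekly_intervals(40, 2022)
for start, end in targets + precursors:
    print(start, start.strftime("%A"), "to", end)
```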
- Parameters:
anchor – Str in the form of “W40”, denoting the week number. The first target interval will contain the anchor, while the precursor intervals are built back in time starting from this week.
length – The length of every precursor and target interval, e.g. ‘2W’.
n_targets – Integer specifying the number of target intervals in a period.
n_precursors – Sets the maximum number of precursors of the Calendar. If 0, the number is determined by how many fit in each anchor year. If a value is provided, the intervals can either cover only part of the year or extend over multiple years. If n_precursors is large enough that the intervals extend over multiple years, anchor years will be skipped to avoid overlapping intervals. To allow overlapping intervals, use the allow_overlap kwarg.
allow_overlap – Allows intervals to overlap between anchor years, if n_precursors is set high enough that the intervals extend over multiple years. False by default, to avoid train/test information leakage.
- Returns:
An instantiated Calendar built according to the input kwarg specifications
Example
Instantiate a calendar counting down 1-week intervals towards week 40, with two precursor weeks.
>>> import lilio
>>> calendar = lilio.weekly_calendar(anchor="W40", length="1W", n_precursors=2)
>>> calendar
Calendar(
    anchor='W40-1',
    allow_overlap=False,
    mapping=None,
    intervals=[
        Interval(role='target', length='1W', gap='0d'),
        Interval(role='precursor', length='1W', gap='0d'),
        Interval(role='precursor', length='1W', gap='0d')
    ]
)
- lilio.resample(calendar: lilio.calendar.Calendar, input_data: xarray.Dataset) -> xarray.Dataset
- lilio.resample(calendar: lilio.calendar.Calendar, input_data: xarray.DataArray) -> xarray.DataArray
- lilio.resample(calendar: lilio.calendar.Calendar, input_data: pandas.Series | pandas.DataFrame) -> pandas.DataFrame
Resample input data to the Calendar’s intervals.
Pass a pandas Series/DataFrame with a datetime axis, or an xarray DataArray/Dataset with a datetime coordinate called ‘time’. The data are binned into the Calendar’s intervals and each bin is aggregated (by default, by calculating its mean); the result has a similar format to the input, with the datetimes replaced by the Calendar’s intervals.
The default behavior is to calculate the mean for each interval. However, many other statistics can be calculated, namely:
mean, min, max, median
std, var, ptp (peak-to-peak)
nanmean, nanmedian, nanstd, nanvar
sum, nansum, size, count_nonzero
“size” will compute the number of datapoints that are in each interval, and can be used to check if the input data is of a sufficiently high resolution to be resampled to the calendar.
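The binning described above can be sketched with the standard library: every value whose date falls in the half-open interval [start, end) goes into that bin, and each bin is then aggregated. The data below mirror the example at the end of this section (integers 0, 1, 2, ... on a daily index from 2019-12-01 to 2021-12-31) and the sketch reproduces its means; resample_sketch is hypothetical and not lilio's implementation.

```python
from datetime import date, timedelta
from statistics import mean


def resample_sketch(series, bins, how=mean):
    """Hypothetical: aggregate (date, value) pairs into half-open [start, end) bins.

    Empty bins yield NaN, as an average over no values would.
    """
    result = {}
    for label, (start, end) in bins.items():
        values = [value for day, value in series if start <= day < end]
        result[label] = how(values) if values else float("nan")
    return result


# Integers 0, 1, 2, ... on a daily index from 2019-12-01 to 2021-12-31.
first = date(2019, 12, 1)
n_days = (date(2021, 12, 31) - first).days + 1
series = [(first + timedelta(days=i), i) for i in range(n_days)]

# 180-day precursor (-1) and target (1) intervals around 31 December.
length = timedelta(days=180)
bins = {}
for year in (2019, 2020):
    anchor = date(year, 12, 31)
    bins[(year, -1)] = (anchor - length, anchor)
    bins[(year, 1)] = (anchor, anchor + length)

print(resample_sketch(series, bins))           # bin means: 14.5, 119.5, 305.5, 485.5
print(resample_sketch(series, bins, how=len))  # "size": data points per bin
```

The 2019 precursor bin holds only 30 points because the data start on 2019-12-01, which is the kind of partial coverage a “size” check reveals.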
Note: this function is intended for upscaling operations, meaning that the calendar intervals are longer than the time step of the input data (e.g. the calendar uses 7-day intervals and the input is daily data). Downscaling operations are supported, but users need to be careful, since the returned values may contain “NaN”.
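Detecting the downscaling case only requires comparing the calendar's interval length with the time step of the input data. The hypothetical check_resolution below issues a UserWarning when the calendar is finer than the data; it is a sketch of the idea, not lilio's own check.

```python
import warnings
from datetime import timedelta


def check_resolution(interval_length: timedelta, data_step: timedelta) -> bool:
    """Hypothetical check: warn if calendar intervals are shorter than the data's time step.

    Returns True when a warning was issued.
    """
    if interval_length < data_step:
        warnings.warn(
            "Calendar intervals are shorter than the input time step (downscaling); "
            "some intervals may contain no data and resample to NaN.",
            UserWarning,
        )
        return True
    return False


check_resolution(timedelta(days=7), timedelta(days=1))   # upscaling: no warning
check_resolution(timedelta(days=1), timedelta(weeks=1))  # downscaling: UserWarning
```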
- Parameters:
calendar – Calendar object with either a map_years or map_to_data mapping.
input_data – Input data for resampling. For a pandas object, its index must be a pandas.DatetimeIndex. An xarray object requires a dimension named ‘time’ containing datetime values.
how –
Which method for resampling should be used. Either a string or function. The following methods are supported as a string input:
mean, min, max, median
std, var, ptp (peak-to-peak)
nanmean, nanmedian, nanstd, nanvar
sum, nansum, size, count_nonzero
Alternatively, a function can be passed, for example resample(calendar, input_data, how=np.mean).
- Raises:
UserWarning – If the calendar intervals are shorter than the time step of the input data (i.e. downscaling).
- Returns:
- Input data resampled onto the calendar intervals, in a similar data format to the given input.
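Accepting either a method name or a function usually comes down to a lookup table of callables. The sketch below maps the supported names to standard-library equivalents (population std/var with ddof=0, matching NumPy's defaults; the nan-variants drop NaN values first). The names mirror the NumPy functions listed above; resolve_how is a hypothetical helper, not lilio's code.

```python
import math
import statistics


def _ignoring_nan(func):
    """Wrap an aggregation so that NaN values are dropped first."""
    def wrapped(values):
        return func([v for v in values if not (isinstance(v, float) and math.isnan(v))])
    return wrapped


# std/var use the population formulas (ddof=0), like NumPy's defaults.
_METHODS = {
    "mean": statistics.mean,
    "min": min,
    "max": max,
    "median": statistics.median,
    "std": statistics.pstdev,
    "var": statistics.pvariance,
    "ptp": lambda values: max(values) - min(values),  # peak-to-peak
    "sum": sum,
    "size": len,
    "count_nonzero": lambda values: sum(1 for v in values if v != 0),
}
for _name in ("mean", "median", "std", "var", "sum"):
    _METHODS["nan" + _name] = _ignoring_nan(_METHODS[_name])


def resolve_how(how):
    """Hypothetical: return a callable for a method name, or pass a callable through."""
    if callable(how):
        return how
    try:
        return _METHODS[how]
    except KeyError:
        raise ValueError(f"Unsupported resampling method: {how!r}") from None


print(resolve_how("ptp")([1, 5, 3]), resolve_how("nanmean")([1.0, float("nan"), 3.0]))
```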
Example
Assuming the input data is a pd.Series of increasing integer values (0, 1, 2, ...) with a daily index from 2019-12-01 to 2021-12-31.
>>> import lilio
>>> import pandas as pd
>>> import numpy as np
>>> cal = lilio.daily_calendar(anchor="12-31", length="180d")
>>> time_index = pd.date_range("20191201", "20211231", freq="1d")
>>> var = np.arange(len(time_index))
>>> input_data = pd.Series(var, index=time_index)
>>> cal = cal.map_to_data(input_data)
>>> bins = lilio.resample(cal, input_data)
>>> bins
  anchor_year  i_interval  ...   data  is_target
0        2019          -1  ...   14.5      False
1        2019           1  ...  119.5       True
2        2020          -1  ...  305.5      False
3        2020           1  ...  485.5       True
[4 rows x 5 columns]