lilio.resampling
The implementation of the resampling methods for use with the Calendar.
Module Contents
- lilio.resampling.add_attrs(data: xarray.DataArray | xarray.Dataset, calendar: lilio.calendar.Calendar) None [source]
Update resampled xarray data with the Calendar’s attributes and provenance.
- class lilio.resampling.Calendar(anchor: str, allow_overlap: bool = False, mapping: None | _MappingYears | _MappingData | _MappingDataGreedy = None, intervals: None | list[Interval] = None)[source]
Build a calendar from scratch with basic construction elements.
Instantiate a basic container for building a calendar from basic blocks.
This is a highly flexible calendar which allows the user to build their own calendar with the basic building blocks of target and precursor periods.
Users are free to create a calendar with customized intervals, gaps between intervals, and even overlapping intervals, but they need to manage the calendar themselves.
Some shorthand calendars, such as daily_calendar, weekly_calendar and monthly_calendar, are available in lilio.calendar_shorthands. These can be used to easily construct basic calendars with only a few parameters, but do not have the flexibility that this calendar builder module provides.
- Parameters:
anchor –
String denoting the anchor date. The following inputs are valid:
- “MM-DD” for a month and day, e.g. “12-31”.
- “MM” for only a month, e.g. “4” for April.
- English names and abbreviations of months, e.g. “December” or “jan”.
- “Www” for a week number, e.g. “W05” for the fifth week of the year.
- “Www-D” for a week number plus day of week, e.g. “W01-4” for the first Thursday of the year.
allow_overlap – Whether intervals are allowed to overlap between anchor years. The default is False, which means that anchor years will be skipped to avoid data being shared between anchor years.
mapping – Calendar mapping. Input in the form: (“years”, 2000, 2020) or (“data”, pd.Timestamp(“2000-01-01”), pd.Timestamp(“2020-01-01”)). The calendar mapping is usually set with the map_years or map_to_data methods.
intervals – A list of Interval objects that should be appended to the calendar when it is initialized.
Example
Instantiate a custom calendar, ready for appending target/precursor periods.
>>> import lilio
>>> calendar = lilio.Calendar(anchor="12-31")
>>> calendar
Calendar(
    anchor='12-31',
    allow_overlap=False,
    mapping=None,
    intervals=None
)
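To make the anchor formats listed under Parameters concrete, here is a minimal standard-library sketch of how such strings could be resolved to a date in a given year. The helper name anchor_to_date is hypothetical and this is not lilio's parser. It assumes ISO 8601 week numbering for the “Www” forms, and that a bare month or week resolves to its first day.

```python
import calendar
from datetime import date

# Lower-cased English month names and abbreviations mapped to month numbers.
_MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}
_MONTHS.update({abbr.lower(): i for i, abbr in enumerate(calendar.month_abbr) if abbr})


def anchor_to_date(anchor: str, year: int) -> date:
    """Hypothetical helper: resolve an anchor string to a date in `year`."""
    text = anchor.strip()
    if text.lower() in _MONTHS:          # "December" or "jan"
        return date(year, _MONTHS[text.lower()], 1)
    if text.upper().startswith("W"):     # "W05" or "W01-4"
        week, _, day = text[1:].partition("-")
        # Assumption: ISO weeks, defaulting to Monday when no day is given.
        return date.fromisocalendar(year, int(week), int(day or 1))
    month, _, day = text.partition("-")  # "12-31" or "4"
    # Assumption: a bare month resolves to the first day of that month.
    return date(year, int(month), int(day or 1))
```

Under these assumptions, anchor_to_date("W01-4", 2021) gives 2021-01-07, which is the first Thursday of 2021.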
- property anchor
- Return the anchor.
- property allow_overlap
Return whether overlapping intervals are allowed or not.
- property mapping: None | Literal['years', 'data', 'data-greedy']
Return the mapping of the calendar. Either None, “years”, “data”, or “data-greedy”.
- add_intervals(role: Literal['target', 'precursor'], length: str, gap: str = '0d', n: int = 1) None [source]
Add one or more intervals to the calendar.
The interval can be a target or a precursor, and is defined by its length and an optional gap between this interval and the preceding interval.
- Parameters:
role – Either a ‘target’ or ‘precursor’ interval(s).
length – The length of the interval(s), in a format of ‘5d’ for five days, ‘2W’ for two weeks, or ‘1M’ for one month.
gap – The gap between this interval and the preceding target/precursor interval. Same format as the length argument.
n – The number of intervals which should be added to the calendar. Defaults to 1.
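The sketch below illustrates how role, length and gap could combine into concrete date ranges. It is not lilio's implementation: the function lay_out is hypothetical, lengths and gaps are given in whole days only, and it assumes that targets are stacked forward from the anchor while precursors are stacked backward, with left-closed, right-open intervals. That layout is consistent with the resample example in this section, but treat it as an assumption.

```python
from datetime import date, timedelta


def lay_out(anchor: date, specs: list[tuple[str, int, int]]) -> list[tuple[str, date, date]]:
    """Hypothetical helper. specs holds (role, length_days, gap_days) in the
    order the intervals were added; returns (role, start, end), end exclusive."""
    next_target_start = anchor    # assumption: targets grow forward from the anchor
    next_precursor_end = anchor   # assumption: precursors grow backward from it
    laid_out = []
    for role, length, gap in specs:
        if role == "target":
            start = next_target_start + timedelta(days=gap)
            end = start + timedelta(days=length)
            next_target_start = end
        else:
            end = next_precursor_end - timedelta(days=gap)
            start = end - timedelta(days=length)
            next_precursor_end = start
        laid_out.append((role, start, end))
    return laid_out


# One 180-day target and one 180-day precursor around 31 December 2019.
intervals = lay_out(date(2019, 12, 31), [("target", 180, 0), ("precursor", 180, 0)])
```

Here the target covers 2019-12-31 up to (not including) 2020-06-28, and the precursor covers 2019-07-04 up to (not including) 2019-12-31.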
- map_years(start: int, end: int)[source]
Add a start and end year mapping to the calendar.
If the start and end years are the same, the intervals for only that single year are returned by calendar.get_intervals().
- Parameters:
start – The first year for which the calendar will be realized
end – The last year for which the calendar will be realized
- Returns:
The calendar mapped to the input start and end year.
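As a small illustration of the inclusive start/end behaviour described above, the following standard-library sketch lists one anchor date per mapped year. The function anchor_dates is hypothetical, and the sketch ignores any anchor years that might be skipped when allow_overlap is False.

```python
from datetime import date


def anchor_dates(month: int, day: int, start: int, end: int) -> list[date]:
    """Hypothetical helper: one anchor date per year, with `end` inclusive."""
    return [date(year, month, day) for year in range(start, end + 1)]


single = anchor_dates(12, 31, 2020, 2020)   # same start and end: one year
several = anchor_dates(12, 31, 2000, 2002)  # three anchor years
```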
- map_to_data(input_data: pandas.Series | pandas.DataFrame | xarray.Dataset | xarray.DataArray, safe: bool = True)[source]
Map the calendar to input data period.
Stores the first and last intervals of the input data to the calendar, so that the intervals can cover the data to the greatest extent.
- Parameters:
input_data – Input data for datetime mapping. Its index must be either pandas.DatetimeIndex, or an xarray time coordinate with datetime data.
safe – Whether the data should be mapped in safe mode (only intervals that are completely filled with data are created) or in greedy mode (an interval is created if there is any data for it). Safe mode is the default.
- Returns:
The calendar mapped to the input data period.
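The difference between the safe and greedy modes can be sketched as a simple range check. This is only an illustration with a hypothetical helper, keep_interval. It approximates “data-filled” as “lies completely inside the time span of the data”, and it does not model how lilio applies the rule per anchor year.

```python
from datetime import date


def keep_interval(first_day: date, last_day: date,
                  data_first: date, data_last: date, safe: bool = True) -> bool:
    """Hypothetical helper. Both the interval and the data span are given as
    inclusive first/last days."""
    if safe:
        # Safe: the interval must lie entirely inside the data span.
        return data_first <= first_day and last_day <= data_last
    # Greedy: any overlap with the data span is enough.
    return first_day <= data_last and data_first <= last_day


data_span = (date(2019, 1, 1), date(2022, 1, 1))
# An interval that starts inside the data but runs past its end.
partial = (date(2021, 12, 31), date(2022, 6, 28))
kept_safe = keep_interval(*partial, *data_span, safe=True)
kept_greedy = keep_interval(*partial, *data_span, safe=False)
```

With this data span, the partially covered interval is dropped in safe mode and kept in greedy mode.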
- get_intervals() pandas.DataFrame [source]
Retrieve updated intervals from the Calendar object.
- show() pandas.DataFrame [source]
Display the intervals the Calendar will generate for the current setup.
- Returns:
Dataframe containing the calendar intervals.
- Return type:
pd.DataFrame
- visualize(n_years: int = 3, interactive: bool = False, relative_dates: bool = False, show_length: bool = False, add_legend: bool = True, ax=None, **bokeh_kwargs) None [source]
Plot a visualization of the current calendar setup, to aid in user setup.
Note: The interactive visualization requires the bokeh package to be installed in the active Python environment.
- Parameters:
n_years – Sets the maximum number of anchor years that should be shown. By default only the most recent 3 are visualized, to ensure that they fit within the plot.
interactive – If False, matplotlib will be used for the visualization. If True, bokeh will be used.
relative_dates – Toggles if the intervals should be displayed relative to the anchor date, or as absolute dates.
show_length – Toggles if the frequency of the intervals should be displayed. Defaults to False (Matplotlib plotter only).
add_legend – Toggles if a legend should be added to the plot (Matplotlib only)
ax – Matplotlib axis object to plot the visualization into.
**bokeh_kwargs – Keyword arguments to pass to Bokeh’s plotting.figure. See https://docs.bokeh.org/en/latest/docs/reference/plotting/figure.html for a list of possible keyword arguments.
- property flat: pandas.DataFrame
Returns the flattened intervals.
- lilio.resampling.resample(calendar: lilio.calendar.Calendar, input_data: xarray.Dataset) xarray.Dataset [source]
- lilio.resampling.resample(calendar: lilio.calendar.Calendar, input_data: xarray.DataArray) xarray.DataArray
- lilio.resampling.resample(calendar: lilio.calendar.Calendar, input_data: pandas.Series | pandas.DataFrame) pandas.DataFrame
Resample input data to the Calendar’s intervals.
Pass a pandas Series/DataFrame with a datetime axis, or an xarray DataArray/Dataset with a datetime coordinate called ‘time’. It will return the same object with the datetimes resampled onto the Calendar’s Index by binning the data into the Calendar’s intervals and calculating the mean of each bin.
The default behavior is to calculate the mean for each interval. However, many other statistics can be calculated, namely:
mean, min, max, median
std, var, ptp (peak-to-peak)
nanmean, nanmedian, nanstd, nanvar
sum, nansum, size, count_nonzero
“size” will compute the number of datapoints that are in each interval, and can be used to check if the input data is of a sufficiently high resolution to be resampled to the calendar.
Note: this function is intended for upscaling operations, which means the calendar frequency is larger than the original frequency of the input data (e.g. freq is “7days” and the input is daily data). It supports downscaling operations, but the user needs to be careful since the returned values may contain “NaN”.
- Parameters:
calendar – Calendar object with either a map_years or map_to_data mapping.
input_data – Input data for resampling. For a Pandas object, its index must be a pandas.DatetimeIndex. An xarray object requires a dimension named ‘time’ containing datetime values.
how –
Which method for resampling should be used. Either a string or function. The following methods are supported as a string input:
mean, min, max, median
std, var, ptp (peak-to-peak)
nanmean, nanmedian, nanstd, nanvar
sum, nansum, size, count_nonzero
Alternatively, a function can be passed. For example resample(how=np.mean).
- Raises:
UserWarning – If the calendar frequency is smaller than the frequency of input data
- Returns:
Input data resampled based on the calendar frequency, in a data format similar to the given inputs.
Example
Assuming the input data is a pd.Series containing increasing integer values, with a daily index from 2019-01-01 to 2022-01-01.
>>> import lilio
>>> import pandas as pd
>>> import numpy as np
>>> cal = lilio.daily_calendar(anchor="12-31", length="180d")
>>> time_index = pd.date_range("2019-01-01", "2022-01-01", freq="1d")
>>> var = np.arange(len(time_index))
>>> input_data = pd.Series(var, index=time_index)
>>> cal = cal.map_to_data(input_data)
>>> bins = lilio.resample(cal, input_data)
>>> bins
   anchor_year  i_interval  ...   data  is_target
0         2019          -1  ...  273.5      False
1         2019           1  ...  453.5       True
2         2020          -1  ...  639.5      False
3         2020           1  ...  819.5       True
[4 rows x 5 columns]
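The binning step can be mimicked with the standard library to check the first two values in the example output by hand. This is a sketch of the idea only, not lilio's implementation: resample_bins is a hypothetical helper, and the interval boundaries (180 days either side of 2019-12-31, left-closed and right-open) are assumptions chosen because they reproduce the values shown above.

```python
from datetime import date, timedelta
from statistics import mean


def resample_bins(series, intervals, how=mean):
    """Hypothetical helper. series maps date -> value; intervals is a list of
    (start, end) pairs with end exclusive. Empty bins yield NaN."""
    results = []
    for start, end in intervals:
        values = [v for d, v in series.items() if start <= d < end]
        results.append(how(values) if values else float("nan"))
    return results


# Daily data from 2019-01-01 to 2022-01-01 with values 0, 1, 2, ...
first = date(2019, 1, 1)
n_days = (date(2022, 1, 1) - first).days + 1
series = {first + timedelta(days=i): i for i in range(n_days)}

# Assumed precursor and target intervals for anchor year 2019.
intervals = [
    (date(2019, 7, 4), date(2019, 12, 31)),   # precursor, i_interval = -1
    (date(2019, 12, 31), date(2020, 6, 28)),  # target, i_interval = 1
]
means = resample_bins(series, intervals)           # default statistic: mean
sizes = resample_bins(series, intervals, how=len)  # analogous to "size"
```

The means come out as 273.5 and 453.5, matching the first two rows of the output above, and each bin holds 180 data points. A bin with no data points returns NaN in this sketch, which is the situation the note on downscaling warns about.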