lilio.calendar_shifter
Calendar shifter to create staggered calendars.
Module Contents
- lilio.calendar_shifter.resample(calendar: lilio.calendar.Calendar, input_data: xarray.Dataset) → xarray.Dataset
- lilio.calendar_shifter.resample(calendar: lilio.calendar.Calendar, input_data: xarray.DataArray) → xarray.DataArray
- lilio.calendar_shifter.resample(calendar: lilio.calendar.Calendar, input_data: pandas.Series | pandas.DataFrame) → pandas.DataFrame
Resample input data to the Calendar’s intervals.
Pass a pandas Series/DataFrame with a datetime axis, or an xarray DataArray/Dataset with a datetime coordinate called ‘time’. It will return the same object with the datetimes resampled onto the Calendar’s Index by binning the data into the Calendar’s intervals and calculating the mean of each bin.
The default behavior is to calculate the mean for each interval. However, many other statistics can be calculated, namely:
mean, min, max, median
std, var, ptp (peak-to-peak)
nanmean, nanmedian, nanstd, nanvar
sum, nansum, size, count_nonzero
“size” will compute the number of datapoints that are in each interval, and can be used to check if the input data is of a sufficiently high resolution to be resampled to the calendar.
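The resolution check described above can be illustrated without lilio. This is a minimal sketch in plain pandas, not lilio's implementation: `points_per_bin` is a hypothetical helper, and the left-closed 7-day bins are an assumption made for the illustration.

```python
import pandas as pd


def points_per_bin(index, edges):
    """Count the timestamps in each left-closed bin [edges[i], edges[i + 1])."""
    return [
        int(((index >= left) & (index < right)).sum())
        for left, right in zip(edges[:-1], edges[1:])
    ]


# Three consecutive 7-day bins (assumed edges, for illustration only).
edges = pd.date_range("2021-01-01", periods=4, freq="7D")

daily = pd.date_range("2021-01-01", "2021-01-31", freq="1D")
biweekly = pd.date_range("2021-01-01", "2021-01-31", freq="14D")

daily_counts = points_per_bin(daily, edges)        # every bin holds 7 points
biweekly_counts = points_per_bin(biweekly, edges)  # the middle bin is empty
```

A count of zero in any bin signals that the input is too coarse for that interval.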
Note: this function is intended for upscaling operations, in which the calendar’s intervals are longer than the sampling interval of the input data (e.g. freq is “7days” and the input is daily data). Downscaling is supported, but the user needs to be careful since the returned values may contain “NaN”.
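Why downscaling can yield “NaN” is easiest to see on a toy case. The sketch below uses plain pandas rather than lilio, and assumes left-closed 2-day bins applied to weekly data; any bin that receives no datapoint has an undefined mean.

```python
import pandas as pd

# Weekly input: one value on 2021-01-01 and one on 2021-01-08.
weekly = pd.Series(
    [1.0, 2.0],
    index=pd.date_range("2021-01-01", periods=2, freq="7D"),
)

# Four 2-day bins, i.e. bins that are shorter than the input spacing.
edges = pd.date_range("2021-01-01", periods=5, freq="2D")

means = []
for left, right in zip(edges[:-1], edges[1:]):
    in_bin = (weekly.index >= left) & (weekly.index < right)
    # The mean of an empty selection is NaN.
    means.append(weekly[in_bin].mean())
```

Only the first and last bins contain a datapoint here, so the two bins in between come out as NaN.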
- Parameters:
calendar – Calendar object with either a map_year or map_to_data mapping.
input_data – Input data for resampling. A pandas object must have a pandas.DatetimeIndex as its index. An xarray object requires a dimension named ‘time’ containing datetime values.
how –
Which method for resampling should be used. Either a string or function. The following methods are supported as a string input:
mean, min, max, median
std, var, ptp (peak-to-peak)
nanmean, nanmedian, nanstd, nanvar
sum, nansum, size, count_nonzero
Alternatively, a function can be passed, for example resample(calendar, input_data, how=np.mean).
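Every supported string name is also the name of a NumPy function, so one plausible way to accept either a string or a callable is a small dispatch step. The sketch below is an assumption about how such a lookup could work, not lilio's actual code; `resolve_how` and `SUPPORTED` are hypothetical names.

```python
import numpy as np

# The string names listed above; each is also a function in the numpy namespace.
SUPPORTED = (
    "mean", "min", "max", "median",
    "std", "var", "ptp",
    "nanmean", "nanmedian", "nanstd", "nanvar",
    "sum", "nansum", "size", "count_nonzero",
)


def resolve_how(how):
    """Hypothetical helper: turn a method name or a callable into a callable."""
    if callable(how):
        return how
    if how in SUPPORTED:
        return getattr(np, how)
    raise ValueError(f"Unsupported resampling method: {how!r}")


values = np.array([1.0, 5.0, 3.0])
peak_to_peak = resolve_how("ptp")(values)   # max minus min
custom_mean = resolve_how(np.mean)(values)  # callables pass through unchanged
```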
- Raises:
UserWarning – If the calendar’s intervals are shorter than the sampling interval of the input data.
- Returns:
Input data resampled based on the calendar frequency, in a data format similar to that of the given input.
Example
Assuming the input data is a pd.Series of consecutive integers with a daily index from 2019-01-01 to 2022-01-01:
>>> import lilio
>>> import pandas as pd
>>> import numpy as np
>>> cal = lilio.daily_calendar(anchor="12-31", length="180d")
>>> time_index = pd.date_range("2019-01-01", "2022-01-01", freq="1d")
>>> var = np.arange(len(time_index))
>>> input_data = pd.Series(var, index=time_index)
>>> cal = cal.map_to_data(input_data)
>>> bins = lilio.resample(cal, input_data)
>>> bins
   anchor_year  i_interval  ...   data  is_target
0         2019          -1  ...  273.5      False
1         2019           1  ...  453.5       True
2         2020          -1  ...  639.5      False
3         2020           1  ...  819.5       True
[4 rows x 5 columns]
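The first two values in the data column can be cross-checked by hand. The following is a minimal sketch in plain pandas, not lilio's implementation. It assumes that the two intervals for anchor year 2019 are the left-closed 180-day windows starting on 2019-07-04 and 2019-12-31, which is inferred from the anchor and length used in the example; `bin_mean` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Same input as the example: consecutive integers on a daily index.
time_index = pd.date_range("2019-01-01", "2022-01-01", freq="1D")
input_data = pd.Series(np.arange(len(time_index)), index=time_index)

# Assumed interval edges for anchor year 2019 (anchor 12-31, length 180 days):
# precursor [2019-07-04, 2019-12-31) and target [2019-12-31, 2020-06-28).
edges = pd.date_range("2019-07-04", periods=3, freq="180D")


def bin_mean(series, left, right):
    """Mean of all values whose timestamp falls in the bin [left, right)."""
    in_bin = (series.index >= left) & (series.index < right)
    return series[in_bin].mean()


means = [
    bin_mean(input_data, left, right)
    for left, right in zip(edges[:-1], edges[1:])
]
```

Under these assumptions the two means are 273.5 and 453.5, matching rows 0 and 1 of the output.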
- lilio.calendar_shifter.calendar_shifter(calendar: lilio.calendar.Calendar, shift: str | dict) → lilio.calendar.Calendar
Shift a Calendar instance by a given time offset.
Instead of shifting the anchor date, this function shifts two things in reference to the anchor date:
- target period(s): as a gap between the anchor date and the start of the first target period
- precursor period(s): as a gap between the anchor date and the start of the first precursor period
This way, the anchor year from the input calendar is maintained on the returned calendar. This is important for train-test splitting at later stages.
- Parameters:
calendar – a lilio.Calendar instance
shift – a pandas-like frequency string (e.g. “10d”, “2W”, or “3M”), or a pandas.DateOffset compatible dictionary such as {'days': 10}, {'weeks': 2}, or {'months': 1, 'weeks': 2}
Example
Shift a calendar by a given dateoffset.
>>> import lilio
>>> cal = lilio.Calendar(anchor="07-01")
>>> cal.add_intervals("target", "7d")
>>> cal.add_intervals("precursor", "7d", gap="14d")
>>> cal.add_intervals("precursor", "7d", n=3)
>>> cal_shifted = lilio.calendar_shifter.calendar_shifter(cal, "7d")
>>> cal_shifted
Calendar(
    anchor='07-01',
    allow_overlap=False,
    mapping=None,
    intervals=[
        Interval(role='target', length='7d', gap={'days': 7}),
        Interval(role='precursor', length='7d', gap={'days': 7}),
        Interval(role='precursor', length='7d', gap='0d'),
        Interval(role='precursor', length='7d', gap='0d'),
        Interval(role='precursor', length='7d', gap='0d')
    ]
)
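The gap arithmetic in this example (target gap 0d to 7d, first precursor gap 14d to 7d) can be modelled with the standard library alone. The sketch below is a simplified stand-in, not lilio's implementation: the `Interval` dataclass and `shift_intervals` are hypothetical, and the rule that the first target gap grows while the first precursor gap shrinks is inferred from the output above.

```python
from dataclasses import dataclass, replace
from datetime import timedelta


@dataclass(frozen=True)
class Interval:
    """Hypothetical, simplified stand-in for a calendar interval."""
    role: str
    length: timedelta
    gap: timedelta = timedelta(0)


def shift_intervals(intervals, shift):
    """Return a shifted copy; only the first interval of each role changes."""
    shifted = []
    seen_roles = set()
    for interval in intervals:
        if interval.role not in seen_roles:
            seen_roles.add(interval.role)
            # Targets move away from the anchor, precursors move towards it.
            sign = 1 if interval.role == "target" else -1
            interval = replace(interval, gap=interval.gap + sign * shift)
        shifted.append(interval)
    return shifted


week = timedelta(days=7)
intervals = [
    Interval("target", week),
    Interval("precursor", week, gap=timedelta(days=14)),
    Interval("precursor", week),
    Interval("precursor", week),
    Interval("precursor", week),
]
shifted = shift_intervals(intervals, week)
gaps_in_days = [interval.gap.days for interval in shifted]
```

Because only gaps change, the anchor itself stays put, which is what keeps the anchor year stable for later train-test splitting.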
- lilio.calendar_shifter.staggered_calendar(calendar: lilio.calendar.Calendar, shift: str | dict, n_shifts: int) → list[lilio.calendar.Calendar]
Create a staggered calendar list by shifting a calendar by an offset n-times.
- Parameters:
calendar – a lilio.Calendar instance
shift – a pandas-like frequency string (e.g. “10d”, “2W”, or “3M”), or a pandas.DateOffset compatible dictionary such as {'days': 10}, {'weeks': 2}, or {'months': 1, 'weeks': 2}
n_shifts – strictly positive integer for the number of shifts
Example
Shift an input calendar n times by a given dateoffset and return a list of these shifted calendars.
>>> import lilio
>>> cal = lilio.Calendar(anchor="07-01")
>>> cal.add_intervals("target", "7d")
>>> cal.add_intervals("precursor", "7d", gap="14d")
>>> cal.add_intervals("precursor", "7d", n=3)
>>> cal_shifted = lilio.calendar_shifter.staggered_calendar(cal, "7d", 1)
>>> cal_shifted
[Calendar(
    anchor='07-01',
    allow_overlap=False,
    mapping=None,
    intervals=[
        Interval(role='target', length='7d', gap='0d'),
        Interval(role='precursor', length='7d', gap='14d'),
        Interval(role='precursor', length='7d', gap='0d'),
        Interval(role='precursor', length='7d', gap='0d'),
        Interval(role='precursor', length='7d', gap='0d')
    ]
), Calendar(
    anchor='07-01',
    allow_overlap=False,
    mapping=None,
    intervals=[
        Interval(role='target', length='7d', gap={'days': 7}),
        Interval(role='precursor', length='7d', gap={'days': 7}),
        Interval(role='precursor', length='7d', gap='0d'),
        Interval(role='precursor', length='7d', gap='0d'),
        Interval(role='precursor', length='7d', gap='0d')
    ]
)]
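In this example the returned list holds the unshifted calendar followed by one shifted copy, i.e. n_shifts + 1 entries. The sketch below reproduces the first-target and first-precursor gaps of such a list with the standard library only. It is an illustration, not lilio's code: `staggered_gaps` is a hypothetical helper, and the assumption that the i-th calendar is shifted by i times the offset is an extrapolation from the single-shift example.

```python
from datetime import timedelta


def staggered_gaps(target_gap, precursor_gap, shift, n_shifts):
    """Hypothetical helper: (target gap, precursor gap) for each staggered calendar.

    Entry 0 is the unshifted calendar; entry i is assumed to be shifted by i * shift.
    """
    if n_shifts < 1:
        raise ValueError("n_shifts must be a strictly positive integer")
    return [
        (target_gap + i * shift, precursor_gap - i * shift)
        for i in range(n_shifts + 1)
    ]


one_shift = staggered_gaps(
    target_gap=timedelta(0),
    precursor_gap=timedelta(days=14),
    shift=timedelta(days=7),
    n_shifts=1,
)
gap_days = [(target.days, precursor.days) for target, precursor in one_shift]
```

With the values from the example, the two entries are (0, 14) and (7, 7) days, matching the gaps printed above.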
- lilio.calendar_shifter.calendar_list_resampler(cal_list: list, ds: xarray.Dataset, dim_name: str = 'step') → xarray.Dataset
Return a dataset, resampled to every calendar in a list of calendars.
The resampled calendars will be concatenated along a new dimension (with the default name ‘step’) into a single xarray Dataset.
- Parameters:
cal_list – List of calendars.
ds – Dataset to resample.
dim_name – The name of the new dimension that will be added to the output dataset.
- Returns:
Resampled xr.Dataset
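The concatenation step can be pictured with xarray alone. This is a minimal sketch of stacking per-calendar results along a new dimension, not the function's actual implementation: `fake_resampled` is a hypothetical stand-in for one resampled dataset, and the dimension names anchor_year and i_interval are borrowed from the resample example for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr


def fake_resampled(step):
    """Hypothetical stand-in for one resampled dataset, filled with the step number."""
    return xr.Dataset(
        {"data": (("anchor_year", "i_interval"), np.full((2, 2), float(step)))},
        coords={"anchor_year": [2019, 2020], "i_interval": [-1, 1]},
    )


dim_name = "step"
per_calendar = [fake_resampled(step) for step in range(3)]

# Concatenate along a new, labelled dimension; one entry per calendar in the list.
combined = xr.concat(
    per_calendar,
    dim=pd.Index(range(len(per_calendar)), name=dim_name),
)
```

Selecting combined.sel(step=i) then recovers the result that belongs to the i-th calendar.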