lilio.resampling

The implementation of the resampling methods for use with the Calendar.

Module Contents

lilio.resampling.ResamplingMethod
lilio.resampling.VALID_METHODS
lilio.resampling.resample(calendar: lilio.calendar.Calendar, input_data: xarray.Dataset) -> xarray.Dataset
lilio.resampling.resample(calendar: lilio.calendar.Calendar, input_data: xarray.DataArray) -> xarray.DataArray
lilio.resampling.resample(calendar: lilio.calendar.Calendar, input_data: pandas.Series | pandas.DataFrame) -> pandas.DataFrame

Resample input data to the Calendar’s intervals.

Pass a pandas Series/DataFrame with a datetime index, or an xarray DataArray/Dataset with a datetime coordinate called ‘time’. The data are binned into the Calendar’s intervals and a summary statistic is calculated for each bin, so that the datetimes are resampled onto the Calendar’s index. xarray input returns the same type of object; pandas input returns a DataFrame.
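To make the binning idea concrete, here is a minimal pandas sketch of assigning timestamps to intervals and averaging each bin. It is not lilio's implementation: the two 5-day intervals are hypothetical, and closing them on the left ([start, end)) is an assumption made for the illustration.

```python
import pandas as pd

# Daily input data: the values 0..9 on ten consecutive days.
index = pd.date_range("2021-01-01", periods=10, freq="D")
data = pd.Series(range(10), index=index, dtype=float)

# Two hypothetical 5-day intervals, each closed on the left: [start, end).
breaks = pd.to_datetime(["2021-01-01", "2021-01-06", "2021-01-11"])
intervals = pd.IntervalIndex.from_breaks(breaks, closed="left")

# Position of the interval that contains each timestamp (-1 if none does).
codes = intervals.get_indexer(data.index)
inside = codes >= 0

# Average the values in each interval; intervals without data become NaN.
resampled = (
    data[inside].groupby(codes[inside]).mean().reindex(range(len(intervals)))
)
print(resampled.tolist())  # [2.0, 7.0]
```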

The default behavior is to calculate the mean for each interval. However, many other statistics can be calculated, namely:

  • mean, min, max, median

  • std, var, ptp (peak-to-peak)

  • nanmean, nanmedian, nanstd, nanvar

  • sum, nansum, size, count_nonzero
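Every name in the list above is also the name of a NumPy function, so one way to picture a string method is as a lookup of the NumPy function with that name, applied to the values of one interval. This is a sketch under that assumption; `apply_statistic` is a hypothetical helper and is not taken from lilio's code.

```python
import numpy as np

# The statistics listed above; each is assumed to match a NumPy function name.
METHOD_NAMES = [
    "mean", "min", "max", "median",
    "std", "var", "ptp",
    "nanmean", "nanmedian", "nanstd", "nanvar",
    "sum", "nansum", "size", "count_nonzero",
]

def apply_statistic(values, how):
    """Hypothetical helper: reduce one interval's values with the named statistic."""
    if how not in METHOD_NAMES:
        raise ValueError(f"Unsupported method: {how!r}")
    return getattr(np, how)(values)

# One interval's worth of data, including a missing value.
values = np.array([1.0, 3.0, np.nan, 8.0])
print(apply_statistic(values, "mean"))     # nan
print(apply_statistic(values, "nanmean"))  # 4.0
print(apply_statistic(values, "size"))     # 4
```

The comparison of "mean" and "nanmean" shows why the nan-variants exist: a single missing value makes the plain statistic NaN for the whole interval.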

“size” counts the number of data points in each interval. It can be used to check whether the input data has a high enough resolution to be resampled to the calendar.
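As an illustration of the resolution check, the sketch below counts data points per interval for weekly input and flags intervals with too few points. The intervals and the threshold of 2 points are hypothetical choices, and the pandas binning stands in for lilio's own "size" method.

```python
import pandas as pd

# Weekly input data: Jan 1, 8, 15 and 22.
index = pd.date_range("2021-01-01", periods=4, freq="7D")
data = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Hypothetical intervals of 14, 3 and 3 days, closed on the left.
breaks = pd.to_datetime(["2021-01-01", "2021-01-15", "2021-01-18", "2021-01-21"])
intervals = pd.IntervalIndex.from_breaks(breaks, closed="left")

# Count how many data points fall into each interval (0 for empty intervals).
codes = intervals.get_indexer(data.index)
inside = codes >= 0
counts = (
    data[inside].groupby(codes[inside]).size()
    .reindex(range(len(intervals)), fill_value=0)
)
print(counts.tolist())  # [2, 1, 0]

# Hypothetical rule: require at least 2 points per interval.
too_sparse = counts < 2
print(too_sparse.tolist())  # [False, True, True]
```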

Note: this function is intended for upscaling operations, i.e. the calendar’s intervals are longer than the time step of the input data (e.g. freq is “7days” and the input is daily data). Downscaling operations are also supported, but the returned values may then contain NaN, for example where an interval contains no data points.
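The NaN values from downscaling can be reproduced with a small pandas sketch: daily data binned into hypothetical 12-hour intervals leaves every second interval empty, and the mean of an empty bin is NaN. The interval layout is an assumption for illustration, not lilio output.

```python
import pandas as pd

# Daily input data at midnight on three consecutive days.
index = pd.date_range("2021-01-01", periods=3, freq="D")
data = pd.Series([0.0, 1.0, 2.0], index=index)

# Hypothetical 12-hour intervals: shorter than the input's 1-day time step.
breaks = pd.date_range("2021-01-01", periods=7, freq=pd.Timedelta(hours=12))
intervals = pd.IntervalIndex.from_breaks(breaks, closed="left")

codes = intervals.get_indexer(data.index)
inside = codes >= 0

# Intervals that receive no data point end up as NaN after reindexing.
resampled = (
    data[inside].groupby(codes[inside]).mean().reindex(range(len(intervals)))
)
print(resampled.tolist())  # [0.0, nan, 1.0, nan, 2.0, nan]
```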

Parameters:
  • calendar – Calendar object with either a map_year or map_to_data mapping.

  • input_data – Input data for resampling. For a pandas object, its index must be a pandas.DatetimeIndex. An xarray object requires a dimension named ‘time’ containing datetime values.

  • how – Which resampling method to use, given either as a string or as a function. The following methods are supported as a string input:

    mean, min, max, median, std, var, ptp (peak-to-peak), nanmean, nanmedian, nanstd, nanvar, sum, nansum, size, count_nonzero

    Alternatively, a function can be passed, for example lilio.resample(cal, input_data, how=np.mean).
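The sketch below illustrates both parameter requirements with plain pandas: it rejects input without a DatetimeIndex and reduces each interval with an arbitrary callable. `resample_sketch` and `fraction_positive` are hypothetical names, not part of lilio, and the 3-day intervals are made up for the example.

```python
import numpy as np
import pandas as pd

def resample_sketch(data, intervals, how):
    """Hypothetical helper (not part of lilio): reduce each interval with `how`."""
    # Documented requirement for pandas input: the index must hold datetimes.
    if not isinstance(data.index, pd.DatetimeIndex):
        raise TypeError("input_data must have a pandas.DatetimeIndex")
    codes = intervals.get_indexer(data.index)
    inside = codes >= 0
    # `how` is any callable that reduces an array of values to one number.
    return data[inside].groupby(codes[inside]).agg(how)

def fraction_positive(values):
    """Custom statistic: share of values above zero in one interval."""
    values = np.asarray(values)
    return np.count_nonzero(values > 0) / values.size

index = pd.date_range("2021-01-01", periods=6, freq="D")
data = pd.Series([-1.0, 2.0, 3.0, -4.0, -5.0, 6.0], index=index)
intervals = pd.IntervalIndex.from_breaks(
    pd.date_range("2021-01-01", periods=3, freq="3D"), closed="left"
)

result = resample_sketch(data, intervals, fraction_positive)
print(result.tolist())  # approximately [0.667, 0.333]

# String labels instead of datetimes are rejected.
bad = pd.Series(data.to_numpy(), index=[str(t.date()) for t in index])
try:
    resample_sketch(bad, intervals, fraction_positive)
except TypeError as err:
    print(err)
```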

Raises:

UserWarning – If the calendar’s intervals are shorter than the time step of the input data (i.e. a downscaling operation).
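A check of this kind can be sketched by comparing the interval length with the typical time step of the input. This is only an illustration of the documented condition: `warn_if_downscaling` is hypothetical, and using the median spacing of the index as "the time step" is an assumption, not necessarily how lilio decides.

```python
import warnings

import pandas as pd

def warn_if_downscaling(index, interval_length):
    """Hypothetical check: warn when intervals are shorter than the input's time step.

    `index` is a DatetimeIndex and `interval_length` a pandas Timedelta.
    """
    # Median spacing between consecutive timestamps (the leading NaT is skipped).
    time_step = pd.Series(index).diff().median()
    if interval_length < time_step:
        warnings.warn(
            f"Interval length {interval_length} is shorter than the input "
            f"time step {time_step}; results may contain NaN.",
            UserWarning,
        )

daily = pd.date_range("2021-01-01", periods=10, freq="D")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_downscaling(daily, pd.Timedelta("7D"))   # upscaling: no warning
    warn_if_downscaling(daily, pd.Timedelta("12h"))  # downscaling: warns

print(len(caught))  # 1
```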

Returns:

Input data resampled to the calendar’s intervals, in a data format similar to the given input.

Example

The example below resamples a pandas Series of consecutive integers with a daily index from 2019-12-01 to 2021-12-31.

>>> import lilio
>>> import pandas as pd
>>> import numpy as np
>>> cal = lilio.daily_calendar(anchor="12-31", length="180d")
>>> time_index = pd.date_range("20191201", "20211231", freq="1d")
>>> var = np.arange(len(time_index))
>>> input_data = pd.Series(var, index=time_index)
>>> cal = cal.map_to_data(input_data)
>>> bins = lilio.resample(cal, input_data)
>>> bins 
   anchor_year  i_interval  ...   data  is_target
0         2019          -1  ...   14.5      False
1         2019           1  ...  119.5       True
2         2020          -1  ...  305.5      False
3         2020           1  ...  485.5       True
[4 rows x 5 columns]
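The data column above can be cross-checked with plain pandas. The interval boundaries here are inferred from the printed output, not read from lilio: the sketch assumes each anchor year has a 180-day precursor interval ending at the anchor date (i_interval -1) and a 180-day target interval starting at it (i_interval 1), both closed on the left.

```python
import numpy as np
import pandas as pd

# Same input as the example: consecutive integers on a daily index.
time_index = pd.date_range("20191201", "20211231", freq="1d")
input_data = pd.Series(np.arange(len(time_index)), index=time_index)

# Assumed layout: [anchor - 180d, anchor) and [anchor, anchor + 180d).
length = pd.Timedelta("180D")
means = []
for year in (2019, 2020):
    anchor = pd.Timestamp(f"{year}-12-31")
    for start in (anchor - length, anchor):
        in_interval = (input_data.index >= start) & (input_data.index < start + length)
        means.append(float(input_data[in_interval].mean()))

print(means)  # [14.5, 119.5, 305.5, 485.5]
```

The first value is low because the 2019 precursor interval only overlaps the 30 days of data from 2019-12-01 to 2019-12-30.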