AGGREGATE.py

`AGGREGATE.py` requires JSON / YAML files as input.
The dictionaries they contain are processed top to bottom, so a variable may reference any variable defined earlier in the same JSON / YAML file.
The strings given for filter / expression may be almost arbitrarily complicated.
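
The top-to-bottom rule can be illustrated with a small check that every name listed under requires is defined earlier in the file. This is only a sketch: `check_requires` is a hypothetical helper, not part of `AGGREGATE.py`, and the config is a shortened form of the example on this page.

```python
import json


def check_requires(config: dict) -> list[str]:
    """Return one message per 'requires' entry that is not defined above its user.

    Hypothetical helper for illustration; AGGREGATE.py may validate differently.
    """
    seen = set()
    problems = []
    # Python dicts keep insertion order, so this follows the order in the file.
    for name, spec in config.items():
        for dep in spec.get("requires", []):
            if dep not in seen:
                problems.append(f"{name!r} requires {dep!r}, which is not defined above it")
        seen.add(name)
    return problems


config_text = """
{
    "blood_sodium": {"type": "dynamic", "keep": false},
    "blood_sodium_record_count": {
        "type": "static",
        "requires": ["blood_sodium"],
        "variable": "blood_sodium",
        "aggregation": "count"
    }
}
"""

config = json.loads(config_text)
problems = check_requires(config)

# Reversing the order puts the dependent variable before the one it requires.
reversed_config = dict(reversed(list(config.items())))
problems_reversed = check_requires(reversed_config)
```

With the original order `problems` is empty; with the reversed order the count variable is reported because `blood_sodium` is only defined after it.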

JSON / YAML files used as input to the `AGGREGATE.py` function should have the following structure:

variable name: the name of the final column
- type: dynamic or static, relevant for aggregation
- table: the table the data should be extracted from
- variable: if dynamic, the column to select from table (along with Time Relative to Admission (seconds)); if static, the column to aggregate on
- variable_sources: if variable is a lab value, select the wanted sources (put null if None shall be included)
- value_dtype: e.g. set to bool to get a binary variable
- cutoff: cutoffs for static variables are applied separately (i.e. locally, for that variable only), whereas cutoffs for dynamic variables are applied globally to the full dataframe
  - value: low (lo) / high (hi) cutoffs for the value; the value is clipped to these cutoffs (data is not removed). If one of the cutoffs isn’t set, it is assumed to be None.
  - time: low (lo) / high (hi) cutoffs for the time; the time series is filtered to between these timepoints (data is removed). If one of the cutoffs isn’t set, it is assumed to be None.
- time_col: column to be used as time reference (Time Relative to Admission (seconds) if not specified)
- aggregation: method to use for aggregation, one of sum, mean, median, max, min, first, last, count
- sort: value to sort by for aggregation (Time Relative to Admission (seconds) if not specified)
- group_by: value(s) to group by for aggregation (i.e., columns to be used in addition to Global ICU Stay ID)
- requires: list of variables required (e.g. if there is a filter or expression)
- filter: expression for filtering the data (has to include the columns to filter as strings within strings, e.g. "pl.col('power') > 9000")
- prefilter: expression for filtering the table the data should be extracted from (has to include the columns to filter as strings within strings, e.g. "pl.col('power') > 9000")
- expression: expression to calculate using the data (has to include the columns used for calculation as strings within strings, e.g. "pl.col('a_squared') + pl.col('a_squared')")
- keep: bool, whether to keep the variable in the output frame
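
The difference between the two cutoff kinds (value cutoffs clip, time cutoffs remove rows) can be sketched in plain Python. The helper names `clip_value` and `apply_cutoffs` and the sample rows are made up for illustration. Treating the time bounds as inclusive is an assumption; the description above does not say how boundary values are handled.

```python
def clip_value(value, lo=None, hi=None):
    """Clip value into [lo, hi]; a bound of None means 'no limit'."""
    if lo is not None and value < lo:
        return lo
    if hi is not None and value > hi:
        return hi
    return value


def apply_cutoffs(rows, cutoff):
    """Apply a 'cutoff' dict to rows of (time_seconds, value).

    Time cutoffs drop rows, value cutoffs only clip.
    Assumption: time bounds are inclusive.
    """
    value_cut = cutoff.get("value", {})
    time_cut = cutoff.get("time", {})
    t_lo, t_hi = time_cut.get("lo"), time_cut.get("hi")
    kept = []
    for t, v in rows:
        if t_lo is not None and t < t_lo:
            continue  # before the low time cutoff: row is removed
        if t_hi is not None and t > t_hi:
            continue  # after the high time cutoff: row is removed
        kept.append((t, clip_value(v, value_cut.get("lo"), value_cut.get("hi"))))
    return kept


# Same cutoff block as blood_sodium in the example; "hi" for time is unset (None).
cutoff = {"value": {"lo": 80, "hi": 190}, "time": {"lo": 0}}

# Hypothetical measurements: (Time Relative to Admission (seconds), value)
rows = [(-600, 140.0), (0, 200.0), (3600, 75.0), (7200, 150.0)]

result = apply_cutoffs(rows, cutoff)
```

Here the row at -600 seconds is dropped, while 200.0 and 75.0 stay in the series as 190 and 80.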

AN EXAMPLE

{
    "blood_sodium": {
        "type": "dynamic",
        "table": "timeseries_labs",
        "variable": "Sodium [Moles/volume] in Blood",
        "value_dtype": "float",
        "cutoff": {
            "value": {
                "lo": 80,
                "hi": 190
            },
            "time": {
                "lo": 0
            }
        },
        "keep": false
    },
    "first_hypernatremia_recordtime": {
        "type": "static",
        "requires": [
            "blood_sodium"
        ],
        "variable": "Time Relative to Admission (seconds)",
        "aggregation": "min",
        "filter": "pl.col('blood_sodium') > 145"
    },
    "blood_sodium_record_count": {
        "type": "static",
        "requires": [
            "blood_sodium"
        ],
        "variable": "blood_sodium",
        "aggregation": "count"
    }
}

the above is equivalent to the following YAML file

blood_sodium:
  type: dynamic
  table: timeseries_labs
  variable: "Sodium [Moles/volume] in Blood"
  value_dtype: float
  cutoff:
    value:
      lo: 80
      hi: 190
    time:
      lo: 0
  keep: false
first_hypernatremia_recordtime:
  type: static
  requires:
    - blood_sodium
  variable: "Time Relative to Admission (seconds)"
  aggregation: min
  filter: "pl.col('blood_sodium') > 145"
blood_sodium_record_count:
  type: static
  requires:
    - blood_sodium
  variable: blood_sodium
  aggregation: count

… and results in a DataFrame such as the following:

│ Global ICU Stay ID ┆ blood_sodium_record_count ┆ first_hypernatremia_recordtime │
│ ---                ┆ ---                       ┆ ---                            │
│ str                ┆ u32                       ┆ f64                            │
╞════════════════════╪═══════════════════════════╪════════════════════════════════╡
│ eicu-1000020       ┆ 3                         ┆ 35100.0                        │
│ eicu-1000050       ┆ null                      ┆ null                           │
│ eicu-1000071       ┆ null                      ┆ null                           │
│ eicu-1000105       ┆ 3                         ┆ 113040.0                       │
│ eicu-1000106       ┆ null                      ┆ null                           │
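
How the two static columns are derived from the dynamic `blood_sodium` series can be sketched in plain Python. This is not how `AGGREGATE.py` is implemented (the filter strings suggest it works on Polars expressions); the stay IDs and measurements below are hypothetical and are not taken from the table above.

```python
from collections import defaultdict

# Hypothetical rows: (Global ICU Stay ID, Time Relative to Admission (seconds), blood_sodium)
rows = [
    ("stay-A", 600, 140.0),
    ("stay-A", 4200, 147.0),
    ("stay-A", 9000, 149.0),
    ("stay-B", 300, 138.0),
]

# Group the dynamic variable by stay, as the aggregation does per Global ICU Stay ID.
by_stay = defaultdict(list)
for stay_id, t, sodium in rows:
    by_stay[stay_id].append((t, sodium))

result = {}
for stay_id, records in by_stay.items():
    # filter "pl.col('blood_sodium') > 145", then aggregation "min" on the time column
    hyper_times = [t for t, sodium in records if sodium > 145]
    result[stay_id] = {
        # aggregation "count" on blood_sodium
        "blood_sodium_record_count": len(records),
        "first_hypernatremia_recordtime": min(hyper_times) if hyper_times else None,
    }
```

In this sketch a stay without any hypernatremic record gets None for the time column. Stays with no sodium rows at all do not appear in `result`, whereas in the DataFrame above they are listed with null in both columns.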
