To capture the complex metadata of laboratory measurements while keeping the structure of the reprodICU dataset simple, metadata is encoded in a so-called struct.
Structs can be thought of as dictionaries of key:value pairs, which allows each measurement value and its metadata to be kept within a single column.
Each struct contains the following fields:
- value: the actual value as a floating point number
- system: the source of the specimen the measurement was taken from (e.g. Blood, Serum or Plasma, Urine)
- method: the method that was used for measuring (usually empty, sometimes e.g. Manual count)
- time: the time aspect of the value (e.g. 24 hour urine)
- LOINC: the LOINC concept code referring to the specific lab test

To create a struct, pl.struct() is used.
To unpack a struct into separate columns, df.unnest() / lf.unnest() is used.
A struct can be encoded as a JSON string with .struct.json_encode() (and then decoded with .str.json_decode()).

import polars as pl

# to access e.g. only the numerical values of "Base excess"
# (the result is a LazyFrame; call .collect() to materialize it)
base_excess_values = (
    pl.scan_parquet("timeseries_labs.parquet")
    .select(pl.col("Base excess").struct.field("value"))
)
# to access e.g. only the numerical values of "Sodium [Moles/volume]"
# measured in arterial blood
sodium_arterial_values = (
    pl.scan_parquet("timeseries_labs.parquet")
    .select("Sodium")
    .unnest("Sodium")  # one column per struct field
    .filter(pl.col("system") == "Blood arterial")
    .select("value")  # keep only the numerical values
)