
<aside> πŸ§‘πŸ»β€πŸ’»

reprodICU is designed as an efficient Extract, Transform, Load (ETL) pipeline, with multiple separately defined steps.

</aside>

```mermaid
%%{ init : { "curve" : "step" } }%%

flowchart LR
    subgraph A_extract
      A_eicu[A_extract_eicu.py]
      A_mimic3[A_extract_mimic3.py]
      A_mimic4[A_extract_mimic4.py]
      A_nwicu[A_extract_nwicu.py]
      A_hirid[AX_extract_hirid.py]
      A_sicdb[AX_extract_sicdb.py]
      A_umcdb[AX_extract_umcdb.py]
    end

    subgraph B_process
      B_eicu[B_process_eicu.py]
      B_mimic3[B_process_mimic3.py]
      B_mimic4[B_process_mimic4.py]
      B_nwicu[B_process_nwicu.py]
      B_hirid[BX_process_hirid.py]
      B_sicdb[BX_process_sicdb.py]
      B_umcdb[BX_process_umcdb.py]
    end

    subgraph C_harmonize
      C_diags[C_harmonize_diagnoses.py]
      C_meds[C_harmonize_medications.py]
      C_micro[C_harmonize_microbiology.py]
      C_info[C_harmonize_patient_information.py]
      C_procs[C_harmonize_procedures.py]
      C_time[C_harmonize_timeseries.py]
    end

  eICU-CRD ---> A_eicu
  A_eicu ---> B_eicu
  A_eicu ----- C_meds_empty[ ]:::empty
  A_eicu ----- C_micro_empty[ ]:::empty
  A_eicu ----- C_info_empty[ ]:::empty
  A_eicu ----- C_procs_empty[ ]:::empty
  B_eicu ----- C_diags_empty[ ]:::empty
  B_eicu ----- C_time_empty[ ]:::empty

  MIMIC-III ---> A_mimic3
  A_mimic3 ---> B_mimic3
  A_mimic3 ----- C_diags_empty
  A_mimic3 ----- C_meds_empty
  A_mimic3 ----- C_micro_empty
  A_mimic3 ----- C_info_empty
  A_mimic3 ----- C_procs_empty
  B_mimic3 ----- C_time_empty

  MIMIC-IV ---> A_mimic4
  A_mimic4 ---> B_mimic4
  A_mimic4 ----- C_diags_empty
  A_mimic4 ----- C_meds_empty
  A_mimic4 ----- C_micro_empty
  A_mimic4 ----- C_info_empty
  A_mimic4 ----- C_procs_empty
  B_mimic4 ----- C_time_empty

  NWICU ---> A_nwicu
  A_nwicu ---> B_nwicu
  A_nwicu ----- C_diags_empty
  A_nwicu ----- C_meds_empty
  A_nwicu ----- C_micro_empty
  A_nwicu ----- C_info_empty
  A_nwicu ----- C_procs_empty
  B_nwicu ----- C_time_empty

  HiRID ---> A_hirid
  A_hirid ---> B_hirid
  A_hirid ----- C_diags_empty
  A_hirid ----- C_meds_empty
  A_hirid ----- C_micro_empty
  A_hirid ----- C_info_empty
  A_hirid ----- C_procs_empty
  B_hirid ----- C_time_empty

  SICdb ---> A_sicdb
  A_sicdb ---> B_sicdb
  A_sicdb ----- C_diags_empty
  A_sicdb ----- C_meds_empty
  A_sicdb ----- C_micro_empty
  A_sicdb ----- C_info_empty
  A_sicdb ----- C_procs_empty
  B_sicdb ----- C_time_empty

  UMCdb ---> A_umcdb
  A_umcdb ---> B_umcdb
  A_umcdb ----- C_diags_empty
  A_umcdb ----- C_meds_empty
  A_umcdb ----- C_micro_empty
  A_umcdb ----- C_info_empty
  A_umcdb ----- C_procs_empty
  B_umcdb ----- C_time_empty

  C_meds_empty --- C_meds
  C_micro_empty --- C_micro
  C_info_empty --- C_info
  C_procs_empty --- C_procs
  C_diags_empty --- C_diags
  C_time_empty --- C_time

  C_time ---> V["🫀 timeseries_vitals"]
  C_time ---> R["🫁 timeseries_respiratory"]
  C_time ---> L["🧪 timeseries_labs"]
  C_time ---> I["💧 timeseries_intakeoutput"]
  C_time ---> E["♻️ timeseries_intakeoutput"]

  C_diags ---> diagnoses
  C_meds ---> medications
  C_micro ---> microbiology
  C_info ---> patient_information
  C_procs ---> procedures

  classDef empty width:0px
```
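The flowchart implies an execution order. Every B_process script needs its own source's A_extract output, and each C_harmonize script needs either the A or the B output of every source: the timeseries harmonizer reads all B outputs, the diagnoses harmonizer reads B output for eICU-CRD only, and the remaining harmonizers read A outputs. The sketch below encodes those edges as a dependency graph and derives one valid run order with Python's standard `graphlib`. Only the script names come from the diagram; the helper functions (`extract_script`, `process_script`, `reads_processed`, `build_graph`) are hypothetical and are not part of reprodICU.

```python
from graphlib import TopologicalSorter

# Source keys as they appear in the script names of the flowchart.
SOURCES = ["eicu", "mimic3", "mimic4", "nwicu", "hirid", "sicdb", "umcdb"]
# Sources whose scripts carry the AX_/BX_ prefix in the diagram.
X_SOURCES = {"hirid", "sicdb", "umcdb"}

HARMONIZERS = [
    "C_harmonize_diagnoses.py",
    "C_harmonize_medications.py",
    "C_harmonize_microbiology.py",
    "C_harmonize_patient_information.py",
    "C_harmonize_procedures.py",
    "C_harmonize_timeseries.py",
]


def extract_script(src: str) -> str:
    """Hypothetical helper: name of the extraction script for a source."""
    return f"{'AX' if src in X_SOURCES else 'A'}_extract_{src}.py"


def process_script(src: str) -> str:
    """Hypothetical helper: name of the processing script for a source."""
    return f"{'BX' if src in X_SOURCES else 'B'}_process_{src}.py"


def reads_processed(harmonizer: str, src: str) -> bool:
    """True if, per the diagram, `harmonizer` consumes the B (not A) output of `src`."""
    if harmonizer == "C_harmonize_timeseries.py":
        return True
    return harmonizer == "C_harmonize_diagnoses.py" and src == "eicu"


def build_graph() -> dict[str, set[str]]:
    """Map every script to the set of scripts whose output it needs."""
    graph: dict[str, set[str]] = {}
    for src in SOURCES:
        graph[extract_script(src)] = set()
        graph[process_script(src)] = {extract_script(src)}
    for harm in HARMONIZERS:
        graph[harm] = {
            process_script(src) if reads_processed(harm, src) else extract_script(src)
            for src in SOURCES
        }
    return graph


graph = build_graph()
# static_order() yields each script only after all scripts it depends on.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Any order produced by `static_order()` respects the arrows in the diagram; how reprodICU itself schedules its scripts may differ from this sketch.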

Extract

A_extract

→ https://github.com/CUB-CORR/reprodICU/tree/main/src/reprodICU/helpers/A_extract

Each of the SOURCE DATASETS has its own custom extraction pipeline.
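Because every source dataset ships in its own layout, each one needs a dedicated extractor. One common way to organize per-source extractors is a registry keyed by source name, as sketched below. This is an illustration of the pattern under stated assumptions, not reprodICU's code: `register`, `extract`, `EXTRACTORS`, the extractor functions, and the `{"patients": [...]}` return shape are all hypothetical. The real extraction logic lives in the A_extract scripts.

```python
from typing import Callable

# Hypothetical registry: one extraction function per source dataset.
EXTRACTORS: dict[str, Callable[[str], dict[str, list[dict]]]] = {}


def register(source: str):
    """Decorator that registers a source-specific extractor under `source`."""
    def wrap(func):
        if source in EXTRACTORS:
            raise ValueError(f"extractor for {source!r} already registered")
        EXTRACTORS[source] = func
        return func
    return wrap


@register("eICU-CRD")
def extract_eicu(raw_dir: str) -> dict[str, list[dict]]:
    # Placeholder: a real extractor would read eICU-CRD's files from raw_dir.
    # The "patients" table name is hypothetical.
    return {"patients": []}


@register("MIMIC-IV")
def extract_mimic4(raw_dir: str) -> dict[str, list[dict]]:
    # Placeholder for MIMIC-IV-specific reading logic.
    return {"patients": []}


def extract(source: str, raw_dir: str) -> dict[str, list[dict]]:
    """Dispatch to the extractor registered for `source`."""
    try:
        extractor = EXTRACTORS[source]
    except KeyError:
        raise ValueError(f"no extractor registered for {source!r}") from None
    return extractor(raw_dir)
```

Keeping each source behind its own function means a new dataset can be added without touching the extractors of the others.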

Transform

B_process

→ https://github.com/CUB-CORR/reprodICU/tree/main/src/reprodICU/helpers/B_process

Each of the SOURCE DATASETS has its own custom processing pipeline.

Processing includes multiple separate steps:

C_harmonize

→ https://github.com/CUB-CORR/reprodICU/tree/main/src/reprodICU/helpers/C_harmonize

Each of the final TABLES has its own harmonization pipeline, except for the timeseries tables, which are all harmonized by a single common pipeline (C_harmonize_timeseries.py).
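The relation between final tables and harmonization scripts is therefore one-to-one, except for the timeseries tables, which all share one script. The sketch below records that mapping as drawn in the flowchart (listing each distinct table name once) and provides an inverse lookup from table to script. The names `HARMONIZER_OUTPUTS`, `harmonizer_for`, and `shared_harmonizers` are hypothetical and only illustrate the structure.

```python
# Final tables and the harmonization script that produces them, as drawn in
# the flowchart. The dict and function names are hypothetical.
HARMONIZER_OUTPUTS: dict[str, list[str]] = {
    "C_harmonize_diagnoses.py": ["diagnoses"],
    "C_harmonize_medications.py": ["medications"],
    "C_harmonize_microbiology.py": ["microbiology"],
    "C_harmonize_patient_information.py": ["patient_information"],
    "C_harmonize_procedures.py": ["procedures"],
    # One shared pipeline writes every timeseries table.
    "C_harmonize_timeseries.py": [
        "timeseries_vitals",
        "timeseries_respiratory",
        "timeseries_labs",
        "timeseries_intakeoutput",
    ],
}


def harmonizer_for(table: str) -> str:
    """Return the script responsible for a final table (inverse lookup)."""
    for script, tables in HARMONIZER_OUTPUTS.items():
        if table in tables:
            return script
    raise KeyError(table)


def shared_harmonizers() -> dict[str, list[str]]:
    """Return the scripts that produce more than one final table."""
    return {s: t for s, t in HARMONIZER_OUTPUTS.items() if len(t) > 1}
```

With this layout, `harmonizer_for("timeseries_labs")` and `harmonizer_for("timeseries_vitals")` both resolve to `C_harmonize_timeseries.py`, while every other table resolves to its own script.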