GENERAL
There are two important parts to the workflow:
- Cohort Identification
  - Goal: Define and identify the relevant cohort to analyse.
  - Apply inclusion and exclusion criteria as per the analysis plan.
  - Document the cohort selection process for transparency and reproducibility.
- Analysis
  - Goal: Execute relevant analyses.
  - Implement statistical models, visualizations, and other analytical methods.
  - Include comments and documentation for clarity and reproducibility.
Although it might seem self-explanatory, it helps to think of these two parts separately, because reprodICU is explicitly designed not to restrict users to Python.
The tutorials (and most support) for reprodICU are provided for Python. However, if preferred, one can identify the relevant patients in reprodICU, aggregate the relevant variables, export them to CSV (or another format), and analyse the data in R, Stata, SPSS, or any other software.
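A minimal sketch of that hand-off step, assuming the aggregated cohort is already available as one record per ICU stay. The column names (`global_icu_stay_id`, `exposed`, `age`) and the helper `export_cohort_csv` are hypothetical illustrations, not part of reprodICU's API; the sketch only uses Python's standard `csv` module.

```python
import csv
import io
from collections.abc import Iterable, Mapping
from typing import TextIO


def export_cohort_csv(rows: Iterable[Mapping[str, object]], fieldnames: list[str], handle: TextIO) -> None:
    """Write one CSV row per ICU stay (hypothetical helper, not a reprodICU function)."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical aggregated cohort: one record per Global ICU Stay ID.
cohort = [
    {"global_icu_stay_id": "stay-001", "exposed": 1, "age": 67},
    {"global_icu_stay_id": "stay-002", "exposed": 0, "age": 54},
]

# An in-memory buffer keeps the example self-contained; see the note below for a real file.
buffer = io.StringIO()
export_cohort_csv(cohort, ["global_icu_stay_id", "exposed", "age"], buffer)
print(buffer.getvalue())
```

To write a real file instead, pass `open("cohort.csv", "w", newline="")` as the handle; the `csv` module documentation recommends `newline=""` so that line endings are handled correctly on every platform. The resulting file can then be read with, for example, `read.csv` in R or `import delimited` in Stata.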
TODO: LINK TO JUPYTER NOTEBOOK
TODO: LINK TO MARIMO NOTEBOOK
1. Include / Exclude Patients
- Create boolean masks over the Global ICU Stay IDs in the table to define the set of included patients.
- In practice, boolean masks may be created in a step-down procedure (i.e. evaluating only the patients who passed the previous selection criteria, for computational efficiency); however, the underlying code should not depend on the order of the inclusion/exclusion operations.
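A minimal sketch of both ideas, assuming each ICU stay is a plain record keyed by its Global ICU Stay ID. The `IcuStay` fields and the three criteria are hypothetical examples, not reprodICU concepts, and a real implementation would more likely work on dataframe columns. Each criterion yields a mask over all stays independently, so combining the masks with a logical AND gives the same cohort in any order; the step-down variant evaluates only the remaining stays and records per-step counts that can document the selection process.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IcuStay:
    # Hypothetical per-stay record; field names are illustrative only.
    global_icu_stay_id: str
    age: int
    icu_los_hours: float
    readmission: bool


# A criterion maps one stay to True (keep) or False (drop).
Criterion = Callable[[IcuStay], bool]

# Hypothetical inclusion/exclusion criteria, in the order they are reported.
CRITERIA: dict[str, Criterion] = {
    "adult (age >= 18)": lambda s: s.age >= 18,
    "ICU stay >= 24 h": lambda s: s.icu_los_hours >= 24,
    "no ICU readmission": lambda s: not s.readmission,
}


def build_masks(stays, criteria):
    """Evaluate every criterion on every stay, independently of the others."""
    return {name: {s.global_icu_stay_id: rule(s) for s in stays} for name, rule in criteria.items()}


def included_ids(stays, masks):
    """Combine the masks with a logical AND; the order of the criteria is irrelevant."""
    return {
        s.global_icu_stay_id
        for s in stays
        if all(mask[s.global_icu_stay_id] for mask in masks.values())
    }


def step_down(stays, criteria):
    """Apply criteria sequentially, evaluating only stays that are still included.

    Returns the included IDs and per-step counts for documenting the selection.
    """
    remaining = list(stays)
    log = [("all ICU stays", len(remaining))]
    for name, rule in criteria.items():
        remaining = [s for s in remaining if rule(s)]
        log.append((name, len(remaining)))
    return {s.global_icu_stay_id for s in remaining}, log


stays = [
    IcuStay("stay-001", 67, 48.0, False),
    IcuStay("stay-002", 16, 72.0, False),
    IcuStay("stay-003", 80, 12.0, False),
    IcuStay("stay-004", 45, 30.0, True),
    IcuStay("stay-005", 59, 96.0, False),
]

cohort, log = step_down(stays, CRITERIA)
for step, n in log:
    print(f"{step}: {n}")
print(sorted(cohort))  # ['stay-001', 'stay-005']
```

The final cohort is the same whichever function is used and whichever order the criteria are applied in; only the intermediate counts in the step-down log depend on the order, which is why the reporting order should be fixed in the analysis plan.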
2. Determine Exposure
- Define one or more concepts for the study's exposure. The relevant code should be written so that the exposure can be precomputed on the complete dataset.
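A minimal sketch of what "precomputable on the complete dataset" can mean in practice: the exposure is a function evaluated once for every ICU stay and stored as a lookup keyed by Global ICU Stay ID, so an individual study only selects the rows of its cohort. The exposure definition (any vasopressor within the first 24 hours of the stay), the event layout, and the function name `early_vasopressor_exposure` are hypothetical, chosen only to illustrate the pattern.

```python
# Hypothetical long-format events: (global_icu_stay_id, hours since ICU admission, item)
EVENTS = [
    ("stay-001", 2.5, "norepinephrine"),
    ("stay-001", 30.0, "vasopressin"),
    ("stay-002", 40.0, "norepinephrine"),
    ("stay-003", 1.0, "propofol"),
]

# Hypothetical item list for the exposure concept.
VASOPRESSORS = {"norepinephrine", "epinephrine", "vasopressin", "dopamine", "phenylephrine"}


def early_vasopressor_exposure(events, all_stay_ids, window_hours=24.0):
    """Return {stay_id: bool} for *every* stay in the dataset, not just one study's cohort."""
    exposure = dict.fromkeys(all_stay_ids, False)
    for stay_id, hours, item in events:
        # Only count events for known stays that fall inside the time window.
        if stay_id in exposure and item in VASOPRESSORS and 0 <= hours < window_hours:
            exposure[stay_id] = True
    return exposure


# Computed once for the complete dataset...
ALL_STAYS = ["stay-001", "stay-002", "stay-003", "stay-004"]
precomputed = early_vasopressor_exposure(EVENTS, ALL_STAYS)

# ...then each study simply looks up its own cohort.
cohort_ids = {"stay-001", "stay-002"}
cohort_exposure = {i: precomputed[i] for i in sorted(cohort_ids)}
print(cohort_exposure)  # {'stay-001': True, 'stay-002': False}
```

Because the function never sees the study's inclusion criteria, the same precomputed result can be shared across studies, and stays without any events are explicitly marked as unexposed rather than silently missing.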
3. Determine Covariates
- Define new concepts, or use established ones, for the study's relevant covariates.
- Common covariates such as the Elixhauser Comorbidity Index are (or should be) made available as precalculated dataframes covering the complete dataset.
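A minimal sketch of reusing such a precalculated table, assuming it is stored with one row per Global ICU Stay ID. The table contents and the column name `elixhauser_score` are hypothetical placeholders (no comorbidity score is computed here), and `attach_covariates` is an illustrative helper, not a reprodICU function. The study only performs a left join onto its cohort, keeping missing values explicit instead of dropping stays.

```python
# Hypothetical precalculated covariate table for the complete dataset,
# keyed by Global ICU Stay ID (values are placeholders, not real scores).
PRECALCULATED_COVARIATES = {
    "stay-001": {"elixhauser_score": 4, "age": 67},
    "stay-002": {"elixhauser_score": 0, "age": 16},
    "stay-005": {"elixhauser_score": 7, "age": 59},
}


def attach_covariates(cohort_ids, covariate_table, columns):
    """Left join: every cohort stay keeps a row; missing covariates become None."""
    rows = []
    for stay_id in sorted(cohort_ids):
        source = covariate_table.get(stay_id, {})
        rows.append({"global_icu_stay_id": stay_id, **{c: source.get(c) for c in columns}})
    return rows


rows = attach_covariates({"stay-001", "stay-005", "stay-009"}, PRECALCULATED_COVARIATES, ["elixhauser_score"])
for row in rows:
    print(row)
```

Keeping `None` for `stay-009` makes missing covariate data visible, so the handling of missingness can be decided (and documented) in the analysis rather than happening implicitly during the join.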
4. Aggregate the Data
- Include / exclude patients based on step 1
- Calculate the exposure from step 2
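A minimal sketch of this aggregation, assuming the inclusion set from step 1, the precomputed exposure from step 2 and (as an additional assumption) the covariates from step 3 are available as plain mappings keyed by Global ICU Stay ID. All names, including `aggregate`, are hypothetical. The cohort filter is applied first, then exposure and covariates are looked up, giving one analysis-ready row per included stay.

```python
from collections.abc import Mapping


def aggregate(
    included_ids: set[str],
    exposure: Mapping[str, bool],
    covariates: Mapping[str, Mapping[str, object]],
    covariate_columns: list[str],
) -> list[dict[str, object]]:
    """Build one analysis row per included ICU stay (hypothetical helper)."""
    rows = []
    for stay_id in sorted(included_ids):  # step 1: only included stays
        # Step 2: the exposure is precomputed for the complete dataset, so every
        # included stay should be present; a KeyError here flags a gap.
        row: dict[str, object] = {"global_icu_stay_id": stay_id, "exposed": exposure[stay_id]}
        cov = covariates.get(stay_id, {})
        row.update({c: cov.get(c) for c in covariate_columns})  # covariates, None if missing
        rows.append(row)
    return rows


included = {"stay-005", "stay-001"}
exposure = {"stay-001": True, "stay-002": False, "stay-005": False}
covariates = {"stay-001": {"elixhauser_score": 4}, "stay-005": {"elixhauser_score": 7}}

analysis_rows = aggregate(included, exposure, covariates, ["elixhauser_score"])
for row in analysis_rows:
    print(row)
```

Treating a missing exposure as an error (rather than defaulting to unexposed) is a deliberate choice in this sketch: if the exposure really was precomputed on the complete dataset, a missing key points to a pipeline problem rather than to an unexposed patient.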