A multi-page Streamlit dashboard that loads SDTM in several formats, derives an analysis population, and offers domain inspection, per-subject profiles, CSR-quality safety and efficacy outputs, and a NetworkX knowledge graph across domains.

What it adds

Pinnacle 21 reports issues; it does not let you explore the study. This dashboard turns SDTM into an interactive review surface with per-subject profiles, CSR-quality outputs, and cross-domain queries.

How it works

Inputs
  • SDTM XPT / SAS7BDAT / CSV
  • study_config.yaml
Process
  1. Multi-format loader
  2. Population derivation (ADSL, flags)
  3. Analysis: safety / labs / VS / efficacy
  4. NetworkX knowledge graph
Outputs
  • Per-subject profiles
  • CSR-quality TFLs
  • Excel / CSV / RTF export

Typical layout

SDTM Explorer
Sidebar
  • Load data
  • Population
  • Theme toggle
Pages
  • Overview
  • Subject
  • AE
  • Labs
  • Efficacy
Content
  • Domain overview + conformance
  • 5-panel patient profile
  • KM / waterfall / forest plots

By the numbers

Streamlit pages: 11
Input formats: 3
Analysis areas: 4
Version: v5

Data flow

Reviewing a study's SDTM data means jumping between domains, subjects, and standard outputs. Doing that across spreadsheets and ad-hoc programs is slow and gives no single place to explore the whole study.

Input: SDTM datasets (XPT / SAS7BDAT / CSV) + study_config.yaml
        |
        v
  Loader (data/loader.py)            robust multi-format SDTM loader
        |
        v
  Derivations (derivations/populations.py)
        |                            ADSL, SAFFL/ITTFL, CHG/PCHG, study day
        v
  Analysis layer
        |   safety.py     AE n/N, TEAE, CTCAE norm, shift tables
        |   descriptive.py  continuous/categorical stats, KM, subgroup
        |   plots.py      Plotly visuals (theme-aware)
        |   graph/builder.py  NetworkX inter-domain graph
        v
  11 Streamlit pages  -->  overview, subject, TFLs, AE, LB, VS, efficacy,
                           patient profile, graph-RAG, export/audit
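
The derivation step in the flow above can be illustrated with a minimal sketch of study day and CHG/PCHG. It is not the project's derivations/populations.py: the function names are hypothetical, and the column names (USUBJID, LBTESTCD, LBBLFL, LBSTRESN) are assumed from standard SDTM naming. It also assumes complete dates and at most one baseline record per subject and test.

```python
import pandas as pd


def study_day(date: pd.Series, ref_start: pd.Series) -> pd.Series:
    """Study day by the SDTM convention: there is no day 0.

    Dates on or after the reference start get (difference + 1);
    dates before it keep the plain (negative) difference.
    Assumes complete dates that pandas can parse.
    """
    delta = (pd.to_datetime(date) - pd.to_datetime(ref_start)).dt.days
    # Keep negative offsets as they are; shift zero and positive offsets by one.
    return delta.where(delta < 0, delta + 1)


def add_change_from_baseline(lb: pd.DataFrame) -> pd.DataFrame:
    """Add BASE, CHG and PCHG columns to a lab-like frame.

    Assumes one record flagged LBBLFL == "Y" per subject and test
    (hypothetical simplification; real data may need tie-breaking).
    """
    out = lb.copy()
    base = (
        out[out["LBBLFL"] == "Y"]
        .set_index(["USUBJID", "LBTESTCD"])["LBSTRESN"]
        .rename("BASE")
    )
    out = out.join(base, on=["USUBJID", "LBTESTCD"])
    out["CHG"] = out["LBSTRESN"] - out["BASE"]
    # A zero baseline is masked to NaN so PCHG is missing instead of infinite.
    out["PCHG"] = 100 * out["CHG"] / out["BASE"].where(out["BASE"] != 0)
    return out
```

Deriving these once and passing the resulting frame to every page is what keeps the analysis population consistent across outputs.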

Engineering trade-offs

Population derivation centralised in one module
ADSL, flags, CHG/PCHG and study day are derived once so every page shares the same analysis population.
NetworkX knowledge graph across domains
Links records across domains for a subject so cross-domain questions are a graph query, not a manual join.
Theme-aware Plotly with a dark/light toggle
The same plotting layer serves screen review and exportable outputs without duplicate code.
Defensive loader with a demo-data fallback
Handles XPT, SAS7BDAT and CSV and degrades gracefully, so a reviewer can explore even with partial data.
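
As a rough illustration of the defensive loader idea, the sketch below dispatches on file extension and falls back to demo data when a file is missing, unsupported, or unreadable. The names load_domain and _demo_dm are hypothetical, and the sketch uses pandas.read_sas for XPT and SAS7BDAT to stay self-contained; the project lists pyreadstat as a dependency, so its actual loader may well differ.

```python
from pathlib import Path

import pandas as pd


def _demo_dm() -> pd.DataFrame:
    """Tiny stand-in DM dataset (hypothetical demo data)."""
    return pd.DataFrame(
        {
            "STUDYID": ["DEMO", "DEMO"],
            "USUBJID": ["DEMO-001", "DEMO-002"],
            "ARM": ["Placebo", "Active"],
        }
    )


def load_domain(path, demo_factory=_demo_dm):
    """Return (frame, source), where source is 'xpt', 'sas7bdat', 'csv' or 'demo'."""
    p = Path(path)
    readers = {
        ".xpt": lambda f: pd.read_sas(f, format="xport", encoding="utf-8"),
        ".sas7bdat": lambda f: pd.read_sas(f, format="sas7bdat", encoding="utf-8"),
        ".csv": pd.read_csv,
    }
    reader = readers.get(p.suffix.lower())
    if reader is None or not p.exists():
        # Unsupported extension or missing file: degrade to demo data.
        return demo_factory(), "demo"
    try:
        df = reader(p)
    except Exception:
        # Broad catch is deliberate in this sketch: any unreadable file
        # falls back to demo data instead of stopping the app.
        return demo_factory(), "demo"
    # Normalise column names so downstream code can rely on upper case.
    df.columns = [str(c).upper() for c in df.columns]
    return df, p.suffix.lower().lstrip(".")
```

Returning the source alongside the frame lets the UI tell the reviewer when demo data is being shown in place of the study data.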

At a glance

A quick visual read of the countable facts; full detail in the table.

Streamlit pages: 11
Analysis areas: 4
Input formats: 3

(Bar chart, relative scale, values labelled; unit: count)

Processing characteristics

Metric            Value                     Notes
Streamlit pages   11                        Overview through export/audit
Input formats     XPT, SAS7BDAT, CSV        Robust multi-format loader
Analysis areas    Safety, labs, VS, efficacy   n/N, TEAE, shift tables, KM, waterfall, forest
Graph             NetworkX                  Inter-domain knowledge graph + queries
ML helpers        Outlier / cluster / risk  src/ml outlier, clusterer, risk scorer
Export            Excel / CSV / RTF         With an audit trail page
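
To make the n/N and TEAE entries concrete, here is a minimal sketch of an AE incidence summary. It is an illustration under stated assumptions, not the project's safety.py: the function name ae_incidence is hypothetical, the columns (SAFFL, ARM, TRTSDT, AEDECOD, AESTDTC) are assumed from common CDISC naming, and treatment emergence is simplified to "AE start on or after first dose" with complete dates.

```python
import pandas as pd


def ae_incidence(adsl: pd.DataFrame, ae: pd.DataFrame) -> pd.DataFrame:
    """Subjects with at least one TEAE per preferred term and arm (n), over N."""
    saf = adsl[adsl["SAFFL"] == "Y"]
    # N: safety-population subjects per arm.
    big_n = saf.groupby("ARM")["USUBJID"].nunique().rename("N")

    merged = ae.merge(saf[["USUBJID", "ARM", "TRTSDT"]], on="USUBJID", how="inner")
    # Simplified TEAE rule (assumption): onset on or after treatment start.
    is_teae = pd.to_datetime(merged["AESTDTC"]) >= pd.to_datetime(merged["TRTSDT"])
    teae = merged[is_teae]

    # n: distinct subjects, so repeated events in one subject count once.
    n = (
        teae.groupby(["AEDECOD", "ARM"])["USUBJID"]
        .nunique()
        .rename("n")
        .reset_index()
    )
    out = n.merge(big_n.reset_index(), on="ARM")
    out["pct"] = (100 * out["n"] / out["N"]).round(1)
    return out
```

Counting distinct subjects instead of events is the key design point: a subject with three headaches contributes one to n.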

Functional wins

01 Loads SDTM in XPT, SAS7BDAT or CSV and derives a consistent analysis population shared across every page.
02 Builds a 5-panel patient profile timeline (phases, EX, AE, LB, TR) for per-subject review.
03 Produces CSR-quality safety and efficacy outputs: AE n/N and TEAE, lab shift tables, KM curves, waterfall and forest plots.
04 Links records across domains in a NetworkX knowledge graph so cross-domain questions become graph queries.
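
The cross-domain graph idea can be sketched in a few lines of NetworkX. This is one possible shape, assuming a simple subject-to-record layout; the node naming scheme and the functions build_subject_graph and records_for are hypothetical and not taken from the project's graph/builder.py.

```python
import networkx as nx
import pandas as pd


def build_subject_graph(domains: dict) -> nx.Graph:
    """Link every record in every domain frame to its subject node.

    `domains` maps a domain name (e.g. "AE") to a DataFrame with a USUBJID column.
    """
    g = nx.Graph()
    for name, df in domains.items():
        for idx, row in df.iterrows():
            subj = ("SUBJECT", row["USUBJID"])
            rec = (name, row["USUBJID"], idx)
            g.add_node(subj, kind="subject")
            # Record values go under one 'data' key to avoid attribute clashes.
            g.add_node(rec, kind="record", domain=name, data=row.to_dict())
            g.add_edge(subj, rec)
    return g


def records_for(g: nx.Graph, usubjid: str, domain: str) -> list:
    """Return the records of one domain attached to a subject."""
    subj = ("SUBJECT", usubjid)
    if subj not in g:
        return []
    return [
        g.nodes[n]["data"]
        for n in g.neighbors(subj)
        if g.nodes[n]["domain"] == domain
    ]
```

With this layout, a question such as "which lab records belong to the subject who had this AE" is a neighbour lookup on the subject node, with no explicit join between the AE and LB frames.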

Module dependencies

core
  • Python
  • pyyaml
ui
  • streamlit
  • plotly
  • kaleido
data
  • pandas
  • numpy
  • pyreadstat
  • openpyxl
ml
  • scipy
  • statsmodels
  • lifelines
  • networkx