Clinical Data Engineering · GxP · CDISC SDTM/ADaM
Automation for regulated clinical data
Senior Statistical Programmer building tested, auditable Python automation across the clinical data flow — from collected trial data to a regulatory submission.
Everything here is designed and built by me and is in active development. None of these tools is deployed in a production or client environment; any demo runs on synthetic data only.
// the work
A CDISC automation suite, mapped to the pipeline it covers
Each tool automates one step from collected trial data to submission.
// the regulatory pipeline · RAW/EDC → SDTM → ADaM → TFL → DEFINE
// how I solve problems
Case study — SDTM Completeness
The arc I run on every tool: spot the manual pain, design the architecture, automate it, and measure the result.
Before → After
| Area | Before | After |
|---|---|---|
| Completeness review | Manual eyeballing across domains in Excel | One automated check run with a results table |
| Large-study runtime | Stalled on an O(n²) routine | Single linear pass (timing to be confirmed) |
| Logging on Citrix | Network I/O stall writing to a UNC path | Local buffered logging, no stall |
| Coverage checks | None | L5_COV_007 site-change · L5_COV_008 outliers |
| Progress visibility | Nothing until the run finished | Live dashboard with per-domain progress |
The arc
Manual process
Reviewers checked completeness and coverage by hand across every SDTM domain in spreadsheets.
Pain point
Slow and inconsistent. On large studies an O(n²) routine stalled, and logging to a Citrix UNC path hung on network I/O.
Tool architecture
A modular check framework: decorator-registered checks → domain registry → execution engine → reporting layer.
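A minimal sketch of that shape, not the suite's actual code. Every name in it (`Finding`, `register_check`, `run_checks`, `report`, the check ID `DEMO_REQ_001`) is hypothetical, and the records are synthetic; only the `AE` domain and its `AETERM` variable come from SDTM itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; this only illustrates the layering.

@dataclass
class Finding:
    check_id: str
    domain: str
    message: str

Check = Callable[[List[dict]], List[str]]

# Domain registry: SDTM domain code -> [(check_id, check function), ...]
_REGISTRY: Dict[str, List[Tuple[str, Check]]] = defaultdict(list)

def register_check(check_id: str, domain: str):
    """Decorator that files a check under its domain; the engine is untouched."""
    def decorator(func: Check) -> Check:
        _REGISTRY[domain].append((check_id, func))
        return func
    return decorator

@register_check("DEMO_REQ_001", domain="AE")
def ae_term_present(records: List[dict]) -> List[str]:
    """Flag AE records whose AETERM is missing or blank."""
    return [
        f"record {i}: AETERM is missing"
        for i, rec in enumerate(records)
        if not str(rec.get("AETERM") or "").strip()
    ]

def run_checks(datasets: Dict[str, List[dict]]) -> List[Finding]:
    """Execution engine: run every check registered for each supplied domain."""
    findings: List[Finding] = []
    for domain, records in datasets.items():
        for check_id, func in _REGISTRY.get(domain, []):
            for message in func(records):
                findings.append(Finding(check_id, domain, message))
    return findings

def report(findings: List[Finding]) -> str:
    """Reporting layer: a plain-text results table."""
    lines = [f"{'check':<14}{'domain':<8}message"]
    lines += [f"{f.check_id:<14}{f.domain:<8}{f.message}" for f in findings]
    return "\n".join(lines)

# Synthetic records only.
demo = {"AE": [{"AETERM": "Headache"}, {"AETERM": "  "}, {}]}
print(report(run_checks(demo)))
```

Adding a check under this layout means writing one decorated function; `run_checks` and `report` stay as they are.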
Automation
Every completeness and coverage check runs in one pass with a live progress dashboard; new checks plug in without touching the engine.
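As an illustration of the single-pass idea, here is a sketch under a stated assumption: that one completeness question is "which DM subjects have no records in a given domain". The source does not say what the original quadratic routine did, so the function name, the `on_progress` callback and its signature are all hypothetical. A nested scan (for each subject, search every record) would be quadratic; building a set of `USUBJID` values per domain reads each record once.

```python
from typing import Callable, Dict, Iterable, List

def subjects_missing_by_domain(
    dm_subjects: Iterable[str],
    datasets: Dict[str, List[dict]],
    on_progress: Callable[[str, int, int], None] = lambda domain, done, total: None,
) -> Dict[str, List[str]]:
    """For each domain, list the DM subjects that have no record in it.

    Each domain's records are read exactly once to build a set, so the work
    grows linearly with the number of records plus the number of subjects.
    """
    subjects = list(dm_subjects)  # assumes one entry per subject, as in DM
    total = len(datasets)
    missing: Dict[str, List[str]] = {}
    for done, (domain, records) in enumerate(datasets.items(), start=1):
        seen = {rec["USUBJID"] for rec in records}  # one pass over the domain
        missing[domain] = [s for s in subjects if s not in seen]
        on_progress(domain, done, total)  # hook for a per-domain progress display
    return missing

# Synthetic data only.
demo_subjects = ["S-001", "S-002", "S-003"]
demo_datasets = {
    "AE": [{"USUBJID": "S-001"}, {"USUBJID": "S-001"}],
    "VS": [{"USUBJID": "S-001"}, {"USUBJID": "S-002"}, {"USUBJID": "S-003"}],
}
result = subjects_missing_by_domain(
    demo_subjects,
    demo_datasets,
    on_progress=lambda domain, done, total: print(f"[{done}/{total}] {domain} done"),
)
print(result)
```

The callback keeps the check itself free of display code, so a dashboard could subscribe to the same hook.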
Measured benefit
Removed the quadratic bottleneck and the I/O stall; added site-change and outlier detection that did not exist before.
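One way to get local buffered logging with the standard library is sketched below. It assumes the log can be written to a local path rather than the network share; `make_buffered_logger`, the logger name and the capacity of 500 records are hypothetical choices, not the tool's real configuration. `logging.handlers.MemoryHandler` holds records in memory and forwards them to the file handler only when the buffer fills, when a record at `ERROR` or above arrives, or on an explicit flush or close.

```python
import logging
import logging.handlers
import os
import tempfile

def make_buffered_logger(name: str, log_path: str, capacity: int = 500):
    """Return (logger, buffer_handler, file_handler) writing to a local file in batches."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers

    # delay=True: the file is not opened until the first record is written.
    file_handler = logging.FileHandler(log_path, encoding="utf-8", delay=True)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )

    # Records accumulate in memory; disk I/O happens per batch, not per record.
    buffer_handler = logging.handlers.MemoryHandler(
        capacity=capacity, flushLevel=logging.ERROR, target=file_handler
    )
    logger.addHandler(buffer_handler)
    return logger, buffer_handler, file_handler

# Usage with a local temporary directory (synthetic messages only).
log_dir = tempfile.mkdtemp()
log_file = os.path.join(log_dir, "run.log")
log, buffer_handler, file_handler = make_buffered_logger("demo.completeness", log_file)
log.info("AE: check started")
log.info("AE: check finished")
buffer_handler.flush()   # write the batch to the local file
buffer_handler.close()
file_handler.close()
```

If the log must end up on the share for audit purposes, copying the finished file there once at the end of the run is a possible follow-up step; that is a design suggestion, not something the case study states.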
The result
Linear runtime, no I/O stalls, and two new coverage checks. Measured before/after timings are still to be added; no estimate is quoted in their place.
About
Study Lead and Senior Statistical Programmer building automation that makes locked-down GxP environments faster without breaking traceability. Deep CDISC work in SAS and Python across oncology, virology, cardiovascular and early-phase trials.