Clinical Data Engineering · GxP · CDISC SDTM/ADaM

Automation for regulated clinical data

Senior Statistical Programmer building tested, auditable Python automation across the clinical data flow — from collected trial data to a regulatory submission.

Personal projects

Everything here is designed and built by me and is still in active development. None of these tools is deployed in a production or client environment; any demo runs on synthetic data only.

// the work

A CDISC automation suite, mapped to the pipeline it covers

Each tool automates one step from collected trial data to submission.

// the regulatory pipeline · RAW/EDC → SDTM → ADaM → TFL → DEFINE


// how I solve problems

Case study — SDTM Completeness

The arc I run on every tool: spot the manual pain, design the architecture, automate it, and measure the result.

Before → After

Area | Before | After
Completeness review | Manual eyeballing across domains in Excel | One automated check run with a results table
Large-study runtime | Stalled: an O(n²) routine | Single linear pass (timing: confirm)
Logging on Citrix | Network I/O stall writing to a UNC path | Local buffered logging, no stall
Coverage checks | None | L5_COV_007 site-change · L5_COV_008 outliers
Progress visibility | Nothing until the run finished | Live dashboard with per-domain progress

The arc

1. Manual process

Reviewers checked completeness and coverage by hand across every SDTM domain in spreadsheets.

2. Pain point

Slow and inconsistent. On large studies an O(n²) routine stalled, and logging to a Citrix UNC path hung on network I/O.

3. Tool architecture

A modular check framework: decorator-registered checks → domain registry → execution engine → reporting layer.
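The chain above (decorator-registered checks, domain registry, execution engine, reporting layer) could be wired together roughly as follows. This is a minimal sketch on synthetic data, not the suite's actual code: `register_check`, `REGISTRY`, `CheckResult`, `run_checks`, `render_report`, and the check ID `EX_DM_001` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class CheckResult:
    check_id: str
    domain: str
    passed: bool
    message: str


CheckFn = Callable[[List[Record]], CheckResult]

# Domain registry (hypothetical): SDTM domain code -> checks registered for it.
REGISTRY: Dict[str, List[CheckFn]] = {}


def register_check(domain: str) -> Callable[[CheckFn], CheckFn]:
    """Decorator that files a check function under its domain."""
    def decorator(fn: CheckFn) -> CheckFn:
        REGISTRY.setdefault(domain, []).append(fn)
        return fn
    return decorator


@register_check("DM")
def dm_usubjid_present(records: List[Record]) -> CheckResult:
    """Illustrative check: every DM record carries a non-empty USUBJID."""
    missing = sum(1 for r in records if not r.get("USUBJID"))
    return CheckResult("EX_DM_001", "DM", missing == 0,
                       f"{missing} record(s) missing USUBJID")


def run_checks(data: Dict[str, List[Record]]) -> List[CheckResult]:
    """Execution engine: runs every registered check for each domain supplied."""
    results: List[CheckResult] = []
    for domain, records in data.items():
        for check in REGISTRY.get(domain, []):
            results.append(check(records))
    return results


def render_report(results: List[CheckResult]) -> str:
    """Reporting layer: a plain-text results table."""
    lines = ["CHECK_ID | DOMAIN | STATUS | MESSAGE"]
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        lines.append(f"{r.check_id} | {r.domain} | {status} | {r.message}")
    return "\n".join(lines)
```

Calling `render_report(run_checks({"DM": [...]}))` on a synthetic domain produces the results table. A new check is added by writing one decorated function; `run_checks` itself does not change.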

4. Automation

Every completeness and coverage check runs in one pass with a live progress dashboard; new checks plug in without touching the engine.

5. Measured benefit

Removed the quadratic bottleneck and the I/O stall; added site-change and outlier detection that did not exist before.
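The original quadratic routine is not described here, so the following is only an assumed example of how such a bottleneck typically arises and how a linear pass removes it: rescanning every record for each expected subject, versus building a set once and looking up against it. The function names and the choice of `USUBJID` as the lookup key are illustrative assumptions.

```python
from typing import Dict, List, Set

Record = Dict[str, str]


def missing_subjects_quadratic(expected: List[str], records: List[Record]) -> List[str]:
    """O(n * m): rescans the full record list for every expected subject."""
    missing = []
    for subject in expected:
        found = False
        for rec in records:
            if rec.get("USUBJID") == subject:
                found = True
                break
        if not found:
            missing.append(subject)
    return missing


def missing_subjects_linear(expected: List[str], records: List[Record]) -> List[str]:
    """O(n + m): one pass to collect seen IDs, one pass over expected IDs."""
    seen: Set[str] = {rec["USUBJID"] for rec in records if rec.get("USUBJID")}
    return [subject for subject in expected if subject not in seen]
```

Both functions return the same list of missing subjects in the same order. Only the cost differs, which matters once a study has many subjects and many records per domain.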

The result

Linear runtime, no I/O stalls, and two new coverage checks. Before/after timings are still to be measured and are deliberately not estimated here.
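One way to get local buffered logging with no network stall is to hold log records in memory, flush them to a file on local disk, and copy that file to the network share once at the end of the run. The sketch below uses the standard library's `logging.handlers.MemoryHandler`. The helper names, the buffer capacity, and the publish-at-end step are assumptions for illustration, not a description of the actual tool.

```python
import logging
import logging.handlers
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def make_buffered_logger(
    name: str, local_dir: Path, capacity: int = 500
) -> Tuple[logging.Logger, logging.handlers.MemoryHandler, Path]:
    """Return a logger that buffers records in memory and writes to local disk."""
    local_path = local_dir / f"{name}.log"
    file_handler = logging.FileHandler(local_path, encoding="utf-8")
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    # Records are held in memory until the buffer reaches `capacity`
    # or a record at ERROR level or above arrives.
    memory_handler = logging.handlers.MemoryHandler(
        capacity=capacity, flushLevel=logging.ERROR, target=file_handler
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(memory_handler)
    return logger, memory_handler, local_path


def close_and_publish(
    logger: logging.Logger,
    memory_handler: logging.handlers.MemoryHandler,
    local_path: Path,
    share_dir: Path,
) -> Path:
    """Flush the buffer, close the local file, then copy it to the share once."""
    target = memory_handler.target
    memory_handler.flush()
    memory_handler.close()
    logger.removeHandler(memory_handler)
    if target is not None:
        target.close()
    # share_dir stands in for the network (UNC) location; it must already exist.
    return Path(shutil.copy2(local_path, share_dir))
```

During the run nothing touches the network path; the single copy at the end is the only network write. The trade-off is that the shared log is not visible until the run completes, so a live view has to come from somewhere else, such as a progress dashboard.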

About

Study Lead and Senior Statistical Programmer building automation that makes locked-down GxP environments faster without breaking traceability. Deep CDISC work in SAS and Python across oncology, virology, cardiovascular and early-phase trials.

10+ years GxP
8 tools built
5 therapy areas
CDISC: SDTM · ADaM · TFL