
A Streamlit app over a layered check engine that validates traceability, domain rules, and value fidelity, runs checks with a per-check time budget, and shows a live progress dashboard plus Excel and HTML reports.

What it adds

A conformance checker validates the SDTM you have; it does not confirm that the SDTM is complete and traceable back to raw. This tool validates that linkage (record and subject parity, SUPP completeness, value fidelity), which conformance checking does not cover.
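To make "record and subject parity" concrete, here is a minimal sketch of such a comparison. It is not the tool's implementation: the function name `parity_report` is hypothetical, rows are plain dicts rather than data frames, and it assumes raw and SDTM rows already share a `USUBJID` key and map one-to-one, which a real mapping spec may not guarantee.

```python
from collections import Counter


def parity_report(raw_rows, sdtm_rows, key="USUBJID"):
    """Compare subjects and per-subject record counts between raw and SDTM.

    Assumes both sides carry the same subject key and a one-to-one
    raw-to-SDTM record mapping (an assumption made for this sketch).
    """
    raw_counts = Counter(row[key] for row in raw_rows)
    sdtm_counts = Counter(row[key] for row in sdtm_rows)
    shared = raw_counts.keys() & sdtm_counts.keys()
    return {
        # Subject parity: who is present on one side only.
        "subjects_missing_in_sdtm": sorted(raw_counts.keys() - sdtm_counts.keys()),
        "subjects_only_in_sdtm": sorted(sdtm_counts.keys() - raw_counts.keys()),
        # Record parity: shared subjects whose row counts differ (raw, sdtm).
        "record_count_mismatch": {
            subj: (raw_counts[subj], sdtm_counts[subj])
            for subj in sorted(shared)
            if raw_counts[subj] != sdtm_counts[subj]
        },
    }


# Made-up illustrative data, not taken from any study.
raw = [{"USUBJID": "S-001"}, {"USUBJID": "S-001"},
       {"USUBJID": "S-002"}, {"USUBJID": "S-003"}]
sdtm = [{"USUBJID": "S-001"}, {"USUBJID": "S-002"}, {"USUBJID": "S-004"}]

report = parity_report(raw, sdtm)
print(report)
```

With this sample data the report flags S-003 as missing from SDTM, S-004 as present only in SDTM, and S-001 as having two raw records but one SDTM record.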

How it works

Inputs: Raw datasets, SDTM datasets, Master mapping spec
Process: 1. Loaders + join engine; 2. Layered catalog (6 layers); 3. Orchestrator (per-check budget); 4. Reporting
Outputs: Live dashboard, Excel report, HTML report

Typical layout

App title: SDTM Completeness
Sidebar: Run checks, Results, Reports, Catalog
Tabs: Run, Results, Catalog
Main panel: KPI cards, Live per-domain progress, Issues grouped by layer

By the numbers

Checks in catalog: 87
Test functions: 47
Check layers: 6
Tool version: 0.4.0

Data flow

Confirming that SDTM is complete and traceable back to raw is slow manual work across many domains, and a single badly written check could run for hours on a real study and be abandoned before producing anything.

Input: raw datasets + SDTM datasets + master mapping spec
        |
        v
  Loaders + Join Engine (core/)      align raw to SDTM by mapping spec
        |
        v
  Check Catalog (config/check_catalog.py)
        |   Layer 1  Traceability    trc_001..008 (record/subject parity, SUPP, coverage)
        |   Layer 2  Domain rules    ae/lb/dm/ds/sv/vs/cm/ex/ie/mh + basic_*
        |   Layer 6  Value fidelity  vfd_001..009 (assign passthrough, date xform, parity)
        v
  Orchestrator (per-check time budget) --> live progress to dashboard
        |
        v
  Reporting (reporting/)  -->  Excel reporter + HTML dashboard
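The orchestrator step in the flow above can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the `Check` dataclass, `run_catalog`, the `on_progress` callback, and the check ID `ae_demo` are hypothetical. It uses a standard-library thread pool, so a timed-out check is reported and the run moves on, but the worker thread itself is not killed; a production engine might use processes or cooperative deadlines instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    check_id: str            # illustrative ID, styled after trc_001 / vfd_001
    layer: int               # catalog layer the check belongs to
    run: Callable[[], int]   # returns the number of issues found


def run_catalog(checks: List[Check], budget_seconds: float,
                on_progress: Callable[[Dict], None]) -> List[Dict]:
    """Run checks layer by layer, bounding each one by a time budget."""
    results = []
    for check in sorted(checks, key=lambda c: c.layer):
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(check.run)
        try:
            issues = future.result(timeout=budget_seconds)
            status = "ok" if issues == 0 else "issues"
        except FutureTimeout:
            # Budget exceeded: record it instead of waiting for the check.
            issues, status = None, "timeout"
        except Exception as exc:
            issues, status = None, f"error: {exc}"
        finally:
            # Do not block on a still-running worker thread.
            pool.shutdown(wait=False)
        result = {"check_id": check.check_id, "layer": check.layer,
                  "status": status, "issues": issues}
        results.append(result)
        on_progress(result)  # e.g. feed a live dashboard
    return results


def slow_check() -> int:
    time.sleep(1.0)  # stands in for a pathological check
    return 0


catalog = [
    Check("trc_001", 1, lambda: 0),
    Check("ae_demo", 2, lambda: 3),
    Check("vfd_001", 6, slow_check),
]
progress_log: List[Dict] = []
results = run_catalog(catalog, budget_seconds=0.2, on_progress=progress_log.append)
statuses = [r["status"] for r in results]
print(statuses)
```

The point of the design is that every check ends in an explicit status (`ok`, `issues`, `timeout`, or `error`), so a slow check shows up in the report rather than silently stalling the run.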

Engineering trade-offs

Layered check catalog (traceability / domain / value fidelity)
Groups checks by what they prove about the data, so a reviewer reads results as a completeness story rather than a flat list.
Set-membership rewrite of the SUPP completeness check
The original orphan-SUPP resolution was O(n_supp x n_parent) and ran for hours; pre-computing parent key sets makes it a fast membership test.
Per-check time budget with status reporting
One pathological check can no longer stall the whole run; it is bounded and reported instead of being silently abandoned.
Engine bundled inside the app, overridable by env var
Works out of the box for a reviewer, but a developer can point SDTM_COMPLETENESS_ENGINE at a different engine build.
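The engine override in the last trade-off could be resolved along these lines. Only the variable name `SDTM_COMPLETENESS_ENGINE` comes from the text; the bundled path `app/engine` and the function `resolve_engine_dir` are assumptions made for this sketch, and the app's real lookup may differ.

```python
import os
from pathlib import Path

ENGINE_ENV_VAR = "SDTM_COMPLETENESS_ENGINE"   # name taken from the text
BUNDLED_ENGINE_DIR = Path("app") / "engine"   # hypothetical bundled location


def resolve_engine_dir(environ=os.environ) -> Path:
    """Return the override directory if set, otherwise the bundled engine."""
    override = environ.get(ENGINE_ENV_VAR, "").strip()
    return Path(override) if override else BUNDLED_ENGINE_DIR


# Reviewer: no variable set, the bundled engine is used.
default_dir = resolve_engine_dir({})
# Developer: points the app at a different engine build (made-up path).
dev_dir = resolve_engine_dir({ENGINE_ENV_VAR: "/opt/engine-dev"})
print(default_dir, dev_dir)
```

Passing the environment in as a parameter keeps the lookup easy to test without mutating `os.environ`.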

At a glance

A quick visual read of the countable facts; full detail in the table.

[Bar chart: Checks in catalog (87), Test functions (47), Check layers (6). Relative scale, values labelled, unit: count.]

Processing characteristics

Metric             Value               Notes
Checks in catalog  87                  v12 brought the in-memory pipeline to 85 all-ok, plus L5_COV_007/008
Check layers       6                   Traceability, domain, value fidelity among them
Test functions     47                  Includes 17 v11-fix regression assertions
Tool version       0.4.0               App labelled v12
Reference run      4730 issues / 633s  Real GADI run cited in the v12 changelog
Reports            Excel + HTML        Dashboard and downloadable report

Functional wins

01 Validates SDTM completeness and traceability back to raw across six check layers in one run.
02 Eliminated an O(n-squared) SUPP completeness check that previously ran for hours, by rewriting it as a set-membership test.
03 Bounds every check with a per-check time budget so a single slow check is reported rather than stalling the whole run.
04 Added site-change (L5_COV_007) and outlier (L5_COV_008) coverage checks, with a live progress dashboard and Excel/HTML reports.
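The set-membership rewrite in item 02 can be illustrated with a small sketch. It is not the tool's code: `find_orphan_supp` is a hypothetical name, rows are plain dicts rather than data frames, and keys are compared as strings, which assumes parent values such as AESEQ are stored without float formatting. The column names (USUBJID, IDVAR, IDVARVAL, QNAM, QVAL) follow the standard SUPP-- layout.

```python
def find_orphan_supp(supp_rows, parent_rows):
    """Return SUPP rows that do not resolve to any parent record.

    Parent keys are collected once into sets, so each SUPP row costs one
    hash lookup instead of a scan over every parent row.
    """
    # One key set per IDVAR actually used in SUPP (e.g. "AESEQ").
    idvars = {row["IDVAR"] for row in supp_rows if row.get("IDVAR")}
    parent_keys = {
        idvar: {(p["USUBJID"], str(p[idvar])) for p in parent_rows if idvar in p}
        for idvar in idvars
    }
    parent_subjects = {p["USUBJID"] for p in parent_rows}

    orphans = []
    for row in supp_rows:
        idvar = row.get("IDVAR")
        if idvar:
            found = (row["USUBJID"], str(row["IDVARVAL"])) in parent_keys[idvar]
        else:
            # Subject-level qualifier with no IDVAR: match on subject only.
            found = row["USUBJID"] in parent_subjects
        if not found:
            orphans.append(row)
    return orphans


# Made-up illustrative data.
ae = [
    {"USUBJID": "S-001", "AESEQ": 1},
    {"USUBJID": "S-001", "AESEQ": 2},
    {"USUBJID": "S-002", "AESEQ": 1},
]
suppae = [
    {"USUBJID": "S-001", "IDVAR": "AESEQ", "IDVARVAL": "2", "QNAM": "AETRTEM", "QVAL": "Y"},
    {"USUBJID": "S-002", "IDVAR": "AESEQ", "IDVARVAL": "3", "QNAM": "AETRTEM", "QVAL": "Y"},
    {"USUBJID": "S-003", "IDVAR": "AESEQ", "IDVARVAL": "1", "QNAM": "AETRTEM", "QVAL": "Y"},
]

orphans = find_orphan_supp(suppae, ae)
orphan_keys = [(o["USUBJID"], o["IDVARVAL"]) for o in orphans]
print(orphan_keys)
```

Building the sets is linear in the number of parent rows per IDVAR, and the lookup pass is linear in the number of SUPP rows, which is what replaces the quadratic pairwise comparison.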

Module dependencies

core
  • Python 3.9+
ui
  • streamlit
  • altair
data
  • pandas
  • numpy
  • openpyxl
  • xlsxwriter
  • pyreadstat
testing
  • pytest