A local Streamlit tool that runs 260+ built-in checks across SDTM/ADaM datasets and presents the findings in a two-tab dashboard. Each finding carries an impact level, a plain-English explanation, a resolution hint, and reproducible SAS for drilling into the offending records.

What it adds

Pinnacle 21 (P21) checks conformance to the CDISC standard. Clinical Integrity Suite (CIS) runs locally before a P21 run and adds what a programmer needs in order to act on a finding: one impact scale, a plain-English explanation per finding, and copyable SAS to reproduce it.
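
To make the shape of a finding concrete, here is a minimal sketch of how one could be modelled. It is not the tool's actual data model: the `Finding` class, its field names, the `to_sas` helper, the `sdtm` libref, and the `AE_DEMO` check ID are all hypothetical. Only the four things a finding carries (impact, explanation, resolution hint, SAS) come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One reported issue. All field names are illustrative assumptions."""
    check_id: str      # identifier of the check that fired
    dataset: str       # dataset the offending records live in, e.g. "AE"
    impact: int        # position on the single 1-5 impact scale
    explanation: str   # plain-English description of the problem
    resolution: str    # hint on how to resolve it
    where_clause: str  # SAS condition that selects the offending records

    def __post_init__(self):
        # Reject values outside the 1-5 scale as early as possible.
        if not 1 <= self.impact <= 5:
            raise ValueError(f"impact must be 1-5, got {self.impact}")

    def to_sas(self, libref: str = "sdtm") -> str:
        """Return a copyable SAS step that lists the offending records."""
        return (
            f"/* {self.check_id}: {self.explanation} */\n"
            f"proc print data={libref}.{self.dataset.lower()};\n"
            f"  where {self.where_clause};\n"
            f"run;"
        )


# Hypothetical finding; AESTDTC and AEENDTC are the standard SDTM AE date variables.
finding = Finding(
    check_id="AE_DEMO",
    dataset="AE",
    impact=4,
    explanation="AE end date is before AE start date",
    resolution="Confirm both dates against the source data",
    where_clause="not missing(aeendtc) and aeendtc < aestdtc",
)
print(finding.to_sas())
```

Keeping the SAS as a `where` condition on the finding means the same record can feed both the dashboard text and the copyable snippet.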

How it works

Inputs:   SAS7BDAT datasets; XPT datasets; SDTM + ADaM
Process:  1. Parser to common frame
          2. Domain-modular check registry
          3. Engine runs 260+ checks
          4. Impact + aggregator (1-5)
Outputs:  Issue Summary tab; Detail tab + SAS; rotating audit log

Typical layout

Clinical Integrity Suite
  Sidebar:  Upload datasets, Run Validation, Filters
  Tabs:     Issue Su…, Detail
  Content:  Findings grouped by impact; selected finding + resolution hint; copyable SAS to reproduce

By the numbers

260+  Built-in checks
109   Test functions
1-5   Impact scale
2     Dashboard tabs

Data flow

SDTM and ADaM datasets accumulate data quality issues that are expensive to find late. A full Pinnacle 21 run or sponsor review is slow feedback for a programmer mid-build.

Input: SAS7BDAT / XPT datasets (SDTM + ADaM)
        |
        v
  Parser (modules/parser.py)        normalises datasets into a common frame
        |
        v
  Check Registry (registry.py)      domain-modular checks: modules/checks/<domain>.py
        |                            (ae, lb, dm, cm, ds, eg, ex, vs, pk, onc ...)
        v
  Engine (engine.py)                runs each check, collects findings
        |
        v
  Impact + Aggregator               single 1-5 impact scale, dedupes, ranks
        |
        v
  Reporter (reporter.py)  -->  Two-tab Streamlit dashboard: Summary + Detail
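
The registry, engine, and aggregator stages above can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of registry.py or engine.py: the `register` decorator, the `REGISTRY` dict, the `DM_DEMO` check, and the finding keys are hypothetical, datasets are modelled as lists of row dicts to keep the sketch stdlib-only, and ranking assumes a higher impact number means a more severe finding.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Rows = List[dict]                       # stand-in for the parser's common frame
Check = Callable[[Rows], Iterable[dict]]

# domain name -> checks registered for that domain (hypothetical registry shape)
REGISTRY: Dict[str, List[Check]] = defaultdict(list)


def register(domain: str) -> Callable[[Check], Check]:
    """Decorator a domain module could use to add a check to the registry."""
    def wrap(fn: Check) -> Check:
        REGISTRY[domain].append(fn)
        return fn
    return wrap


@register("dm")
def dm_missing_sex(rows: Rows) -> Iterable[dict]:
    """Illustrative check: flag subjects with no SEX value."""
    for row in rows:
        if not row.get("SEX"):
            yield {"check": "DM_DEMO", "impact": 3, "key": row["USUBJID"]}


def run_checks(datasets: Dict[str, Rows]) -> List[dict]:
    """Engine: run every registered check against its domain's dataset."""
    findings: List[dict] = []
    for domain, rows in datasets.items():
        for check in REGISTRY.get(domain, []):
            findings.extend(check(rows))
    return findings


def aggregate(findings: List[dict]) -> List[dict]:
    """Drop duplicate (check, key) pairs, then rank by impact.

    Assumption: a higher impact number means a more severe finding.
    """
    unique = {(f["check"], f["key"]): f for f in findings}
    return sorted(
        unique.values(),
        key=lambda f: (-f["impact"], f["check"], f["key"]),
    )


datasets = {
    "dm": [
        {"USUBJID": "S-001", "SEX": "F"},
        {"USUBJID": "S-002", "SEX": ""},
    ]
}
ranked = aggregate(run_checks(datasets))
print(ranked)
```

Because checks attach themselves through the decorator, adding a check means adding a function to a domain file; the engine loop does not change.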

Engineering trade-offs

Domain-modular check files (modules/checks/<domain>.py)
Each SDTM domain owns its checks, so a programmer can find and extend AE checks without reading the whole engine.
Single 1-5 impact scale (v33 change)
Replaced four separate severity taxonomies with one consistent scale so findings are directly comparable across domains.
Reproducible SAS emitted per finding
A programmer acts on a finding faster when they can paste the exact SAS to see the offending records themselves.
Centralised rotating log at logs/cis.log
Rotating at 5 MB with 5 backups gives a durable audit trail without unbounded disk growth in a shared environment.
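
The rotating log described above maps directly onto Python's standard `logging.handlers.RotatingFileHandler`. The sketch below shows one way to configure it. The `build_logger` helper, the `cis` logger name, and the message format are assumptions, and 5 MB is taken here as 5 * 1024 * 1024 bytes.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_path: str = "logs/cis.log") -> logging.Logger:
    """Return a logger that rotates at about 5 MB and keeps 5 backups."""
    logger = logging.getLogger("cis")  # logger name is an assumption
    logger.setLevel(logging.INFO)
    if logger.handlers:
        # Already configured; avoids stacking handlers when the script reruns.
        return logger

    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        log_path,
        maxBytes=5 * 1024 * 1024,  # roll over once the file reaches about 5 MB
        backupCount=5,             # keep cis.log.1 through cis.log.5
        encoding="utf-8",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

With these settings the log directory is capped at roughly 30 MB: the live file plus five backups. The early return matters in a Streamlit app, where the script is re-executed on each interaction.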

At a glance

A quick visual read of the countable facts (bar chart, relative scale, unit: count); full detail in the table.

Built-in checks  260
Test functions   109
SDTM domains     20

Processing characteristics

Metric           Value              Notes
Built-in checks  260+               Domain-modular; v34 vectorised AE018/19/21, MH003, DS010, EG008
Test functions   109                Counted across the test suite
Impact scale     1-5                Critical / High / Medium / Low / Info
Dashboard tabs   2                  Issue Summary + Detail (reduced from eight)
Input formats    SAS7BDAT, XPT      Read via pyreadstat
Logging          Rotating 5MB x 5   logs/cis.log
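
As an illustration of the input handling, the sketch below dispatches on file extension to the matching pyreadstat reader. `pyreadstat.read_sas7bdat` and `pyreadstat.read_xport` are real functions that return a DataFrame and a metadata object. The `reader_name_for` and `load_dataset` helpers, and the choice to upper-case column names as a normalisation step, are assumptions and not necessarily what modules/parser.py does.

```python
from pathlib import Path

# File suffix -> name of the pyreadstat function that reads it.
READERS = {".sas7bdat": "read_sas7bdat", ".xpt": "read_xport"}


def reader_name_for(path: str) -> str:
    """Pick the pyreadstat reader for a dataset file, by extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported dataset format: {path!r}")
    return READERS[suffix]


def load_dataset(path: str):
    """Read one dataset and return it as a pandas DataFrame.

    Upper-casing column names is an assumed normalisation step.
    """
    # Imported lazily so the dispatch logic above works without pyreadstat.
    import pyreadstat

    reader = getattr(pyreadstat, reader_name_for(path))
    frame, _meta = reader(path)  # readers return (DataFrame, metadata)
    frame.columns = [str(col).upper() for col in frame.columns]
    return frame
```

A call such as `load_dataset("ae.xpt")` would then return a frame with the same column naming regardless of which format the file arrived in.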

Functional wins

01 Runs 260+ SDTM/ADaM checks locally before a Pinnacle 21 run or sponsor review, shortening the feedback loop for the programmer.
02 Every finding carries an impact level, a plain-English explanation, a resolution hint, and copyable SAS to reproduce it.
03 Domain-modular check organisation lets new checks be added in the relevant domain file without touching the engine.
04 Hot-path checks are vectorised to keep large-dataset runs responsive.
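
For readers unfamiliar with the term, a vectorised check compares whole columns at once instead of looping over rows. The sketch below is a hypothetical example of that technique in pandas, not one of the tool's actual checks: the function name and the rule (AE end date before start date) are assumptions, and values that cannot be parsed as dates are not flagged.

```python
import pandas as pd


def ae_end_before_start(ae: pd.DataFrame) -> pd.DataFrame:
    """Return AE records whose end date precedes the start date.

    Hypothetical check used only to illustrate vectorisation.
    """
    # Parse both columns in one pass each; unparseable or missing values become NaT.
    start = pd.to_datetime(ae["AESTDTC"], errors="coerce")
    end = pd.to_datetime(ae["AEENDTC"], errors="coerce")

    # A single column-wise comparison replaces a Python-level loop over rows.
    # Comparisons involving NaT evaluate to False, so incomplete rows are skipped.
    mask = (end < start).to_numpy()
    return ae.loc[mask]


ae = pd.DataFrame(
    {
        "USUBJID": ["S-001", "S-002", "S-003"],
        "AESTDTC": ["2024-01-10", "2024-02-01", "2024-03-05"],
        "AEENDTC": ["2024-01-12", "2024-01-15", None],
    }
)
offending = ae_end_before_start(ae)
print(offending["USUBJID"].tolist())
```

On frames with many thousands of rows, the column-wise comparison runs inside pandas rather than the Python interpreter, which is where the responsiveness gain comes from.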

Module dependencies

core
  • Python 3.9+
  • pyyaml
ui
  • streamlit
  • fastapi
  • uvicorn
data
  • pandas
  • pyreadstat
  • openpyxl
  • xlsxwriter
testing
  • pytest