Clinical Trial Intelligence Platform (CTI)

A platform that parses a SAP (and optional TFL shell and ADaM spec), extracts each output with its class, type, and population, runs QC on the result, and presents an analytics dashboard with copyable SAS pseudocode — using deterministic patterns rather than an LLM.

What it adds

No conformance tool reads a Statistical Analysis Plan. This extracts the programming specification from the SAP itself — the step before any dataset exists to be checked.

How it works

Inputs

SAP (DOCX / RTF / TXT)
TFL shell (optional)
ADaM spec (optional)

Process

1. Document parsers
2. Entity extraction
3. Classification: class / type / population
4. QC engine (readiness score)

Outputs

Structured output spec
SAS pseudocode
Excel workbook

Typical layout

By the numbers

Outputs per study: 40-100+

Mock SAP fixtures: 6

LLMs used: 0

Version

Screenshots

[Screenshot] The upload screen with a primary SAP loaded and the optional TFL shell and ADaM spec slots visible.

[Screenshot] The Outputs tab showing the extracted output list with class, type and population filters applied.

[Screenshot] The Analytics tab showing output class/type/population/confidence charts and the dataset matrix.

Data flow

A Statistical Analysis Plan describes dozens to hundreds of outputs in prose. Turning that into a structured programming specification by hand is slow, and details about output class, type, and population are easy to misread.

Input: Primary SAP (DOCX/RTF/TXT)
        + optional TFL shell (DOCX)  + optional ADaM spec (XLSX)
        |
        v
  Parsers (src/parsers)            read documents into text + structure
        |
        v
  Entity Extraction               identify each output + attributes
        |
        v
  Classification (utils/classification.py)
        |                          output class / type / population
        v
  Rule Engine + Normalization     dataset inference, ADaM variable registry
        |
        v
  QC Engine                       readiness score, category + severity
        |
        v
  Streamlit dashboard  -->  outputs, analytics, SAS pseudocode, Excel export
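
The entity extraction step in the flow above can be sketched as a line-anchored pattern match. This is a minimal illustration, not the platform's actual parser: the pattern, the OutputEntity class and the extract_outputs function are all hypothetical, and it assumes each output is introduced on its own line as "Table/Listing/Figure <number> <title>".

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: an output heading starts a line, e.g.
# "Table 14.2.1 Summary of Overall Survival" or "Figure 14.2.2: KM Plot".
OUTPUT_RE = re.compile(
    r"^[ \t]*(?P<type>Table|Listing|Figure)\s+"
    r"(?P<number>\d+(?:\.\d+)*)[:.]?\s+(?P<title>.+)$",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass(frozen=True)
class OutputEntity:
    output_type: str  # Table, Listing or Figure
    number: str       # e.g. "14.2.1"
    title: str


def extract_outputs(text: str) -> list[OutputEntity]:
    """Return each distinct output heading found in the text, in document order."""
    seen = set()
    results = []
    for m in OUTPUT_RE.finditer(text):
        key = (m["type"].capitalize(), m["number"])
        if key in seen:  # skip repeated headings (e.g. table of contents vs body)
            continue
        seen.add(key)
        results.append(OutputEntity(key[0], key[1], m["title"].strip()))
    return results
```

Because the pattern is anchored to the start of a line, an in-sentence cross-reference such as "See Table 14.2.1 for details" is not picked up as a new output.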

Engineering trade-offs

Deterministic patterns instead of an LLM

Runs offline in a regulated environment with reproducible, inspectable output and no external model dependency.
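
One way to read "deterministic patterns" is as an ordered list of keyword rules where the first match wins. The sketch below assumes that reading; the rule labels, keywords and the classify_output_class function are illustrative and are not taken from utils/classification.py.

```python
import re

# Hypothetical, ordered keyword rules: earlier rules take precedence.
CLASS_RULES = [
    ("pharmacokinetics", re.compile(r"\b(pharmacokinetic|pk|cmax|auc)\b", re.I)),
    ("safety", re.compile(r"\b(adverse events?|laboratory|vital signs?|ecg)\b", re.I)),
    ("efficacy", re.compile(r"\b(survival|response rate|primary endpoint|hazard ratio)\b", re.I)),
    ("demographics", re.compile(r"\b(demograph\w*|baseline characteristics|disposition)\b", re.I)),
]


def classify_output_class(title):
    """Return (label, matched_text) for the first rule that matches the title.

    Returning the matched text alongside the label lets a reviewer see
    exactly why an output was classified the way it was.
    """
    for label, pattern in CLASS_RULES:
        match = pattern.search(title)
        if match:
            return label, match.group(0)
    return "unclassified", None
```

The same title always yields the same label and the same evidence string, which is the property that makes the result reproducible and inspectable.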

Optional TFL shell and ADaM spec inputs

The shell adds outputs the SAP prose omits; the ADaM spec sharpens dataset inference and adds variable tables — both optional so the SAP alone still works.
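
A minimal sketch of how an optional shell could add outputs without overriding the SAP, assuming outputs are keyed by their output number. The merge_outputs and number_key names, the dictionary layout and the "source" field are assumptions made for illustration.

```python
def number_key(number):
    """Sort key so that "14.2.10" orders after "14.2.2"."""
    return tuple(int(part) for part in number.split("."))


def merge_outputs(sap_outputs, shell_outputs=None):
    """Union of outputs by number; SAP entries win, the shell only fills gaps.

    shell_outputs is optional, so the SAP alone still produces a result.
    """
    merged = {o["number"]: {**o, "source": "SAP"} for o in sap_outputs}
    for o in shell_outputs or []:
        # setdefault keeps the SAP entry when both inputs describe the output
        merged.setdefault(o["number"], {**o, "source": "TFL shell"})
    return [merged[n] for n in sorted(merged, key=number_key)]
```

Tagging each entry with its source keeps it visible which outputs came from the SAP prose and which were contributed only by the shell.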

Population matcher that excludes PK/biomarker from ITT default

A v5 fix stopped PK and biomarker outputs from being wrongly defaulted to ITT, which was a meaningful specification error.
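
The behaviour described above could look roughly like the following. This is a sketch under stated assumptions: the patterns and the match_population function are hypothetical, and returning None (leaving the population unresolved for review) is one plausible choice for PK and biomarker outputs, not necessarily what the platform does.

```python
import re

# Hypothetical explicit mentions, checked in order.
EXPLICIT_POPULATIONS = [
    ("PK", re.compile(r"\b(pk|pharmacokinetic)\s+(population|analysis set)\b", re.I)),
    ("Safety", re.compile(r"\bsafety\s+(population|analysis set)\b", re.I)),
    ("ITT", re.compile(r"\b(itt|intent(?:ion)?[- ]to[- ]treat)\b", re.I)),
]

# Outputs whose titles match this pattern must never fall back to ITT.
NO_ITT_DEFAULT = re.compile(
    r"\b(pk|pharmacokinetic\w*|biomarker\w*|concentration\w*)\b", re.I
)


def match_population(title):
    """Return a population name, or None when no safe default exists."""
    for name, pattern in EXPLICIT_POPULATIONS:
        if pattern.search(title):
            return name  # an explicit mention always wins
    if NO_ITT_DEFAULT.search(title):
        return None  # assumption: leave unresolved rather than guess ITT
    return "ITT"  # default for everything else
```

The exclusion check runs only after the explicit patterns, so a biomarker output that names its population in the title is still honoured.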

DOCX fixtures, including a 150-page SAP

Generated mock SAPs across phase 3, oncology TTE, crossover PK, safety extension and adaptive designs exercise the parser at realistic scale.

At a glance

A quick visual read of the countable facts; full detail in the table.

Typical outputs: 100

Mock SAP fixtures: 6

Output filters: 4

Unit: count

Processing characteristics

Metric	Value	Notes
Inputs	SAP + shell + ADaM spec	SAP: DOCX/RTF/TXT; shell: DOCX; ADaM spec: XLSX
Outputs per study	40-100+	Shell structure adds outputs beyond the SAP prose
Extraction method	Pattern-based	No LLM, no internet required
QC	Readiness score	Filterable by category and severity
Test fixtures	5 mock SAPs + one 150-page SAP	Phase 3, oncology TTE, crossover PK, safety extension, adaptive
Output	SAS pseudocode + Excel	Copyable st.code SAS blocks; Excel workbook
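
The QC row can be illustrated with a small scoring sketch. The source does not state how the readiness score is computed, so the penalty weights, the Finding fields and both function names below are assumptions chosen only to show a score that is filterable by category and severity.

```python
from dataclasses import dataclass

# Hypothetical penalty per finding, by severity.
SEVERITY_PENALTY = {"error": 10, "warning": 3, "info": 0}


@dataclass(frozen=True)
class Finding:
    output_number: str
    category: str  # e.g. "population", "dataset"
    severity: str  # "error", "warning" or "info"
    message: str


def readiness_score(findings):
    """Start from 100 and subtract a penalty per finding, floored at 0."""
    penalty = sum(SEVERITY_PENALTY[f.severity] for f in findings)
    return max(0, 100 - penalty)


def filter_findings(findings, category=None, severity=None):
    """Keep findings that match every filter that was supplied."""
    return [
        f for f in findings
        if (category is None or f.category == category)
        and (severity is None or f.severity == severity)
    ]
```

With one error and two warnings, this weighting gives 100 - (10 + 3 + 3) = 84; a spec with no findings scores 100.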

Functional wins

01 Turns SAP prose into a structured output specification with class, type and population per output.

02 Runs fully offline with deterministic patterns, so results are reproducible and inspectable with no LLM dependency.

03 Combines SAP, TFL shell and ADaM spec to infer datasets and surface an ADaM variable registry.

04 Scores the extracted spec for readiness and emits copyable SAS pseudocode plus an Excel workbook.
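
As a rough illustration of the last point, SAS pseudocode can be rendered from one spec entry with plain string templating. The render_sas_pseudocode function, the spec keys and the example values (an ADSL dataset and an ITTFL flag) are hypothetical; the platform's real templates are not shown in the source.

```python
def render_sas_pseudocode(spec):
    """Render a short SAS pseudocode skeleton for one extracted output.

    The result is a starting point for a programmer, not runnable SAS.
    """
    lines = [
        f"/* {spec['type']} {spec['number']}: {spec['title']} */",
        f"/* Population: {spec['population']} */",
        "data work.analysis;",
        f"  set adam.{spec['dataset'].lower()};",
    ]
    if spec.get("population_flag"):
        # Subset to the analysis population when a flag variable is known
        lines.append(f"  where {spec['population_flag']} = 'Y';")
    lines += ["run;", "/* TODO: add summary statistics and report layout */"]
    return "\n".join(lines)
```

The returned string is the kind of text that would be passed to a Streamlit st.code block to make it copyable from the dashboard.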

Module dependencies

core

Python

streamlit

data

pandas
python-docx
openpyxl

testing

pytest