A toolkit that ingests the protocol, SAP, SDTM data and spec, and TLF shells, then builds an interactive knowledge graph linking each ADaM variable to its source, rule, and SAP evidence — retrievable with BM25 search, no LLM or internet required.
Conformance tools check outputs; they do not trace lineage. This links every ADaM variable to its SDTM source, derivation rule and SAP evidence — the traceability a checker assumes already exists.
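The linking described above can be pictured as a small typed graph. The following is a minimal sketch using only the standard library as a stand-in for the NetworkX graph the toolkit lists; the class name `LineageGraph`, the relation names, the rule wording, and the SAP chunk identifier are all hypothetical and chosen for illustration, not taken from the toolkit's code.

```python
from collections import defaultdict


class LineageGraph:
    """Tiny directed graph: ADaM variable -> source / rule / evidence nodes."""

    def __init__(self):
        self.nodes = {}                 # node id -> attribute dict
        self.edges = defaultdict(list)  # source id -> [(relation, target id)]

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, src, relation, dst):
        # Refuse edges to unknown nodes so lineage never points at nothing.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in edge {src!r} -> {dst!r}")
        self.edges[src].append((relation, dst))

    def lineage(self, adam_var):
        """Return {relation: [target ids]} for edges leaving one variable."""
        grouped = defaultdict(list)
        for relation, dst in self.edges.get(adam_var, []):
            grouped[relation].append(dst)
        return dict(grouped)


def build_example():
    """Build one illustrative variable with hypothetical rule and evidence."""
    g = LineageGraph()
    g.add_node("ADSL.TRTSDT", kind="adam_var")
    g.add_node("EX.EXSTDTC", kind="sdtm_source")
    g.add_node("rule:TRTSDT", kind="rule",
               text="Date part of earliest EX.EXSTDTC per subject (illustrative)")
    g.add_node("sap:chunk-12", kind="evidence",
               text="Hypothetical SAP passage defining treatment start date")
    g.add_edge("ADSL.TRTSDT", "derived_from", "EX.EXSTDTC")
    g.add_edge("ADSL.TRTSDT", "uses_rule", "rule:TRTSDT")
    g.add_edge("ADSL.TRTSDT", "supported_by", "sap:chunk-12")
    return g
```

Calling `build_example().lineage("ADSL.TRTSDT")` groups the three outgoing links by relation, which is the kind of per-variable answer a lineage view needs.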
Screenshots
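- [Screenshot placeholder: graphrag-adam-01-build.png, to be placed in /public/screenshots/graphrag-adam/]
- [Screenshot placeholder: graphrag-adam-02-lineage.png, to be placed in /public/screenshots/graphrag-adam/]
- [Screenshot placeholder: graphrag-adam-03-query.png, to be placed in /public/screenshots/graphrag-adam/]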
Data flow
When programming ADaM, the link between an ADaM variable, its SDTM source, the derivation rule, and the SAP text that justifies it is scattered across documents. Tracing one variable to its evidence is slow manual cross-referencing.
    Input: Protocol PDF + SAP DOCX + SDTM (XPT/SAS7BDAT)
           + optional Excel spec + TLF shells
        |
        v
    Ingestion (ingestion/)                 protocol, sap, sdtm_data, sdtm_spec, shell
        |
        v
    Extraction (extraction/)               chunking -> NER -> patterns -> relations
        |
        v
    Knowledge Graph (graph/builder.py, kg.py)
        |                                  nodes: ADaM var, SDTM source, rule, evidence
        v
    Retrieval (graph/search.py, rag.py)    BM25 over chunks, Top-K
        |
        v
    Visualisation (viz/pyvis_viz.py) + Streamlit app

Engineering trade-offs
At a glance
[Chart: a quick visual read of the countable facts (relative scale, values labelled, unit: count); full detail is in the table.]
Processing characteristics
| Metric | Value | Notes |
|---|---|---|
| Inputs | Protocol, SAP, SDTM, spec, shells | PDF, DOCX, XPT/SAS7BDAT, XLSX |
| Retrieval | BM25 (rank-bm25) | Top-K over document chunks |
| Graph | NetworkX + pyvis | ADaM variable to source/rule/evidence |
| Default chunk size / Top-K | 1200 / 8 | User-adjustable in the sidebar |
| Build time | 10-60 s typical | Stated in HOW_TO_RUN |
| LLM | None | No model, no internet required |
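
The retrieval row can be illustrated with a minimal sketch of chunking followed by BM25 ranking. This is a pure-Python stand-in, not the rank-bm25 package the toolkit lists, and it uses the non-negative IDF variant, so scores will differ from that package. The function names are hypothetical, and treating the default chunk size of 1200 as a character count is an assumption, since the table does not state the unit.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=1200):
    """Split text into consecutive chunks of at most `size` characters.

    Assumption: the chunk size is measured in characters.
    """
    return [text[i:i + size] for i in range(0, len(text), size)]


def tokenize(text):
    """Lowercase and keep alphanumeric runs; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_top_k(query, chunks, k=8, k1=1.5, b=0.75):
    """Return up to k (chunk_index, score) pairs, best first, dropping zero scores."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of chunks containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scored = []
    for idx, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((idx, score))
    # Sort by descending score, breaking ties by original chunk order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [(idx, s) for idx, s in scored[:k] if s > 0]
```

Because the ranking is purely lexical, a query only matches chunks that share its tokens, which is the trade-off of running without a model.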
Functional wins
Module dependencies
- Python 3.9+
- Jinja2
- chardet
- streamlit
- pyvis
- pandas
- numpy
- pdfplumber
- PyPDF2
- python-docx
- openpyxl
- pyreadstat
- rank-bm25
- networkx
- scipy
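
A pre-flight check for the packages listed above might look like the following sketch. The helper name `missing_dependencies` is hypothetical and not part of the toolkit; the mapping reflects the fact that some distributions are imported under a different module name (for example, python-docx is imported as `docx` and rank-bm25 as `rank_bm25`).

```python
import importlib.util

# Distribution name (as installed) -> module name (as imported).
IMPORT_NAMES = {
    "Jinja2": "jinja2",
    "chardet": "chardet",
    "streamlit": "streamlit",
    "pyvis": "pyvis",
    "pandas": "pandas",
    "numpy": "numpy",
    "pdfplumber": "pdfplumber",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
    "pyreadstat": "pyreadstat",
    "rank-bm25": "rank_bm25",
    "networkx": "networkx",
    "scipy": "scipy",
}


def missing_dependencies(import_names=IMPORT_NAMES):
    """Return distribution names whose module cannot be found, without importing it."""
    return [
        dist
        for dist, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]
```

An empty list from `missing_dependencies()` means every listed module can be located in the current environment; it does not check versions.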