Statistical drift detection, column-level lineage, and causal discovery — for dbt, warehouses, and data lakes. A Python library, CLI, and Web app — all MIT licensed, not just the Python library as with other tools.
Built for ClickHouse and BigQuery first. · Postgres · Snowflake · others — WIP · contributors welcome ↗
New · LLM Wiki semantic layer
Your Trello board is already a semantic layer. dqt extracts it.
Dump tickets, SQL, and BI reports into raw/. Point Claude Code at the vault — it synthesises dataset descriptions, metric definitions, and causal edges into wiki/. No manual YAML authoring.
Based on Karpathy's LLM Wiki pattern ↗
30+
detector algorithms
30+
declarative checks
9+
warehouse engines
100B+
rows validated (and counting)
MIT
no vendor lock-in
The hour after the alert
You set a threshold. It fires. Slack lights up. Now you're bouncing between dbt docs, the warehouse, and your BI tool — trying to figure out which upstream model changed, whether the spike in nulls explains the dashboard regression, and whether this is worth waking the on-call engineer for.
dqt was built for the part that comes after the alert. It reads your dbt manifest, parses your warehouse SQL into a column-level lineage graph, runs 30+ statistical detectors, and discovers causal relationships across your metrics — so the next time something moves, you already know what moved it.
Without dqt
Now what? Go dig through git log, dbt docs, warehouse history…
With dqt
Causal trace: stg_payments → orders → revenue. Upstream model stg_payments introduced a schema break 6h ago. E-value = 3.2.
Four layers. One library.
Statistical detectors
MAD, double-MAD, isolation forest, KS, STL residual z-scores, adjusted boxplot fences. Plus completeness, validity, freshness, schema-change, and SQL-assertion checks. Every detector returns the same (verdict, score, plain_english) shape.
mad_outlier_fraction · ks_pvalue · stl_residual_zscore · isolation_forest_fraction
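To make the shared detector contract concrete, here is a toy MAD outlier detector returning the same (verdict, score, plain_english) tuple the section describes. This is a hypothetical sketch, not dqt's actual implementation — the function name, thresholds, and scaling constant are illustrative assumptions.

```python
import statistics

def mad_outlier_fraction(values, z_thresh=3.5, warn_at=0.01):
    """Toy detector mirroring dqt's (verdict, score, plain_english) shape.

    Illustrative only: the real dqt detector may differ in parameters
    and internals.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1e-9
    # 0.6745 rescales MAD so the score is comparable to a z-score
    outliers = [v for v in values if abs(0.6745 * (v - med) / mad) > z_thresh]
    score = len(outliers) / len(values)
    verdict = "pass" if score <= warn_at else "warn"
    return verdict, score, f"{score:.2%} of values are outliers"
```

Because every detector emits the same tuple, a runner can treat a KS test, an isolation forest, and a freshness check uniformly.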
Column-level lineage
dqt walks your dbt manifest and warehouse DDL with sqlglot to build a column-level dependency graph. From any incident, get an automatic blast radius — every downstream table and metric, ranked by exposure.
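The blast-radius idea reduces to a breadth-first walk over the column dependency graph. Below is a minimal sketch under stated assumptions: the edge map is a hand-written toy (in dqt it would come from the dbt manifest plus sqlglot-parsed DDL), and `blast_radius` is a hypothetical helper name, not dqt's API.

```python
from collections import deque

# Toy column-level lineage: parent column -> downstream columns.
# In practice these edges are derived from dbt + warehouse SQL.
EDGES = {
    "stg_payments.amount": ["orders.amount_usd"],
    "orders.amount_usd": ["revenue.total_usd", "dash.aov"],
    "revenue.total_usd": ["dash.revenue_7d"],
}

def blast_radius(column):
    """BFS downstream from an incident column; depth ranks exposure."""
    seen, out = {column}, []
    queue = deque([(column, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in EDGES.get(node, []):
            if child not in seen:
                seen.add(child)
                out.append((child, depth + 1))
                queue.append((child, depth + 1))
    return out
```

Depth here is a crude exposure rank; a production graph would also weight edges by query volume or dashboard usage.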
LLM Wiki · Semantic layer
dqt uses Karpathy's LLM Wiki pattern. Dump your Trello tickets, SQL files, and BI reports into raw/. Point Claude Code at the vault. It synthesises wiki/ — dataset descriptions, metric definitions, causal edges — from the artifacts your team already has. YAML contracts compatible with dbt's semantic_models.yml.
raw/tickets/ · raw/sql/ · raw/reports/ → wiki/metrics/ · wiki/lineage/
Causal discovery
dqt runs causal discovery across your metric time series, prunes edges with stability selection, and proposes directed metric→metric relationships annotated with lag, confidence, and E-values. Every edge is reviewed by a human before it enters the production DAG.
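To give a feel for lagged edge proposal, here is a deliberately simplified stand-in: it scans lags for the strongest cross-correlation between two metric series and proposes a directed edge at the best lag. This toy uses plain Pearson correlation, not the stability-selection and E-value machinery described above; all names and thresholds are illustrative assumptions.

```python
def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    xs, ys = x[:len(x) - lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

def propose_edge(x, y, max_lag=3, threshold=0.9):
    """Propose a directed x -> y edge at the best-correlating lag."""
    best = max(range(1, max_lag + 1), key=lambda L: abs(lagged_corr(x, y, L)))
    r = lagged_corr(x, y, best)
    return (best, r) if abs(r) >= threshold else None
```

For example, if an orders series simply replays a spend series two steps later, the sketch proposes a spend→orders edge at lag 2. Real causal discovery must also rule out confounders, which is exactly why dqt keeps a human in the review loop.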
The only DQ tool that ships causal discovery.
Every BI request your GTM team filed is a semantic definition waiting to be extracted. The ticket says what the metric means. The SQL says how it's computed. The report says what thresholds matter.
dqt uses Karpathy's LLM Wiki structure: raw/ for atomic source documents, wiki/ for synthesised knowledge. Point Claude Code at the vault and it writes the semantic layer for you — from the artifacts your team already has.
Read the full workflow guide →
Export Trello tickets + attachments
SQL files, report HTMLs, metric definitions
Put them in raw/
raw/tickets/ · raw/sql/ · raw/reports/ · raw/schema/
Point Claude Code at the vault
cd vault && claude .
Claude Code synthesises wiki/
datasets, metrics, lineage, causal edges — grounded in your actual data
dqt generates per-column docs + checks
write_vault() · dqt run checks.yaml
Recommended workflow
Three plugins turn Claude Code into a grounded data-quality engineer that knows your warehouse, knows the dqt API, and can run checks from natural language.
Connects Claude Code to dqt's live documentation and source — all 38 detector slugs, the exact YAML schema, and adapter protocol. No training-data lag.
Gives Claude Code structured workflows for planning, executing, and reviewing multi-step tasks — essential for building out a full dqt check suite or semantic layer from scratch.
Each warehouse publishes its own MCP — e.g. mcp-clickhouse for ClickHouse, or the Postgres MCP server. Connect Claude Code to your warehouse and it can inspect live schemas, sample real distributions, and write dqt checks grounded in your actual data.
you
Look at my orders table and write dqt checks for amount_usd. Flag outliers and distribution shift.
claude code
Querying your warehouse via MCP... orders.amount_usd: right-skewed, p99=$4,820. Context7 confirms: use adjusted_boxplot_fraction (handles skew via medcouple) + ks_pvalue for drift.
checks:
  - table_name: orders
    column_name: amount_usd
    detector_slug: adjusted_boxplot_fraction
  - table_name: orders
    column_name: amount_usd
    detector_slug: ks_pvalue
    params: {alpha: 0.01}
Statistical & ML algorithms · 32
Declarative checks · 28
Three lines to your first check.
from dqt import Check, Runner, MemoryStore
check = Check(
schema_name="public",
table_name="orders",
column_name="amount",
detector_slug="mad_outlier_fraction",
)
result = Runner(MemoryStore()).run(check, adapter)
print(result.plain_english)
# → "0.82% of values are outliers — within the 1% warn threshold"
No server required. The optional FastAPI service and dashboard are there when you want them — and stay out of the way when you don't.
Where dqt sits.
Causal discovery isn't a nice-to-have — it's the difference between “orders are down” and “orders are down because the EU marketing-spend job missed its 06:00 run.”
| Capability | dqt | Great Expectations | Soda | Elementary | Dataplex |
|---|---|---|---|---|---|
| Open source (MIT) | ✓ | ✓ | partial | ✓ | — |
| 30+ statistical detectors | ✓ | ~ | limited | ~ | ✓ |
| Column-level lineage | ✓ | — | — | partial | ✓ |
| Causal discovery | ✓ | — | — | — | — |
| AI-grounded incident explainer | ✓ | — | — | partial | ✓ |
| pip install, runs offline | ✓ | ✓ | partial | partial | — |
| No vendor lock-in | ✓ | ✓ | partial | partial | — |
Drop it in next to the tools you already use.
Open source · MIT licensed · Python 3.12+ · No telemetry · No signup · No credit card
About the author
Anton Barr is a data geek, getting things done since 1972 and vibe-coding at unreasonable hours. A student of 質 (shitsu): quality, substance, the inner nature of a thing. dqt is a personal project — the data quality tool he kept wishing existed.