Ministry of Statistics (India)
The goal wasn’t a flashy chatbot; it was an evidence-first system for dense policy and statistical documentation, where every answer must be backed by citable source evidence.
Why this problem is hard.
The sources are long PDFs, circulars, and annexures, with dense tables, scanned pages, and frequent cross-references.
Answers often require multi-hop reasoning: definitions → exceptions → eligibility → reporting obligations.
Outputs must be defensible: citations, traceability, and repeatability across document versions.
What we built.
The architecture focuses on four guarantees: preserve structure, extract deterministically, resolve entities consistently, and return answers that are auditable.
- Layout-aware parsing (sections, tables, references)
- Page-level citations and stable evidence IDs
- Extraction reporting for coverage and gaps
- Entity + relation extraction into controlled schemas
- Entity normalization and deduplication
- Cross-document linking for multi-hop traversal
- Graph-constrained retrieval for relationship questions
- Table-aware retrieval to avoid losing critical numbers
- Evaluation suite to measure groundedness and evidence quality
- Citation-first answers with trace paths
- Audit-ready logs of retrieval + generation
- Guardrails: confidence thresholds and gap reporting
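Two of the guarantees listed above, stable evidence IDs and threshold-based gap reporting, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production design. It assumes each retrieved span carries a document ID, version, page, and a retrieval score normalized to [0, 1]. The `Evidence` type, the `answer_or_gap` function, the SHA-256 keying, and the 0.6 default threshold are all hypothetical choices. Generation is deliberately omitted: the function only decides whether to answer at all and which evidence the answer may cite.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One retrieved span. Field names are illustrative assumptions."""
    doc_id: str   # document identifier (hypothetical format)
    version: str  # document version, so IDs differ across revisions
    page: int     # page number used in the citation
    text: str     # extracted text span (paragraph, table row, ...)
    score: float  # retrieval score, assumed normalized to [0, 1]

    @property
    def evidence_id(self) -> str:
        # Deterministic ID: the same doc, version, page and
        # whitespace/case-normalized text always hash to the same value,
        # so a citation stays stable when the corpus is re-ingested.
        normalized = " ".join(self.text.split()).lower()
        key = f"{self.doc_id}|{self.version}|{self.page}|{normalized}"
        return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def answer_or_gap(question: str, evidence: list[Evidence],
                  min_score: float = 0.6, min_sources: int = 1) -> dict:
    """Decide whether to answer and which evidence to cite.

    Spans scoring below `min_score` are discarded; if fewer than
    `min_sources` remain, a gap report is returned instead of an answer.
    """
    supported = [e for e in evidence if e.score >= min_score]
    if len(supported) < min_sources:
        return {
            "status": "gap",
            "question": question,
            "best_score": max((e.score for e in evidence), default=0.0),
        }
    # Strongest evidence first; a generation step (not shown) would be
    # restricted to citing exactly these spans.
    supported.sort(key=lambda e: e.score, reverse=True)
    return {
        "status": "answered",
        "question": question,
        "citations": [
            {"evidence_id": e.evidence_id, "doc_id": e.doc_id,
             "version": e.version, "page": e.page}
            for e in supported
        ],
    }
```

Because the ID is derived from content rather than from ingestion order, re-parsing the same document version reproduces the same citations. A new version yields new IDs, which keeps answers tied to the exact revision they were drawn from.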
What changed.
Analysts move from manual document hunting to targeted evidence packs and traceable answers.
Relationship queries become practical: dependencies, exceptions, and cross-references are captured in the graph.
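As a hedged illustration of how a relationship query can be answered from such a graph, the sketch below first normalizes entity mentions to canonical IDs. It then runs a breadth-first search that follows only whitelisted relation types and returns the hop-by-hop path together with the evidence ID behind each edge. The alias table, entity names, relation types, and evidence IDs are all invented for illustration and are not taken from any ministry document. A real deployment would query a graph store rather than an in-memory list.

```python
from collections import deque

# Hypothetical alias table produced by entity normalization/deduplication.
ALIASES = {
    "establishment": "ENT:establishment",
    "establishments": "ENT:establishment",
    "estt.": "ENT:establishment",
}

# Hypothetical graph edges: (source, relation, target, evidence_id).
EDGES = [
    ("ENT:establishment", "HAS_EXCEPTION", "ENT:seasonal_unit", "ev-a1"),
    ("ENT:seasonal_unit", "AFFECTS_ELIGIBILITY", "ENT:survey_scope", "ev-b7"),
    ("ENT:survey_scope", "TRIGGERS_REPORTING", "ENT:quarterly_return", "ev-c3"),
    ("ENT:establishment", "MENTIONED_WITH", "ENT:quarterly_return", "ev-z9"),
]

# Only these relation types may be traversed for this kind of question.
ALLOWED = {"HAS_EXCEPTION", "AFFECTS_ELIGIBILITY", "TRIGGERS_REPORTING"}


def normalize(mention: str) -> str:
    """Map a raw mention to a canonical entity ID (whitespace/case-insensitive)."""
    key = " ".join(mention.split()).lower()
    return ALIASES.get(key, "ENT:" + key.replace(" ", "_"))


def trace(edges, start, goal, allowed):
    """Breadth-first search over allowed relation types only.

    Returns the shortest path as (source, relation, target, evidence_id)
    hops, or None if the goal is unreachable under the constraint.
    """
    adjacency = {}
    for src, rel, dst, ev in edges:
        if rel in allowed:
            adjacency.setdefault(src, []).append((rel, dst, ev))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, dst, ev in adjacency.get(node, []):
            if dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(node, rel, dst, ev)]))
    return None


path = trace(EDGES, normalize(" Estt. "), normalize("Quarterly Return"), ALLOWED)
for src, rel, dst, ev in path:
    print(f"{src} -[{rel}]-> {dst}  (evidence {ev})")
```

Restricting the relation types keeps the trace meaningful. If the loose `MENTIONED_WITH` co-occurrence edge were allowed, the search would return a one-hop shortcut with no exception or eligibility step to cite.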
The pipeline design can be replicated across other ministries, regulators, or compliance-heavy organizations.
We run evaluation-first engagements around the questions your teams need to answer, the evidence they rely on, and the controls required for production.