Live Public Project

Black Box AI

Natural-language aviation safety analytics over NTSB final reports, built around guarded SQL for exact statistics, cited retrieval for narrative questions, validated charts, BYOK model controls, and visible audit trails.

Evidence

Screenshots from the live app showing the two major answer paths: SQL-backed analytics and cited narrative retrieval.

Black Box AI SQL-backed chart answer for fatal accidents by phase of flight
SQL-backed analytical answer with a validated chart and result table.
Black Box AI narrative retrieval answer with cited NTSB records
Narrative retrieval path with cited accident reports and probable-cause text.
Black Box AI SQL and audit trail panels
Visible SQL, route decision, validation step, execution step, chart validation, and limitations.
Black Box AI model key panel with provider, model, and session key controls
BYOK model panel for provider selection, session-scoped key handling, and clearing keys.

Problem

Aviation safety questions split into two very different types. Counting questions need exact structured data, while causal or explanatory questions need retrieval over accident narratives.

A single chat-style interface can hide that distinction and make answers feel more certain than they are. The project needed to route questions to the right engine, show the evidence, and keep generated prose from inventing numbers.

Users or audience

The primary audience is recruiters, hiring managers, and technical interviewers evaluating practical AI-assisted analytics, retrieval design, SQL safety, data product thinking, and transparent user-facing evidence.

The app is a portfolio analytics system over a public NTSB final-report dataset. It is not an official NTSB tool, aviation investigation system, or safety determination source.

Solution

Black Box AI routes plain-English questions to SQL, retrieval, charting, or a combined path. Structured questions use deterministic plans or guarded model-planned SQL, then return the SQL, table, chart, and audit trail. Narrative questions retrieve cited accident reports and show the source records directly.

The live app includes FastAPI, React, provider adapters, in-memory BYOK handling, query planning, SQL validation, chart validation, retrieval, answer composition, limitations, and an audit trail for each answer.

Architecture

NTSB final-report corpus7,462 US aviation accident final reports from 2016-2023, structured fields plus narrative text
Ingestion and indexesCSV parsing, structured accident table, per-accident narratives, dense embeddings, BM25 index, and hybrid retrieval experiments
Question routingClassifies requests as SQL, retrieval, chart, or combined work
Guarded analytics layerDeterministic SQL plans, Postgres SELECT-only guard, read-only execution, row limits, and validated Vega-Lite chart specs
React and FastAPI appPlain-English question UI, result table, chart, citations, source SQL, limitations, BYOK panel, and audit trail

Data flow

The ingestion pipeline parses the NTSB source data into structured accident fields and narrative records. Structured questions run through approved SQL plans or a guarded SQL generation path. Narrative questions query retrieval indexes and return cited accident records.

Every answer returns supporting artifacts: route decision, confidence, SQL when used, result table, chart spec when used, citations when used, limitations, and audit events describing how the answer was produced.

Tools used

  • Python
  • FastAPI
  • React
  • Postgres
  • SQLite
  • BM25
  • Embeddings
  • Vega-Lite
  • pytest

Key features

  • Question router that separates exact statistics from narrative retrieval.
  • SQL-backed answers where numbers come from executed queries, not model text.
  • SELECT-only SQL guard, approved schema catalog, row limits, and read-only execution pattern.
  • Validated chart specs tied to the real result table.
  • Cited retrieval results with official NTSB record links and probable-cause text.
  • BYOK provider panel with session-scoped key handling and error redaction utilities.
  • Audit trail and limitations shown with each answer.

Evaluation

The retrieval layer was evaluated with 17 hand-labeled questions across semantic, BM25 keyword, and reciprocal-rank-fusion hybrid retrieval. The result was useful precisely because it challenged the starting assumption: hybrid retrieval did not automatically win.

BM25, semantic search, and hybrid all reached Hit@5 of 1.000, while MRR and Recall@5 were close. The project therefore treats hybrid retrieval as insurance for some query types, not as a universal upgrade.

Methodology

Appropriate use: portfolio demonstration of AI-assisted analytics, cited retrieval, SQL safety, chart validation, data-product design, and evidence-first answer presentation.

Inappropriate use: official accident investigation, regulatory decision-making, legal conclusions, aircraft safety certification, or claims beyond the loaded NTSB final-report dataset.

Results or expected value

The finished platform shows how an AI analytics interface can stay accountable by separating exact counts from narrative evidence and exposing the machinery behind each answer.

7,462NTSB final reports represented in the project corpus.
SQLStructured answers show the query, result table, chart, and audit trail.
CitedNarrative answers link back to source accident records.

Limitations

The loaded dataset covers NTSB final reports from 2016-2023 and notes that 2020 and 2021 are absent in this source snapshot. Recent years may lag because final reports take time to publish.

Entity resolution is partial, the labeled retrieval evaluation set is small, and the router remains an engineered routing layer that should continue to be measured as new question types appear.

What I would improve next

I would expand the labeled evaluation sets, add deeper entity resolution for aircraft manufacturers, broaden deterministic SQL plans, add more chart templates, and continue hardening the Postgres deployment with query fingerprints and validation reporting.