Live Public Project

Water System Risk & Funding Priority Index

An explainable public-data screening system that prioritizes Ohio public drinking water systems for review, based on compliance history, enforcement history, vulnerability, drought exposure, funding context, small-system factors, and data quality.

Evidence

Screenshots from the live dashboard and the scored Ohio prototype.

[Screenshot: Water System Risk dashboard with Ohio map, filters, and review tier charts]
Live dashboard with map exploration, filters, summary metrics, review tiers, and county analysis.

[Screenshot: Risk tier count chart for Ohio public water system records]
Review tier distribution from the scored Ohio prototype.

[Screenshot: Top counties by high-review water system records]
County-level concentration of high-review records.

[Screenshot: Spatial confidence count chart for water system geography]
Spatial confidence makes data quality visible instead of hiding geography uncertainty.

Sample findings

Screening observations taken directly from the scored Ohio prototype. These are signals from public data, not regulatory findings about any individual system.

  • County concentration. Columbiana County had the largest number of high-review records (13), followed by Mahoning and Summit counties (11 each). High-review records spread across many counties rather than concentrating in a single metro area.
  • Size pattern in the highest tiers. Every one of the 188 High Review records was a small, very small, or medium system; no large system reached the High Review tier. Small systems showed the highest high-review rate (2.5%, versus 1.0% for very small and 0% for large), consistent with the project's focus on smaller systems, which often have fewer staff and less grant capacity.
  • Geometry source is tracked, not hidden. Only 1,077 of 16,339 records (6.6%) have an EPA service-area polygon (207 system-sourced, 870 modeled); the rest fall back to county centroids or are unmatched and are labeled "Approximate Location." Each record carries an explicit geometry-source tier so modeled or centroid placement is never overinterpreted as a verified service-area boundary.
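
As a rough illustration of the geometry-source tiering described above, the fallback order can be sketched in Python. The tier names, the `assign_geometry_tier` and `map_label` functions, and their parameters are hypothetical and are not taken from the project's code; the sketch also assumes that both centroid and unmatched records carry the "Approximate Location" label.

```python
from typing import Optional

# Hypothetical tier labels, ordered from most to least trustworthy.
TIER_SYSTEM = "system_sourced_boundary"
TIER_MODELED = "modeled_boundary"
TIER_CENTROID = "county_centroid"
TIER_UNMATCHED = "unmatched"


def assign_geometry_tier(boundary_method: Optional[str], county_fips: Optional[str]) -> str:
    """Pick the best available geometry source for one record.

    boundary_method is assumed to be "system", "modeled", or None when no
    service-area polygon exists; county_fips is None when no county matched.
    """
    if boundary_method == "system":
        return TIER_SYSTEM
    if boundary_method == "modeled":
        return TIER_MODELED
    if county_fips:
        return TIER_CENTROID
    return TIER_UNMATCHED


def map_label(tier: str) -> str:
    """Only polygon tiers are presented as service areas; the rest are flagged."""
    if tier in (TIER_SYSTEM, TIER_MODELED):
        return "Service-area boundary"
    return "Approximate Location"


# Polygon coverage quoted in the findings: 207 system-sourced + 870 modeled.
polygon_records = 207 + 870
polygon_share_pct = round(polygon_records / 16_339 * 100, 1)
```

Storing the tier on each record, rather than deriving it at render time, keeps the map, the detail panel, and any export consistent about how far a plotted location can be trusted.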

Problem

Public drinking water data is spread across regulatory downloads, geographic boundaries, vulnerability datasets, drought feeds, and funding records. The signals are useful, but they are not naturally organized into a single review-priority workflow.

The project needed a transparent way to identify which small systems may deserve earlier review for compliance support, technical assistance, infrastructure funding research, or resilience planning while clearly stating what the model can and cannot claim.

Users or audience

The primary audience is recruiters, hiring managers, and technical interviewers evaluating practical data engineering, public-data analytics, explainable scoring, API design, and dashboard delivery.

The dashboard is a portfolio screening model. It is not a regulatory determination, legal finding, engineering siting tool, official risk assessment, or statement that any water system is unsafe.

Solution

The system builds a scored Ohio water-system dataset from public sources, validates the outputs, loads the dashboard data into Postgres, and serves filtered summaries, maps, system lists, and detail views through a FastAPI backend.

The live frontend keeps filtering, sorting, and pagination server-side so the browser only requests the records and aggregates needed for the current view.
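
A minimal sketch of the server-side filter, sort, and pagination pattern follows. It uses Python's built-in `sqlite3` as a stand-in for Postgres so the example is self-contained; the `systems` table, its columns, the `list_systems` function, the "Routine" tier name, and the demo rows are all hypothetical, not the project's schema or data.

```python
import sqlite3

# Whitelist of sortable columns, so user input never reaches the SQL text directly.
SORTABLE = {"score": "score", "name": "name"}


def list_systems(conn, tier=None, sort="score", descending=True, page=1, page_size=25):
    """Return one page of systems plus the total number of matching rows."""
    where, params = "", []
    if tier is not None:
        where = "WHERE tier = ?"
        params.append(tier)
    order_col = SORTABLE.get(sort, "score")  # unknown sort keys fall back to score
    direction = "DESC" if descending else "ASC"
    page = max(page, 1)
    offset = (page - 1) * page_size
    total = conn.execute(f"SELECT COUNT(*) FROM systems {where}", params).fetchone()[0]
    rows = conn.execute(
        f"SELECT pwsid, name, tier, score FROM systems {where} "
        f"ORDER BY {order_col} {direction}, pwsid ASC LIMIT ? OFFSET ?",
        params + [page_size, offset],
    ).fetchall()
    return {"total": total, "page": page, "items": rows}


def demo_connection():
    """Build an in-memory table with made-up rows for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE systems (pwsid TEXT PRIMARY KEY, name TEXT, tier TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO systems VALUES (?, ?, ?, ?)",
        [
            ("OH0000001", "Example A", "High Review", 81.0),
            ("OH0000002", "Example B", "Routine", 22.5),
            ("OH0000003", "Example C", "High Review", 74.5),
            ("OH0000004", "Example D", "Routine", 35.0),
        ],
    )
    return conn
```

In a FastAPI service, a function like this would typically sit behind a route whose query parameters map onto `tier`, `sort`, `page`, and `page_size`. The secondary sort on `pwsid` keeps page boundaries stable when scores tie.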

Architecture

Public source data: EPA SDWA/ECHO, service-area geography, CDC/ATSDR SVI, Census boundaries, U.S. Drought Monitor, and staged SRF records
Python pipeline: Download inventory, cleaning, feature engineering, scoring, validation, and export steps
Analytical outputs: CSV and Parquet tables for systems, compliance, enforcement, funding, geography, risk scores, and data quality
Postgres and FastAPI: Server-side filtering, sorting, pagination, metadata, summaries, map points, and system details
Static dashboard: Cloudflare Pages frontend with API-backed metrics, map, charts, ranked table, and limitations copy

Data flow

Raw source files are cleaned into consistent water-system, compliance, enforcement, geography, funding, and drought/vulnerability features. The scoring step normalizes each component to a 0-100 scale, applies documented weights, assigns review tiers, and writes validation-ready outputs.
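
The scoring step can be sketched roughly as below. The component names come from the feature list on this page, but the weights, the tier thresholds, the tier names other than "High Review", the min-max normalization choice, and all function names are illustrative assumptions rather than the project's documented values.

```python
# Hypothetical weights that sum to 1.0; not the project's documented values.
WEIGHTS = {
    "compliance": 0.25,
    "enforcement": 0.20,
    "vulnerability": 0.15,
    "drought": 0.10,
    "funding_gap": 0.10,
    "small_system": 0.10,
    "data_quality": 0.10,
}


def min_max_scale(values):
    """Rescale raw values to 0-100 (assumed method); constant inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) * 100.0 for v in values]


def composite_score(components):
    """Weighted sum of components that are already on a 0-100 scale."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def assign_tier(score):
    """Map a composite score to a review tier using hypothetical cut points."""
    if score >= 70:
        return "High Review"
    if score >= 40:
        return "Elevated Review"  # hypothetical tier name
    return "Routine"  # hypothetical tier name
```

Because the composite is a plain weighted sum, each system's score can be broken back into per-component contributions (`WEIGHTS[name] * components[name]`), which is what keeps this kind of model inspectable.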

The web export creates an application dataset, the API loader seeds Postgres, and the frontend queries focused endpoints for metadata, summary cards, tier counts, ranked systems, single-system detail, and lightweight map points.

Tools used

  • Python
  • Pandas
  • FastAPI
  • Postgres
  • Cloudflare Pages
  • Leaflet
  • GIS data
  • Power BI-ready exports
  • pytest

Key features

  • Scored 16,339 Ohio public water system records.
  • Real EPA service-area boundary polygons (1,077 systems) on the map, with a geometry-source hierarchy from system-sourced and modeled boundaries down to county-centroid approximations.
  • Transparent weighting for compliance, enforcement, vulnerability, drought, funding gap, small-system context, and data quality.
  • Server-side search, filters, sorting, pagination, map points, map boundaries, and detail endpoints.
  • Map layer controls and a per-system geography-evidence panel (boundary type, provider, PWSID match, area, confidence, limitations).
  • Validation report expanded from 13 to 19 checks (adding geometry-source, boundary dissolve, count reconciliation, and simplification-quality checks), all of which must pass before publication.
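
To give a flavor of what a count-reconciliation check might look like, here is a small sketch. The `CheckResult` structure, the check name, and both functions are hypothetical; only the record counts (16,339 total, 207 system-sourced and 870 modeled polygons) come from this page, and the remainder is treated as a single combined "centroid or unmatched" bucket because the split is not stated here.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_count_reconciliation(total_records, tier_counts):
    """Every record should land in exactly one geometry-source tier."""
    tier_sum = sum(tier_counts.values())
    return CheckResult(
        name="geometry_tier_counts_reconcile",
        passed=tier_sum == total_records,
        detail=f"{tier_sum} tiered records vs {total_records} total",
    )


def all_checks_pass(results):
    """Publication gate: a single failed check blocks the export."""
    return all(result.passed for result in results)


# Counts quoted on this page; the last bucket is derived as the remainder.
example_counts = {
    "system_sourced": 207,
    "modeled": 870,
    "centroid_or_unmatched": 16_339 - 1_077,
}
example_result = check_count_reconciliation(16_339, example_counts)
```

Returning a structured result instead of raising on failure lets a report list every check with its detail line, even when one of them fails.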

Tradeoffs and constraints

The model favors transparency and explainability over black-box prediction. That makes the reasoning easier to inspect, but the weights remain analytical assumptions that would need subject-matter review before operational use.

Many systems can be scored from SDWA records, but not every system has high-confidence service-area geometry. The dashboard surfaces spatial confidence so users can distinguish map convenience from verified service-area precision.

Methodology

Appropriate use: portfolio demonstration of public-data engineering, transparent scoring, API-backed dashboard delivery, and limitation-aware analytics.

Inappropriate use: regulatory decisions, legal findings, engineering siting, official safety conclusions, or claims that a specific system is unsafe.

Results or expected value

The finished platform demonstrates the full path from public source data to validated analytical outputs, API delivery, and an interactive dashboard that states its limits clearly.

16,339 Ohio public water system records scored.
19/19 validation checks passed (expanded from 13 to cover geometry and boundary quality).
Live dashboard, API, map, filters, and ranked table available for review.

Limitations

Source data is not real-time. County-level vulnerability and drought context are fallback indicators, not household-level exposure measures. County-centroid mapping is used only where service-area geometry is unavailable, and it is suitable for screening only.

Unmatched funding records do not prove a system received no funding, and a high review-priority score is a screening signal rather than a finding about water safety.

What I would improve next

I would add deeper SRF project matching, tract-level SVI where geometry supports it, PostGIS-backed spatial processing for broader scaling, automated refresh checks, and clearer model evaluation once a validated outcome is defined.