LLM-Ready Permit Risk Detector

ECHO RBLC LLM

EPA ECHO bulk data, RBLC control evidence, state event records, and planned small-domain LLMs turn air-permit assumptions into an auditable pre-permit risk memo.

  • ECHO facility, CAA Pipeline, FE&C, Air Emissions analysis
  • Planned LLM evidence explanation, citations, and pre-permit checklist
  • QLoRA-ready instruction data for 3-4B domain adaptation
Analyze My Spec

Analysis Workflow

ECHO → RBLC → LLM

EPA ECHO

Facilities, violations, cases, emissions

RBLC Check

Controls, limits, applicability

LLM Reasoning

Explain, cite, summarize, checklist

Permit Risk

0.9115 Critical
1 Ingest ECHO data
2 Check RBLC
3 Score risk
4 Memo draft
Target facilities 2,699

ECHO petrochemical/EG candidates

Future EA ROC-AUC 0.9108

Temporal forward validation

Pipeline text F1 0.9228

Grouped by REGISTRY_ID split

Scenario templates 12

Petrochemical permit screens

What The Scores Mean

These are validation metrics for the research pipeline, not live permit approval probabilities.

Future EA ROC-AUC 0.9108

EA means Enforcement Action. ROC-AUC measures how well the model ranks future enforcement-action cases above non-action cases when validation is done forward in time.
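As a minimal sketch of what forward-in-time validation and ROC-AUC mean here, the snippet below splits records at a cutoff year and computes ROC-AUC as the share of (action, non-action) pairs that are ranked correctly. The field names `year`, `score`, and `future_ea`, the cutoff, and the toy records are illustrative assumptions, not the project's schema or data.

```python
from typing import Dict, List, Tuple

def forward_split(records: List[Dict], cutoff_year: int) -> Tuple[List[Dict], List[Dict]]:
    """Train on records before the cutoff year, evaluate on the cutoff year and later."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test

def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy records (hypothetical values, for illustration only).
records = [
    {"year": 2018, "score": 0.2, "future_ea": 0},
    {"year": 2019, "score": 0.7, "future_ea": 1},
    {"year": 2021, "score": 0.9, "future_ea": 1},
    {"year": 2021, "score": 0.4, "future_ea": 0},
    {"year": 2022, "score": 0.6, "future_ea": 1},
    {"year": 2022, "score": 0.8, "future_ea": 0},
]

train, test = forward_split(records, cutoff_year=2021)
auc = roc_auc([r["future_ea"] for r in test], [r["score"] for r in test])
print(f"held-out ROC-AUC: {auc:.2f}")
```

In the toy test set, three of the four (action, non-action) pairs are ordered correctly, so the script prints 0.75. A value near 0.91 would mean roughly nine of ten such pairs are ranked correctly.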

Pipeline text F1 0.9228

F1 balances precision and recall for extracted text labels. The split is grouped by REGISTRY_ID so records from the same facility do not leak between train and test sets.
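The grouped split and the F1 calculation can be sketched as follows. Hashing the `REGISTRY_ID` to pick a side is one possible way to keep each facility entirely in train or entirely in test; the 20% test fraction, the helper names, and the sample IDs are assumptions for illustration, not the project's actual procedure.

```python
import hashlib
from typing import Dict, List, Tuple

def group_bucket(registry_id: str, test_fraction: float = 0.2) -> str:
    """Assign a facility to 'train' or 'test' deterministically from its ID."""
    digest = hashlib.sha256(registry_id.encode("utf-8")).hexdigest()
    position = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "test" if position < test_fraction else "train"

def grouped_split(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split records so that all rows sharing a REGISTRY_ID land on the same side."""
    train, test = [], []
    for r in records:
        (test if group_bucket(r["REGISTRY_ID"]) == "test" else train).append(r)
    return train, test

def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up IDs; each facility contributes several text records.
records = [
    {"REGISTRY_ID": f"1100000000{i:02d}", "text": f"record {j}"}
    for i in range(20) for j in range(3)
]
train, test = grouped_split(records)
train_ids = {r["REGISTRY_ID"] for r in train}
test_ids = {r["REGISTRY_ID"] for r in test}
print("shared facilities:", len(train_ids & test_ids))

f1 = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(f"toy F1: {f1:.4f}")
```

Because the side is chosen from the ID alone, no facility can appear in both sets, which is the leakage the grouped split is meant to prevent.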

Project Spec Input

Local rule screen
Example templates: click to insert project text
Planned search: ECHO / RBLC / state events
LLM layer: Off (no local model call)
Graph RAG: Planned (Worker / Vectorize / D1)
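A local rule screen of this kind can run without any model call. The sketch below is one hypothetical way to do it with regular expressions: the rule names and term lists are assumptions chosen to mirror the drivers named in the risk memo (flare/MSS terms, program overlap, HAP/VOC pollutants), not the project's actual rules.

```python
import re
from typing import Dict, List

# Hypothetical rule set: rule name -> pattern with a single capture group.
RULES: Dict[str, str] = {
    "flare_mss": r"\b(flare|flaring|mss|startup|shutdown|upset)\b",
    "program_overlap": r"\b(title v|mact|psd|nsr)\b",
    "hap_voc": r"\b(haps?|vocs?)\b",
}

def screen_spec(text: str) -> Dict[str, List[str]]:
    """Return the distinct terms each rule matched in the project spec text."""
    lowered = text.lower()
    return {
        name: sorted(set(re.findall(pattern, lowered)))
        for name, pattern in RULES.items()
    }

spec = (
    "EG-adjacent petrochemical unit in TX with flare and MSS emissions; "
    "Title V and PSD review expected for VOC."
)
hits = screen_spec(spec)
for rule, terms in hits.items():
    print(rule, "->", terms)
```

Keeping the matched terms, rather than only a true/false flag, lets a later memo step cite exactly which words in the spec triggered each rule.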

Scenario Output

0.9115 Critical
Project: EG-adjacent petrochemical unit
Location: TX
Scenario: Flare / MSS / upset exposure
  • EG flare/MSS
  • Refinery sulfur
  • Resin VOC/HAP
  • Hydrogen SMR
  • DMC solvent
  • Methanol-to-olefins
  • Blue ammonia
  • SAF hydrotreater
  • Battery solvent
  • CO2 capture
  • Chlor-alkali
  • Backup generators

Risk memo draft

Critical enforcement-action risk is driven by Texas petrochemical NAICS exposure, Title V/MACT/PSD/NSR program overlap, HAP/VOC pollutants, and flare/MSS terms. Check RBLC controls and TCEQ event narratives before locking in permit assumptions.

Triggered evidence | Value | Recommended action
TX + target NAICS enforcement rate | 0.6794 | Review analog CAA cases
Program + pollutant enforcement rate | 0.7855 | Confirm MACT/PSD/NSR matrix
State event narrative gap | TCEQ | Search flare and MSS events
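The evidence rows above could also serve as instruction-tuning data for the planned domain model. The sketch below turns them into one JSONL record; the `instruction` / `input` / `output` field names follow a common instruction-tuning convention and are an assumption, as is the helper name `to_instruction_record`.

```python
import json
from typing import Dict, List

# Evidence rows copied from the scenario output above.
evidence: List[Dict[str, str]] = [
    {"trigger": "TX + target NAICS enforcement rate", "value": "0.6794",
     "action": "Review analog CAA cases"},
    {"trigger": "Program + pollutant enforcement rate", "value": "0.7855",
     "action": "Confirm MACT/PSD/NSR matrix"},
    {"trigger": "State event narrative gap", "value": "TCEQ",
     "action": "Search flare and MSS events"},
]

def to_instruction_record(project: str, location: str, scenario: str,
                          rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Build one instruction-tuning example whose output is a numbered checklist."""
    checklist = [
        f"{i}. {row['action']} ({row['trigger']}: {row['value']})"
        for i, row in enumerate(rows, start=1)
    ]
    return {
        "instruction": "Draft a pre-permit checklist from the triggered evidence.",
        "input": f"Project: {project}; Location: {location}; Scenario: {scenario}",
        "output": "\n".join(checklist),
    }

record = to_instruction_record(
    "EG-adjacent petrochemical unit", "TX", "Flare / MSS / upset exposure", evidence
)
line = json.dumps(record, ensure_ascii=False)  # one JSONL line
print(line)
```

Each checklist item carries its triggering evidence and value, so the trained model's output stays traceable to the rows that produced it.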

Regulation Map

US air permit focus
Title V

Major-source operating permit

Consolidates federally enforceable air requirements, monitoring, reporting, and deviation certification for major stationary sources.

PSD / NSR

Preconstruction review

Screens new projects and modifications before construction. Major projects can require BACT or LAER, modeling, offsets, and public notice.

MACT / NESHAP

Hazardous air pollutant controls

Applies source-category standards for HAP emissions, including process controls, LDAR, monitoring, recordkeeping, and work-practice limits.

NSPS

New source performance standards

Sets technology-based requirements for listed new, modified, or reconstructed source categories such as engines, heaters, tanks, and process units.

RBLC

Control-technology precedent

EPA clearinghouse for BACT, RACT, and LAER determinations. It helps compare controls and limits used in similar permits.

ECHO / CAA EA

Compliance and enforcement history

ECHO aggregates facilities, violations, inspections, emissions, and Clean Air Act enforcement actions used for analog risk screening.

GHGRP / MRV

Greenhouse-gas reporting layer

Tracks CO2e reporting and measurement, reporting, and verification assumptions. It should be handled separately from CAA criteria-pollutant risk.

State Events

Local permit and upset narratives

TCEQ, LDEQ, and other state records can reveal startup, shutdown, maintenance, flaring, and permit-revision patterns that federal data misses.

Paper Figures

Open Word Manuscript
[Figure: Data-to-LLM architecture (LLM permit risk detector architecture diagram)]
[Figure: Validation performance (bar chart)]
[Figure: Example output (example risk-screen output)]
[Figure: Expansion roadmap (dataset, API, and LLM roadmap)]

Data & API Expansion

The first model is ECHO-centered. The next research layer connects RBLC controls, EIASS bilingual alignment, GHGRP/K-ETS reporting, and permit revision sequences.

Source | Domain | LLM task | Status
EPA ECHO | Facilities, violations, emissions | Risk explanation | Core
RBLC | Controls, limits, permits | Control recommendation | Next
GHGRP / K-ETS | GHG reporting | MRV consistency | Planned
TCEQ / LDEQ | Events and permit documents | Revision sequence summary | Planned