Haoyu Yang, Shuaiyu Zhao, Zihao Wang, Jiejia Wang, Lei Zou, Qingsheng Wang
Fine-tuned large language models for natech analytics: Evidence from two decades of Texas chemical emission incidents
Process Safety and Environmental Protection, volume 203, Article 107973. Published 2025-10-03. DOI: 10.1016/j.psep.2025.107973
https://www.sciencedirect.com/science/article/pii/S0957582025012406
Abstract
Natural-hazard-triggered technological accidents (Natechs) pose compound risks to the process industries, yet large historical databases remain under-utilized because their narratives are unstructured and screening relies on keywords. In this work, we develop an automated, data-driven framework that fine-tunes generative large language models (LLMs) to jointly (i) classify Natech status and the primary hazard, (ii) extract affected unit–issue pairs, and (iii) generate brief, evidence-style justifications from incident text. Using the Texas Commission on Environmental Quality (TCEQ) air emission event database (2004–2024) as a region-specific testbed, we construct a supervised fine-tuning corpus via a schema-constrained template and evaluate the fine-tuned LLMs against LSTM and BERT baselines. The best fine-tuned model leads on every metric, with an overall accuracy of 0.958 and a macro-F1 of 0.930, while a compact 3B variant remains competitive, demonstrating the superior performance and data efficiency of pretrained transformers under constrained supervision. Applied at scale, the framework quantifies climate-related patterns in Texas. By frequency, Natech incidents account for ∼6% of statewide records, with counts surging during extreme years (hurricanes in 2005, 2008 and 2017; winter freeze in 2021). By excessive emissions, Natech incidents contribute ∼10% statewide and ∼14% in coastal Texas; along the coast, hurricanes dominate and yield a disproportionately large share of Natech releases. The framework delivers single-pass, structured analytics that reduce manual effort and improve reproducibility, providing decision-ready evidence for emergency preparedness and mitigation. Looking ahead, coupling the model with retrieval-grounded weather data and human-in-the-loop audits could enable a production-grade Natech analytics agent for continuous monitoring and planning.
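The abstract does not give the schema-constrained template itself, so the following is only a minimal sketch of what one supervised fine-tuning record and its validation might look like. The field names (`natech`, `primary_hazard`, `unit_issue_pairs`, `justification`), the hazard label set, the instruction wording, and the example narrative are all assumptions made for illustration, not the authors' actual schema or data.

```python
import json

# Hypothetical label set: only hurricanes and the winter freeze are named in
# the abstract; "other" and "none" are placeholders for illustration.
HAZARDS = {"hurricane", "winter_freeze", "other", "none"}
EXPECTED_KEYS = {"natech", "primary_hazard", "unit_issue_pairs", "justification"}

# Hypothetical instruction text shared by every training record.
INSTRUCTION = (
    "Read the incident narrative and return JSON with keys: natech (bool), "
    "primary_hazard (string), unit_issue_pairs (list of {unit, issue}), "
    "justification (string)."
)


def validate(output: dict) -> dict:
    """Return `output` unchanged, or raise ValueError if it breaks the assumed schema."""
    if set(output) != EXPECTED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(set(output) ^ EXPECTED_KEYS)}")
    if not isinstance(output["natech"], bool):
        raise ValueError("natech must be a bool")
    if output["primary_hazard"] not in HAZARDS:
        raise ValueError("unknown primary_hazard")
    # A Natech record needs a hazard; a non-Natech record must have none.
    if output["natech"] == (output["primary_hazard"] == "none"):
        raise ValueError("natech flag and primary_hazard disagree")
    pairs = output["unit_issue_pairs"]
    if not isinstance(pairs, list):
        raise ValueError("unit_issue_pairs must be a list")
    for pair in pairs:
        if not isinstance(pair, dict) or set(pair) != {"unit", "issue"}:
            raise ValueError("each pair needs exactly 'unit' and 'issue'")
    justification = output["justification"]
    if not isinstance(justification, str) or not justification.strip():
        raise ValueError("justification must be a non-empty string")
    return output


def make_sft_record(narrative: str, label: dict) -> dict:
    """Pair a narrative with its validated, JSON-serialized target string."""
    return {
        "instruction": INSTRUCTION,
        "input": narrative,
        "output": json.dumps(validate(label), sort_keys=True),
    }


def parse_model_output(text: str) -> dict:
    """Parse a generated completion and check it against the same schema."""
    return validate(json.loads(text))


# Invented example narrative and label, not taken from the TCEQ database.
example = make_sft_record(
    "Flare system tripped after a site-wide power loss during the hurricane.",
    {
        "natech": True,
        "primary_hazard": "hurricane",
        "unit_issue_pairs": [{"unit": "flare system", "issue": "power loss"}],
        "justification": "The narrative attributes the power loss to a hurricane.",
    },
)
print(example["output"])
```

Using the same validator for training targets and for generated completions is one way to keep the three outputs (classification, unit–issue pairs, justification) in a single structured pass, so that malformed generations can be rejected before they enter any downstream statistics.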
About the journal:
Process Safety and Environmental Protection (PSEP) is a leading international journal that publishes high-quality, original research papers in engineering, specifically on the safety of industrial processes and environmental protection. The journal encourages submissions that present new developments in safety and environmental aspects, particularly those that show how research findings can be applied in process engineering design and practice.
PSEP is particularly interested in research that brings fresh perspectives to established engineering principles, identifies unsolved problems, or suggests directions for future research. The journal also values contributions that push the boundaries of traditional engineering and welcomes multidisciplinary papers.
PSEP's articles are abstracted and indexed by a range of databases and services, which helps to ensure that the journal's research is accessible and recognized in the academic and professional communities. These databases include ANTE, Chemical Abstracts, Chemical Hazards in Industry, Current Contents, Elsevier Engineering Information database, Pascal Francis, Web of Science, Scopus, Engineering Information Database EnCompass LIT (Elsevier), and INSPEC. This wide coverage facilitates the dissemination of the journal's content to a global audience interested in process safety and environmental engineering.