Fine-tuned large language models for natech analytics: Evidence from two decades of Texas chemical emission incidents

Impact factor: 7.8 · CAS Zone 2 (Environmental Science and Ecology) · JCR Q1 (Engineering, Chemical)
Haoyu Yang, Shuaiyu Zhao, Zihao Wang, Jiejia Wang, Lei Zou, Qingsheng Wang
DOI: 10.1016/j.psep.2025.107973
Journal: Process Safety and Environmental Protection, Volume 203, Article 107973
Publication date: 2025-10-03
Publication type: Journal Article
Open access: No
Full record: https://www.sciencedirect.com/science/article/pii/S0957582025012406
Indexed via: Semanticscholar
Citations: 0

Abstract

Fine-tuned large language models for natech analytics: Evidence from two decades of Texas chemical emission incidents
Natural-hazard-triggered technological accidents (Natechs) pose compound risks to the process industries, yet large historical databases remain under-utilized because incident narratives are unstructured and screening relies on keywords. In this work, we develop an automated, data-driven framework that fine-tunes generative large language models (LLMs) to jointly (i) classify Natech status and the primary hazard, (ii) extract affected unit–issue pairs, and (iii) generate brief, evidence-style justifications from incident text. Using the Texas Commission on Environmental Quality (TCEQ) air emission event database (2004–2024) as a region-specific testbed, we construct a supervised fine-tuning corpus via a schema-constrained template and evaluate the fine-tuned LLMs against LSTM and BERT baselines. The best fine-tuned model leads on every metric, with an overall accuracy of 0.958 and a macro-F1 of 0.930, while a compact 3B variant remains competitive, demonstrating the superior performance and data efficiency of pretrained transformers under constrained supervision. Applied at scale, the framework quantifies climate-related patterns in Texas. By frequency, Natech incidents form ∼6% of statewide records, with counts surging during extreme years (hurricanes in 2005, 2008 and 2017; winter freeze in 2021). By excessive emissions, Natech incidents contribute ∼10% statewide and ∼14% in coastal Texas; along the coast, hurricanes dominate and yield a disproportionately large share of Natech releases. The framework delivers single-pass, structured analytics that reduce manual effort and improve reproducibility, providing decision-ready evidence for emergency preparedness and mitigation. Looking ahead, coupling the model with retrieval-grounded weather data and human-in-the-loop audits could enable a production-grade Natech analytics agent for continuous monitoring and planning.
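The abstract does not give the template itself, so the following is only a minimal sketch of what a schema-constrained fine-tuning record covering tasks (i) to (iii) might look like. The field names, the instruction wording, the hazard label set, and the helper functions `build_sft_record`, `validate`, and `parse_model_output` are all assumptions made for illustration; none of them come from the paper.

```python
import json

# Hypothetical hazard labels; the abstract does not list the actual categories.
HAZARDS = {"hurricane", "freeze", "lightning", "flood", "other", "none"}

# Hypothetical field names for the single structured answer.
FIELDS = {"is_natech", "primary_hazard", "unit_issue_pairs", "justification"}

# Hypothetical instruction text placed in front of every narrative.
INSTRUCTION = (
    "Read the incident narrative and answer with one JSON object with keys "
    "is_natech (bool), primary_hazard (string), unit_issue_pairs "
    "(list of [unit, issue]), and justification (short string)."
)


def validate(obj):
    """Raise ValueError unless obj matches the assumed schema."""
    if not isinstance(obj, dict) or set(obj) != FIELDS:
        raise ValueError("expected exactly the keys: %s" % sorted(FIELDS))
    if not isinstance(obj["is_natech"], bool):
        raise ValueError("is_natech must be a boolean")
    if obj["primary_hazard"] not in HAZARDS:
        raise ValueError("unknown primary_hazard")
    # A Natech record needs a hazard; a non-Natech record must use "none".
    if obj["is_natech"] == (obj["primary_hazard"] == "none"):
        raise ValueError("is_natech and primary_hazard are inconsistent")
    if not isinstance(obj["unit_issue_pairs"], list):
        raise ValueError("unit_issue_pairs must be a list")
    for pair in obj["unit_issue_pairs"]:
        if (not isinstance(pair, list) or len(pair) != 2
                or not all(isinstance(x, str) for x in pair)):
            raise ValueError("each unit-issue pair must be [unit, issue]")
    if not isinstance(obj["justification"], str):
        raise ValueError("justification must be a string")


def build_sft_record(narrative, is_natech, primary_hazard,
                     unit_issue_pairs, justification):
    """Return one prompt/completion pair for supervised fine-tuning."""
    target = {
        "is_natech": is_natech,
        "primary_hazard": primary_hazard,
        "unit_issue_pairs": [list(p) for p in unit_issue_pairs],
        "justification": justification,
    }
    validate(target)
    return {
        "prompt": INSTRUCTION + "\n\nNarrative: " + narrative + "\n",
        "completion": json.dumps(target, ensure_ascii=False),
    }


def parse_model_output(text):
    """Parse a generated completion and reject anything off-schema."""
    obj = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    validate(obj)
    return obj
```

Using the same validator for training targets and for generated outputs is one way to keep a single-pass answer machine-checkable: any completion that drifts from the schema is rejected rather than silently counted.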
Source journal
Process Safety and Environmental Protection (Environmental Science; Engineering: Chemical)
CiteScore: 11.40
Self-citation rate: 15.40%
Articles published: 929
Review time: 8.0 months
Journal description: Process Safety and Environmental Protection (PSEP) is a leading international journal that publishes high-quality, original research papers in engineering, specifically on the safety of industrial processes and environmental protection. The journal encourages submissions that present new developments in safety and environmental aspects, particularly those that show how research findings can be applied in process engineering design and practice. PSEP is particularly interested in research that brings fresh perspectives to established engineering principles, identifies unsolved problems, or suggests directions for future research. The journal also values contributions that push the boundaries of traditional engineering and welcomes multidisciplinary papers. PSEP's articles are abstracted and indexed by a range of databases and services, including ANTE, Chemical Abstracts, Chemical Hazards in Industry, Current Contents, Elsevier Engineering Information database, Pascal Francis, Web of Science, Scopus, Engineering Information Database EnCompass LIT (Elsevier), and INSPEC. This coverage helps the journal's content reach a global audience interested in process safety and environmental engineering.