CBERTaXGB: A Domain-Specific Transformer-XGBoost Hybrid for Chemical Toxicity and Flammability Prediction

IF 6.9 | Zone 2, Environmental Science and Ecology | JCR Q1, Engineering, Chemical
Kunsen Lin, Boyang Liao, Xiaochuan Chen, Bohao Xu, Miaoya Jiang, Xuefei Zhou
Journal: Process Safety and Environmental Protection
Publication date: 2025-07-19
Publication type: Journal Article
DOI: 10.1016/j.psep.2025.107615 (https://doi.org/10.1016/j.psep.2025.107615)
Citations: 0

Abstract

Accurate prediction of chemical toxicity and flammability is essential for the effective implementation of the NFPA 704 Hazard Rating System, which underpins safety protocols in chemical lifecycle management. Traditional experimental methods, such as OECD 423 toxicity tests and ASTM E681 flammability assessments, face practical limitations because chemical inventories are extensive and novel substances emerge continuously. This study introduces the CBERTaXGB framework, a hybrid model integrating Transformer-based molecular feature extraction (ChemBERTa) with XGBoost's optimized decision architecture, demonstrating exceptional predictive performance for organic hazardous chemicals. The model outperformed existing baseline models, achieving PR-AUC values of 0.994 and 0.996, AU-ROC values of 0.971 and 0.923, and F1-scores of 0.972 and 0.996 for toxicity and flammability classification, respectively, while maintaining high precision and recall. Interpretability analysis confirmed CBERTaXGB's ability to identify critical molecular features such as aromatic stability patterns, electrophilic functional groups, and ester bond configurations, which align with established chemical mechanisms. t-SNE visualizations further validated distinct structural clusters in molecular embeddings. Application to environmental hazard screening of organic substances predicted hazardous properties in 1,319 chemicals, highlighting the model's practical utility in prioritizing chemicals for enhanced safety measures. CBERTaXGB thus offers a robust, interpretable, and regulatory-compliant approach, significantly advancing intelligent chemical hazard assessment and contributing to environmental protection and public health safety.
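The abstract reports PR-AUC, AU-ROC, and F1-scores without defining them. As a reading aid, the following is a minimal pure-Python sketch of how these three metrics are conventionally computed for a binary classifier. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions, and PR-AUC is approximated here by average precision, which may differ from the estimator used in the paper.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Tied scores count as half a win. Uses the O(n_pos * n_neg) pairwise
    definition, which is fine for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Average precision, a common estimator of the area under the PR curve.

    Ranks samples by descending score and averages the precision measured
    at the rank of each positive sample. Assumes no tied scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy data (hypothetical, not from the paper): 1 = hazardous, 0 = not hazardous.
toy_labels = [1, 1, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
# Hard predictions at an assumed threshold of 0.5.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print("F1:", f1_score(toy_labels, toy_preds))
print("ROC-AUC:", roc_auc(toy_labels, toy_scores))
print("Average precision:", average_precision(toy_labels, toy_scores))
```

F1 depends on the chosen decision threshold, whereas ROC-AUC and average precision are computed from the ranking of scores alone. That is one reason a model can show a high F1 alongside a lower AU-ROC, as in the flammability figures quoted above.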
Source journal
Process Safety and Environmental Protection (Environmental Science; Engineering, Chemical)
CiteScore: 11.40
Self-citation rate: 15.40%
Articles published: 929
Review time: 8.0 months
About the journal: Process Safety and Environmental Protection (PSEP) is a leading international journal that publishes high-quality, original research papers in engineering, specifically those related to the safety of industrial processes and environmental protection. The journal encourages submissions that present new developments in safety and environmental aspects, particularly those that show how research findings can be applied in process engineering design and practice. PSEP is particularly interested in research that brings fresh perspectives to established engineering principles, identifies unsolved problems, or suggests directions for future research. The journal also values contributions that push the boundaries of traditional engineering and welcomes multidisciplinary papers. PSEP's articles are abstracted and indexed by a range of databases and services, which helps to ensure that the journal's research is accessible and recognized in the academic and professional communities. These databases include ANTE, Chemical Abstracts, Chemical Hazards in Industry, Current Contents, Elsevier Engineering Information database, Pascal Francis, Web of Science, Scopus, Engineering Information Database EnCompass LIT (Elsevier), and INSPEC. This wide coverage facilitates the dissemination of the journal's content to a global audience interested in process safety and environmental engineering.