CBERTaXGB: A Domain-Specific Transformer-XGBoost Hybrid for Chemical Toxicity and Flammability Prediction
Authors: Kunsen Lin, Boyang Liao, Xiaochuan Chen, Bohao Xu, Miaoya Jiang, Xuefei Zhou
Journal: Process Safety and Environmental Protection (JCR Q1, Engineering, Chemical; impact factor 6.9)
Publication date: 2025-07-19
DOI: https://doi.org/10.1016/j.psep.2025.107615
Accurate prediction of chemical toxicity and flammability is essential for the effective implementation of the NFPA 704 Hazard Rating System, underpinning safety protocols in chemical lifecycle management. Traditional experimental methods, such as OECD 423 toxicity tests and ASTM E681 flammability assessments, face practical limitations because chemical inventories are extensive and novel substances continue to emerge. This study introduces the CBERTaXGB framework, a hybrid model that combines Transformer-based molecular feature extraction (ChemBERTa) with XGBoost's gradient-boosted decision trees, and reports strong predictive performance for organic hazardous chemicals. The model outperformed existing baseline models, achieving PR-AUC values of 0.994 and 0.996, AUROC values of 0.971 and 0.923, and F1-scores of 0.972 and 0.996 for toxicity and flammability classification, respectively, while maintaining high precision and recall. Interpretability analysis confirmed CBERTaXGB's ability to identify critical molecular features such as aromatic stability patterns, electrophilic functional groups, and ester bond configurations, which align with established chemical mechanisms. t-SNE visualizations further validated distinct structural clusters in the molecular embeddings. Applied to environmental hazard screening of organic substances, the model predicted hazardous properties in 1,319 chemicals, highlighting its practical utility in prioritizing chemicals for enhanced safety measures. CBERTaXGB thus offers a robust, interpretable, and regulatory-compliant approach, significantly advancing intelligent chemical hazard assessment and contributing to environmental protection and public health safety.
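For readers unfamiliar with the three metrics quoted in the abstract, the following is a minimal sketch of how F1, AUROC, and PR-AUC can be computed for a binary hazard classifier. It is not the authors' evaluation code, which the abstract does not show. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and PR-AUC is approximated here by average precision.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at each positive, ranked by descending score.

    Assumes scores are distinct; tied scores would make the ranking ambiguous.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical toy data: 1 = hazardous, 0 = not hazardous.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
# Hard predictions at an assumed threshold of 0.5.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_f1 = f1_score(toy_labels, toy_preds)
toy_auroc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

F1 depends on the chosen threshold, whereas AUROC and average precision are computed from the ranking of scores alone. That difference is one reason a model can report a high F1 alongside a lower AUROC on an imbalanced dataset.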
About the journal:
Process Safety and Environmental Protection (PSEP) is a leading international journal that publishes high-quality, original research papers in engineering, specifically on the safety of industrial processes and environmental protection. The journal encourages submissions that present new developments in safety and environmental aspects, particularly those that show how research findings can be applied in process engineering design and practice.
PSEP is particularly interested in research that brings fresh perspectives to established engineering principles, identifies unsolved problems, or suggests directions for future research. The journal also values contributions that push the boundaries of traditional engineering and welcomes multidisciplinary papers.
PSEP's articles are abstracted and indexed by a range of databases and services, which helps to ensure that the journal's research is accessible and recognized in the academic and professional communities. These databases include ANTE, Chemical Abstracts, Chemical Hazards in Industry, Current Contents, Elsevier Engineering Information database, Pascal Francis, Web of Science, Scopus, Engineering Information Database EnCompass LIT (Elsevier), and INSPEC. This wide coverage facilitates the dissemination of the journal's content to a global audience interested in process safety and environmental engineering.