IMPACT-4CCS: Integrated Modeling and Prediction Using Ab Initio and Trained Potentials for Collision Cross Sections
Authors: Carson Farmer, Hector Medina
Journal: Journal of Computational Chemistry, volume 46, issue 11 (impact factor 3.4; JCR Q2, Chemistry, Multidisciplinary)
Publication date: 2025-04-18
DOI: 10.1002/jcc.70106
URL: https://onlinelibrary.wiley.com/doi/10.1002/jcc.70106
Citations: 0
Abstract
Collision cross section (CCS) values can enhance the identification and classification of molecular contaminants such as per- and polyfluoroalkyl substances (PFAS). However, the computational cost of treating large molecules, combined with the growing number of potential PFAS candidates, can render existing methods incapable of providing sufficiently accurate results in a timely manner. Furthermore, machine learning methods struggle to generalize when the (de)protonated structure undergoes structural changes that are uncommon in the training dataset. In this study, we introduce IMPACT-4CCS (Integrated Modeling and Prediction using Ab initio and Trained potentials for Collision Cross Sections), a novel computational workflow ensemble that combines ab initio and machine learning tasks to accelerate accurate prediction of CCS for PFAS molecules. IMPACT-4CCS achieves accuracy comparable to that of current machine learning approaches, as validated on a test set of 100 molecules. Furthermore, IMPACT-4CCS is more accurate for some emerging PFAS subclasses, such as the nH-perfluoroalkyl carboxylic acid (nH-PFCA) family, whose CCS values other methods overestimate. As far as the authors know, IMPACT-4CCS is the only existing method capable of capturing structural dynamics (i.e., hydrogen bridging) present in some large and flexible PFAS molecules. Our work demonstrates that the careful use of machine learning to accelerate traditional methods is likely to be more accurate than relying purely on machine learning on molecular graphs. Recommended future work includes assessing the usefulness of IMPACT-4CCS for extending nontarget analysis to larger PFAS datasets such as the OECD (Organization for Economic Co-operation and Development) PFAS list in PubChem, which could contain more than 7 million molecules with diverse chemistry.
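The abstract compares methods by accuracy and by whether they overestimate CCS, but it does not say which error metric was used. As a minimal sketch, assuming a signed percent error against reference CCS values is a reasonable way to express both ideas, the comparison could look like the following. The function names and the numbers are hypothetical placeholders for illustration, not data or code from the paper.

```python
from statistics import median


def signed_percent_errors(predicted, reference):
    """Signed percent error per molecule; positive means the prediction overestimates."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return [100.0 * (p - r) / r for p, r in zip(predicted, reference)]


def summarize(predicted, reference):
    """Summarize accuracy (absolute error) and bias (signed error) over a test set."""
    errors = signed_percent_errors(predicted, reference)
    return {
        "median_abs_pct_error": median(abs(e) for e in errors),
        "mean_signed_pct_error": sum(errors) / len(errors),
        "fraction_overestimated": sum(e > 0 for e in errors) / len(errors),
    }


# Placeholder CCS values in square angstroms, invented purely for illustration.
reference = [100.0, 200.0, 150.0, 250.0]
predicted = [110.0, 190.0, 153.0, 250.0]

stats = summarize(predicted, reference)
print(stats)
```

Reporting the signed error alongside the absolute error separates two questions: how far off a method is on average, and whether it is systematically high for a given subclass such as nH-PFCA.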
About the journal:
This distinguished journal publishes articles concerned with all aspects of computational chemistry: analytical, biological, inorganic, organic, physical, and materials. The Journal of Computational Chemistry presents original research, contemporary developments in theory and methodology, and state-of-the-art applications. Computational areas that are featured in the journal include ab initio and semiempirical quantum mechanics, density functional theory, molecular mechanics, molecular dynamics, statistical mechanics, cheminformatics, biomolecular structure prediction, molecular design, and bioinformatics.