IMPACT-4CCS: Integrated Modeling and Prediction Using Ab Initio and Trained Potentials for Collision Cross Sections

Impact factor 3.4 | Zone 3 (Chemistry) | JCR Q2, CHEMISTRY, MULTIDISCIPLINARY
Carson Farmer, Hector Medina
Journal of Computational Chemistry, vol. 46, no. 11. Journal article, published 2025-04-18. DOI: 10.1002/jcc.70106 (https://onlinelibrary.wiley.com/doi/10.1002/jcc.70106)
Citations: 0

Abstract

Collision cross section (CCS) values can enhance the identification and classification of molecular contaminants such as per- and polyfluoroalkyl substances (PFAS). However, the computational burden for large molecules, combined with the growing number of potential PFAS candidates, can prevent existing methods from delivering sufficiently accurate results in a timely manner. Furthermore, machine learning methods struggle to generalize when the (de)protonated structure undergoes structural changes that are uncommon in the training dataset. In this study, we introduce IMPACT-4CCS (Integrated Modeling and Prediction using Ab initio and Trained potentials for Collision Cross Sections), a novel computational workflow ensemble that combines ab initio and machine learning tasks to accelerate accurate CCS prediction for PFAS molecules. IMPACT-4CCS achieves accuracy comparable to current machine learning approaches, as validated on a test set of 100 molecules. Moreover, IMPACT-4CCS is more accurate on some specific emerging PFAS subclasses, such as the nH-perfluoroalkyl carboxylic acid (nH-PFCA) family, whose CCS values other methods overestimate. To the authors' knowledge, IMPACT-4CCS is the only existing method capable of capturing the structural dynamics (i.e., hydrogen bridging) present in some large, flexible PFAS molecules. Our work demonstrates that carefully using machine learning to accelerate traditional methods is likely to be more accurate than relying purely on machine learning on molecular graphs. Recommended future work includes assessing the usefulness of IMPACT-4CCS for extending nontarget analysis to larger PFAS datasets, such as the OECD (Organization for Economic Co-operation and Development) PFAS list in PubChem, which may contain more than 7 million molecules with diverse chemistry.
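The abstract makes two kinds of accuracy claims: overall accuracy on a 100-molecule test set, and systematic overestimation of CCS by other methods for the nH-PFCA family. A common way to quantify both is a signed relative error (whose mean exposes bias such as overestimation) and a median absolute relative error (typical error size). The sketch below is a minimal illustration under that assumption only; the abstract does not state which metrics the authors used, and the function names `relative_errors` and `summarize` and the example values are hypothetical.

```python
from statistics import mean, median


def relative_errors(predicted, reference):
    """Signed relative error (%) of each predicted CCS versus its reference CCS.

    Hypothetical helper. Positive values mean the prediction overestimates
    the reference value; both inputs must use the same units (e.g., square angstroms).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return [100.0 * (p - r) / r for p, r in zip(predicted, reference)]


def summarize(predicted, reference):
    """Hypothetical summary: mean signed error (bias) and median absolute error."""
    errs = relative_errors(predicted, reference)
    return {
        # > 0 indicates systematic overestimation across the set
        "mean_signed_pct": mean(errs),
        # typical magnitude of the error, robust to a few outliers
        "median_abs_pct": median([abs(e) for e in errs]),
    }


# Illustrative, made-up CCS values (not data from the paper).
print(summarize([110.0, 205.0, 150.0], [100.0, 200.0, 150.0]))
```

A positive `mean_signed_pct` on one subclass alongside a small `median_abs_pct` overall is the pattern that would distinguish "accurate on average" from "biased on a specific family".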

Source journal: Journal of Computational Chemistry
CiteScore: 6.60
Self-citation rate: 3.30%
Articles published: 247
Review time: 1.7 months
About the journal: This distinguished journal publishes articles concerned with all aspects of computational chemistry: analytical, biological, inorganic, organic, physical, and materials. The Journal of Computational Chemistry presents original research, contemporary developments in theory and methodology, and state-of-the-art applications. Computational areas featured in the journal include ab initio and semiempirical quantum mechanics, density functional theory, molecular mechanics, molecular dynamics, statistical mechanics, cheminformatics, biomolecular structure prediction, molecular design, and bioinformatics.