Toward Accurate PAH IR Spectra Prediction: Handling Charge Effects with Classical and Deep Learning Models

IF 5.3 · Zone 2 (Chemistry) · JCR Q1 (Chemistry, Medicinal)
Babken G. Beglaryan, Aleksandr S. Zakuskin, Viktor A. Nemchenko, and Timur A. Labutin*
Journal of Chemical Information and Modeling, vol. 65, no. 10, pp. 4854–4865. Published 2025-05-08. DOI: 10.1021/acs.jcim.5c00372. URL: https://pubs.acs.org/doi/10.1021/acs.jcim.5c00372
Citations: 0

Abstract

Polycyclic aromatic hydrocarbons (PAHs) play a crucial role in astrochemistry, environmental studies, and combustion chemistry, yet interpreting their infrared (IR) spectra remains challenging because many molecules share similar spectral features. The likely presence of both neutral and charged PAHs in mixtures further complicates spectral interpretation. While first-principles calculations provide accurate spectral predictions, their high computational cost limits scalability. This study employs machine learning (ML) to predict PAH IR spectra, with an emphasis on models that apply simultaneously to neutral and ionized molecules. Two models are introduced: an XGBoost model trained on Morgan fingerprints and a graph neural network (GNN) that operates on molecular graph representations. Molecular charge is handled by incorporating a one-hot or learnable neural-network encoding of the charge into the molecular representation. Both models demonstrate excellent predictive capabilities, enabling fast and accurate prediction of the IR spectra of charged PAHs for the first time. While the XGBoost model achieves the highest accuracy reported to date, the GNN shows significant promise for future advances owing to the inherent capabilities of molecular graph representations. Remaining challenges, such as the scarcity of data on heteroatomic PAHs, and potential approaches to addressing them are also discussed in the manuscript.
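
The one-hot charge handling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the set of charge states, and the toy 8-bit fingerprint are all assumptions made for illustration. In practice the bit vector would come from a cheminformatics toolkit, and the combined vector would be passed to a regressor such as XGBoost.

```python
from typing import List, Sequence

# Assumed set of charge states; the abstract does not say which charges are covered.
CHARGE_STATES = (-1, 0, 1, 2)


def one_hot_charge(charge: int, states: Sequence[int] = CHARGE_STATES) -> List[int]:
    """Return a one-hot vector marking the position of `charge` in `states`."""
    if charge not in states:
        raise ValueError(f"unsupported charge {charge}; expected one of {tuple(states)}")
    return [1 if charge == s else 0 for s in states]


def featurize(fingerprint_bits: Sequence[int], charge: int) -> List[int]:
    """Concatenate a binary fingerprint with the one-hot charge encoding."""
    return list(fingerprint_bits) + one_hot_charge(charge)


# Hypothetical 8-bit stand-in for a Morgan fingerprint (real ones are much longer).
toy_fp = [0, 1, 0, 0, 1, 1, 0, 0]
neutral = featurize(toy_fp, 0)  # same skeleton, neutral molecule
cation = featurize(toy_fp, 1)   # same skeleton, singly charged cation
```

With this layout the neutral molecule and its cation share identical fingerprint bits and differ only in the appended charge block, so a single model can in principle learn charge-dependent spectral shifts. A learnable encoding would replace the fixed one-hot block with a trainable vector per charge state.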

Source journal
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.