Efficient sampling of polycyclic aromatic compounds for free energy predictions through active learning

Impact factor: 9.6 | JCR Q1 (Computer Science, Artificial Intelligence)
Mohammed I. Radaideh, Matt Raymond, Paolo Elvati, Jacob C. Saldinger, Majdi I. Radaideh, Angela Violi
Energy and AI, Volume 21, Article 100528. Published 2025-06-11 (journal article).
DOI: 10.1016/j.egyai.2025.100528
https://www.sciencedirect.com/science/article/pii/S2666546825000606
Citations: 0

Abstract

The physical growth of Polycyclic Aromatic Compounds (PACs) into soot particles plays a significant role in understanding the chemistry of soot formation. Insights into this process can be gained from the free energy landscape of PAC dimerization. However, the space of possible PAC dimers is too large to simulate exhaustively, so researchers must train machine learning models on a subset of the data and use them to impute the rest. To this end, we propose and assess an active learning approach for discovering the optimal PACs with which to train a machine learning model that predicts PAC association and dissociation free energies. A comparison between active learning and random sampling showed that active learning converges faster in loss, requiring fewer training samples to reach the same level of accuracy. The trained model accurately modeled unseen PACs and was robust to changes in the sampling space used to train it. More broadly, this work shows how active learning can optimize the design, and improve the understanding, of more expensive models in specific domains.
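The abstract does not name the surrogate model, the PAC features, or the acquisition rule. To make "active learning versus random sampling" concrete, here is a minimal pool-based sketch. It uses query-by-committee: disagreement across a bootstrap ensemble decides which point to label next. It is illustrative only, not the authors' method. The toy 1-D "oracle" standing in for a costly free-energy simulation, the k-nearest-neighbour surrogate, and every name (`expensive_oracle`, `knn_predict`, `build_committee`, `run`, `rmse`) are hypothetical.

```python
import math
import random
import statistics

# Hypothetical toy setup: NOT the paper's model, features, or data.


def expensive_oracle(x: float) -> float:
    """Stand-in for one costly free-energy simulation (a toy 1-D curve)."""
    return math.sin(3.0 * x) + 0.5 * x


def knn_predict(train_x, train_y, x, k=3):
    """Toy surrogate: average the labels of the k labelled points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return statistics.fmean(train_y[i] for i in nearest)


def build_committee(lab_x, lab_y, n_members, rng):
    """Bootstrap-resample the labelled set once per committee member."""
    n = len(lab_x)
    committee = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        committee.append(([lab_x[i] for i in idx], [lab_y[i] for i in idx]))
    return committee


def run(strategy, pool, n_init=4, budget=20, n_members=8, seed=0):
    """Label `budget` pool points, starting from n_init random ones; return their indices."""
    rng = random.Random(seed)
    labelled = rng.sample(range(len(pool)), n_init)
    labels = {i: expensive_oracle(pool[i]) for i in labelled}  # each oracle call = cost
    while len(labelled) < budget:
        unlabelled = [i for i in range(len(pool)) if i not in labels]
        if strategy == "random":
            pick = rng.choice(unlabelled)
        elif strategy == "qbc":
            lab_x = [pool[i] for i in labelled]
            lab_y = [labels[i] for i in labelled]
            committee = build_committee(lab_x, lab_y, n_members, rng)
            # Query the candidate on which the bootstrap committee disagrees most.
            pick = max(unlabelled, key=lambda i: statistics.pstdev(
                [knn_predict(cx, cy, pool[i]) for cx, cy in committee]))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        labelled.append(pick)
        labels[pick] = expensive_oracle(pool[pick])
    return labelled


def rmse(labelled, pool):
    """Error of the surrogate trained on the labelled points, over the whole pool."""
    lab_x = [pool[i] for i in labelled]
    lab_y = [expensive_oracle(x) for x in lab_x]
    sq = [(knn_predict(lab_x, lab_y, x) - expensive_oracle(x)) ** 2 for x in pool]
    return math.sqrt(statistics.fmean(sq))


if __name__ == "__main__":
    pool = [-2.0 + 4.0 * i / 200 for i in range(201)]  # toy candidates in [-2, 2]
    for strategy in ("random", "qbc"):
        for budget in (10, 20, 30):
            picked = run(strategy, pool, budget=budget, seed=1)
            print(f"{strategy:>6}  n={budget:>2}  RMSE={rmse(picked, pool):.3f}")
```

Running it prints the surrogate's RMSE for each strategy at several label budgets. This is the kind of loss-versus-samples comparison the abstract describes. Whether query-by-committee pulls ahead on this toy curve depends on the seed and the function, so treat the script as a harness for that comparison rather than a reproduction of the paper's result. Bootstrap disagreement was chosen because it works with any regressor. Other acquisition rules, such as Gaussian-process variance, would slot into the same loop.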
Source journal: Energy and AI
Subject area: Engineering, Engineering (miscellaneous)
CiteScore: 16.50
Self-citation rate: 0.00%
Articles published: 64
Review time: 56 days