Efficient sampling of polycyclic aromatic compounds for free energy predictions through active learning
Mohammed I. Radaideh, Matt Raymond, Paolo Elvati, Jacob C. Saldinger, Majdi I. Radaideh, Angela Violi
Energy and AI, Volume 21, Article 100528 (published 2025-06-11). DOI: 10.1016/j.egyai.2025.100528
Abstract
The physical growth of polycyclic aromatic compounds (PACs) into soot particles is central to understanding the chemistry of soot formation. Insight into this process can be gained from the free-energy landscape of PAC dimerization. However, because the space of possible PAC dimers is too large to simulate exhaustively, researchers must train machine learning models on a subset of the data and use them to impute the rest. To this end, we propose and assess an active learning approach for identifying the optimal PACs on which to train a machine learning model that predicts PAC association and dissociation free energies. Compared with random sampling, active learning converged faster in loss, requiring fewer training samples to reach the same accuracy. The trained model accurately predicted free energies for unseen PACs and remained robust to changes in the sampling space used for training. More broadly, this work shows how active learning can help optimize the design, and improve the understanding, of more expensive models in specific domains.
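The abstract does not state which model, molecular features, or acquisition function were used, so the following is only a minimal, hypothetical Python sketch of the general idea: pool-based active learning compared against random sampling under the same labeling budget. It substitutes a Gaussian-process surrogate with uncertainty sampling (query the pool point with the largest predictive variance). `target` is a cheap stand-in for an expensive free-energy calculation, and the kernel, length scale, 1-D pool, and budget are all assumptions chosen for illustration.

```python
import math
import random


def rbf(a, b, length=0.4):
    # Squared-exponential kernel (assumed; the paper's model is not specified).
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def cholesky(K):
    # Lower-triangular L with K = L L^T.
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution: solve L y = b.
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def solve_upper_t(L, y):
    # Back substitution: solve L^T x = y.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit_predict(xs, ys, queries, noise=1e-4):
    # GP posterior mean and variance at each query point (jitter keeps K well conditioned).
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # alpha = K^{-1} y
    means, variances = [], []
    for q in queries:
        k = [rbf(q, x) for x in xs]
        means.append(sum(ki * ai for ki, ai in zip(k, alpha)))
        v = solve_lower(L, k)
        variances.append(max(rbf(q, q) - sum(vi * vi for vi in v), 0.0))
    return means, variances


def target(x):
    # Hypothetical stand-in for an expensive simulation (e.g. a free-energy calculation).
    return math.sin(3 * x) + 0.5 * x


def run(strategy, budget=10, seed=0):
    rng = random.Random(seed)
    pool = [i * 0.05 for i in range(61)]  # candidate inputs on [0, 3]
    labeled = [rng.randrange(len(pool))]  # same random starting point for both strategies
    while len(labeled) < budget:
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if strategy == "random":
            pick = rng.choice(unlabeled)
        else:
            # Uncertainty sampling: label the candidate the current model is least sure about.
            xs = [pool[i] for i in labeled]
            _, var = gp_fit_predict(xs, [target(x) for x in xs], [pool[i] for i in unlabeled])
            pick = unlabeled[max(range(len(unlabeled)), key=var.__getitem__)]
        labeled.append(pick)
    xs = [pool[i] for i in labeled]
    preds, _ = gp_fit_predict(xs, [target(x) for x in xs], pool)
    rmse = math.sqrt(sum((p - target(x)) ** 2 for p, x in zip(preds, pool)) / len(pool))
    return labeled, rmse


if __name__ == "__main__":
    for strategy in ("uncertainty", "random"):
        picks, rmse = run(strategy)
        print(strategy, round(rmse, 4))
```

In a noise-free GP the predictive variance depends only on where the labeled points lie, so this acquisition tends to spread queries across the input range, whereas random picks can cluster. Whether that yields lower error for a fixed budget depends on the model and data; the sketch makes no claim about the paper's results.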