Hybrid Unsupervised/Supervised Machine Learning for Identifying Molecular Structural Fingerprints From Ensemble Property
Authors: Arpan Choudhury, Debashree Ghosh
Journal: Journal of Computational Chemistry, vol. 46, no. 3
Published: 2025-01-27
DOI: 10.1002/jcc.70038
URL: https://onlinelibrary.wiley.com/doi/10.1002/jcc.70038
Citations: 0
Abstract
The ensemble properties of a system are obtained by averaging over the properties calculated for the various configurations it can have at a finite temperature and thus cannot be captured by a single molecular structure. Such ensemble properties are often important in material discovery. In designing new materials, the goal is to predict those ensemble structures that display a tailored property. However, mapping this average property to multiple structures introduces ambiguities and unreliable convergence in supervised machine learning. This presents a major obstacle in designing new materials. Here, we introduce a hybrid unsupervised/supervised learning method and demonstrate how to predict the structural parameters defining the conformers of a heterogeneous system, melanin, from its ensemble-averaged spectra. This also shows a new way to identify different structural fingerprints responsible for an ensemble-averaged superposition spectrum.
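The hybrid idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional structural parameter, the Gaussian "spectra", the two conformer families, and every function name below are hypothetical, and k-means plus ridge regression are stand-ins for whatever unsupervised and supervised models the paper actually uses. The sketch only shows why clustering first helps: instead of regressing an averaged spectrum onto an unordered list of conformers (an ambiguous target), the supervised model predicts the population of each cluster found by the unsupervised step.

```python
import numpy as np

GRID = np.linspace(0.0, 6.0, 61)  # toy "energy" axis, arbitrary units

def conformer_spectrum(param, width=0.3):
    """Toy spectrum of one conformer: a Gaussian band centred at `param`."""
    return np.exp(-0.5 * ((GRID - param) / width) ** 2)

def ensemble_spectrum(params):
    """Ensemble property: average of the single-conformer spectra."""
    return np.mean([conformer_spectrum(p) for p in params], axis=0)

def kmeans_1d(values, k, n_iter=50):
    """Unsupervised step: plain Lloyd's k-means on a 1-D structural parameter."""
    values = np.asarray(values, dtype=float)
    centroids = np.quantile(values, np.linspace(0.0, 1.0, k))  # deterministic init
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = values[labels == j].mean()
    return centroids, labels

def sample_ensemble(rng, frac_low, n_conf=40):
    """Hypothetical ensemble mixing two conformer families (params near 2 and 4)."""
    n_low = int(round(frac_low * n_conf))
    return np.concatenate([rng.normal(2.0, 0.1, n_low),
                           rng.normal(4.0, 0.1, n_conf - n_low)])

rng = np.random.default_rng(0)
ensembles = [sample_ensemble(rng, rng.uniform(0.1, 0.9)) for _ in range(200)]

# Unsupervised: cluster all conformers, pooled over ensembles, into families.
pooled = np.concatenate(ensembles)
centroids, labels = kmeans_1d(pooled, k=2)
labels = labels.reshape(len(ensembles), -1)  # every ensemble has n_conf members

# Supervised targets: population of each cluster in each ensemble.
Y = np.stack([(labels == j).mean(axis=1) for j in range(2)], axis=1)

# Supervised step: ridge regression from averaged spectrum to cluster populations.
X = np.stack([ensemble_spectrum(e) for e in ensembles])
X = np.hstack([X, np.ones((len(X), 1))])  # bias column
lam = 1e-3  # small ridge penalty, an arbitrary choice for this toy
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# Held-out ensemble with 25% of its conformers in the low-parameter family.
test_spec = ensemble_spectrum(sample_ensemble(rng, 0.25))
predicted = np.append(test_spec, 1.0) @ W
print("cluster centroids:", centroids)
print("predicted populations:", predicted)
```

In this toy setting the cluster centroids play the role of structural fingerprints, and the predicted populations say how strongly each fingerprint contributes to the averaged spectrum. A real application would replace the scalar parameter with the actual structural descriptors and the Gaussians with computed spectra.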
About the journal:
This distinguished journal publishes articles concerned with all aspects of computational chemistry: analytical, biological, inorganic, organic, physical, and materials. The Journal of Computational Chemistry presents original research, contemporary developments in theory and methodology, and state-of-the-art applications. Computational areas that are featured in the journal include ab initio and semiempirical quantum mechanics, density functional theory, molecular mechanics, molecular dynamics, statistical mechanics, cheminformatics, biomolecular structure prediction, molecular design, and bioinformatics.