Machine learning-driven property prediction for materials in data-scarcity scenarios: Ensemble of experts approach
Authors: L.A. Miccio
Journal: Computational Materials Science, volume 258, Article 114092 (impact factor 3.1; JCR Q2, Materials Science, Multidisciplinary)
Publication date: 2025-07-04
DOI: 10.1016/j.commatsci.2025.114092
URL: https://www.sciencedirect.com/science/article/pii/S0927025625004355
Citations: 0
Abstract
Data scarcity is a significant challenge in materials science, particularly for the accurate prediction of complex material properties such as the glass transition temperature (Tg) or the Flory-Huggins interaction parameter (χ) in polymers. Traditional machine learning models struggle to generalize in data-limited scenarios because of the intricate, non-linear interactions between material components. The present study introduces an ensemble of experts (EE) approach that overcomes these limitations by using expert models previously trained on datasets of different but physically meaningful properties. The knowledge acquired by these experts is then used to make accurate predictions on more complex systems, even with very limited training data. The approach represents molecular structures as tokenized SMILES strings, which improves the model’s capacity to interpret chemical information compared with traditional one-hot encoding. The performance of the EE system is evaluated against standard artificial neural networks (ANNs) in predicting Tg for molecular glass formers and binary mixtures, and χ for polymer–solvent systems, under different data scarcity conditions. Results show that the EE framework significantly outperforms standard ANNs, achieving higher predictive accuracy and better generalization, particularly under extreme data scarcity. Because it effectively incorporates domain-specific chemical information, the EE system is a powerful and scalable solution for predicting material properties that reduces reliance on costly experimental data collection.
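The abstract mentions tokenized SMILES strings as the molecular representation but does not say how the tokenization is done. The following is a minimal sketch of one common convention, a regex tokenizer followed by an integer vocabulary, and is not the author's implementation. The regex, the function names (`tokenize_smiles`, `build_vocab`, `encode`), and the `<pad>`/`<unk>` special tokens are all assumptions made for illustration.

```python
import re

# Assumed token pattern (not taken from the paper): bracket atoms, the
# two-letter halogens Br and Cl, two-digit ring closures, single-letter
# atoms, ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#\-+\\/.:@~*$]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Guard against silently dropping characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES string: {smiles!r}")
    return tokens


def build_vocab(smiles_list):
    """Map each token seen in the corpus to an integer id.

    Id 0 is reserved for padding and id 1 for unknown tokens.
    """
    vocab = {"<pad>": 0, "<unk>": 1}
    seen = {tok for s in smiles_list for tok in tokenize_smiles(s)}
    for tok in sorted(seen):
        vocab[tok] = len(vocab)
    return vocab


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of token ids."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize_smiles(smiles)]
    ids = ids[:max_len]  # truncate long sequences
    return ids + [vocab["<pad>"]] * (max_len - len(ids))  # pad short ones


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "c1ccccc1"])
    print(tokenize_smiles("CC(=O)Cl"))
    print(encode("CCO", vocab, max_len=5))
```

Unlike a character-level one-hot encoding, this keeps multi-character units such as `Cl` or `[NH4+]` as single tokens, and the resulting integer ids could feed an embedding layer in whatever model consumes them.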
About the journal:
The goal of Computational Materials Science is to report on results that provide new or unique insights into, or significantly expand our understanding of, the properties of materials or phenomena associated with their design, synthesis, processing, characterization, and utilization. To be relevant to the journal, the results should be applied or applicable to specific material systems that are discussed within the submission.