Machine learning-driven property prediction for materials in data-scarcity scenarios: Ensemble of experts approach

IF 3.1 | Zone 3, Materials Science | JCR Q2, MATERIALS SCIENCE, MULTIDISCIPLINARY

Computational Materials Science | Pub Date: 2025-07-04 | DOI: 10.1016/j.commatsci.2025.114092

L.A. Miccio

Volume 258, Article 114092. Full text: https://www.sciencedirect.com/science/article/pii/S0927025625004355

Citation count: 0

Abstract

Data scarcity poses a significant challenge in materials science, particularly for the accurate prediction of complex material properties such as the glass transition temperature (Tg) or the Flory-Huggins interaction parameter (χ) in polymers. Traditional machine learning models struggle to generalize in data-limited scenarios because of the intricate, non-linear interactions between material components. The present study introduces an ensemble of experts (EE) approach that overcomes these limitations by using expert models previously trained on datasets of different but physically meaningful properties. The knowledge encoded in these experts is then used to make accurate predictions on more complex systems, even with very limited training data. The approach represents molecular structures as tokenized SMILES strings, which improves the model’s capacity to interpret chemical information compared with traditional one-hot encoding. The EE system is evaluated against standard artificial neural networks (ANNs) in predicting Tg for molecular glass formers and binary mixtures, and χ for polymer–solvent systems, under different data scarcity conditions. Results show that the EE framework significantly outperforms standard ANNs, achieving higher predictive accuracy and better generalization, particularly under extreme data scarcity. Because it effectively incorporates domain-specific chemical information, the EE system is a powerful and scalable solution for predicting material properties that reduces the reliance on costly experimental data collection.
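To make the "tokenized SMILES" representation concrete, the following is a minimal sketch of how a SMILES string might be split into chemically meaningful tokens and mapped to integer ids. The abstract does not specify the tokenizer used in the study, so the regular expression, the function names (`tokenize_smiles`, `build_vocab`, `encode`), and the `<pad>`/`<unk>` convention are all illustrative assumptions, not the author's implementation.

```python
import re

# Assumed tokenization rule (not taken from the paper): bracket atoms such as
# [NH4+], the two-letter halogens Br and Cl, and two-digit ring closures such
# as %12 are kept as single tokens; every other character is its own token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(corpus):
    """Map every token seen in the corpus to an integer id.

    Ids 0 and 1 are reserved for padding and unknown tokens.
    """
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok in sorted({t for s in corpus for t in tokenize_smiles(s)}):
        vocab[tok] = len(vocab)
    return vocab


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of token ids."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokenize_smiles(smiles)]
    ids = ids[:max_len]  # truncate long molecules
    return ids + [vocab["<pad>"]] * (max_len - len(ids))  # pad short ones


if __name__ == "__main__":
    print(tokenize_smiles("CC(=O)Cl"))  # → ['C', 'C', '(', '=', 'O', ')', 'Cl']
    vocab = build_vocab(["CC(=O)Cl", "c1ccccc1"])
    print(encode("CCl", vocab, max_len=5))
```

Token ids of this kind are typically fed to a learned embedding layer, so that tokens such as `Cl` or `[NH4+]` receive their own trainable vectors rather than being spelled out character by character; whether the study uses exactly this arrangement is not stated in the abstract.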

Source journal

Computational Materials Science (Engineering & Technology, Materials Science: Multidisciplinary)

CiteScore: 6.50

Self-citation rate: 6.10%

Articles published: 665

Review time: 26 days

About the journal: The goal of Computational Materials Science is to report on results that provide new or unique insights into, or significantly expand our understanding of, the properties of materials or phenomena associated with their design, synthesis, processing, characterization, and utilization. To be relevant to the journal, the results should be applied or applicable to specific material systems that are discussed within the submission.