Active learning for regression of structure–property mapping: the importance of sampling and representation

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital Discovery. Pub Date: 2024-08-12. DOI: 10.1039/D4DD00073K

Hao Liu, Berkay Yucel, Baskar Ganapathysubramanian, Surya R. Kalidindi, Daniel Wheeler and Olga Wodo

Journal: Digital discovery, volume 10, pages 1997-2009. Publication type: Journal Article. Article page: https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00073k

Citation count: 0

Abstract

Data-driven approaches now allow for systematic mappings from materials microstructures to materials properties. In particular, diverse data-driven approaches are available to establish mappings using varied microstructure representations, each posing different demands on the resources required to calibrate machine learning models. In this work, using active learning regression and iteratively increasing the data pool, three questions are explored: (a) What is the minimal subset of data required to train a predictive structure–property model with sufficient accuracy? (b) Is this minimal subset highly dependent on the sampling strategy managing the data pool? (c) What is the cost associated with the model calibration? Using case studies with different types of microstructure (composite vs. spinodal), dimensionality (two- and three-dimensional), and properties (elastic and electronic), we explore these questions using two separate microstructure representations: graph-based descriptors derived from a graph representation of the microstructure, and two-point correlation functions. This work demonstrates that as few as 5% of evaluations are required to calibrate robust data-driven structure–property maps when selections are made from a library of diverse microstructures. The findings show that both representations (graph-based descriptors and two-point correlation functions) can be effective with only a small number of property evaluations when combined with different active learning strategies. However, the dimensionality of the latent space differs substantially depending on the microstructure representation and active learning strategy.
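The pool-based loop described in the abstract (evaluate a few samples, refit, pick the next sample, repeat) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the sampling strategies or the regressor, so greedy farthest-point sampling and a 1-nearest-neighbour surrogate are used here as stand-ins. The 2D feature tuples stand in for a low-dimensional microstructure representation, and `property_oracle` is a hypothetical placeholder for an expensive property evaluation.

```python
import math
import random


def property_oracle(x):
    """Hypothetical stand-in for an expensive property evaluation."""
    return math.sin(3.0 * x[0]) + 0.5 * x[1]


def predict(x, training):
    """1-nearest-neighbour surrogate: label of the closest evaluated sample."""
    return min(training, key=lambda item: math.dist(x, item[0]))[1]


def select_next(pool, labeled_idx):
    """Greedy farthest-point query: the unlabeled sample whose distance
    to its nearest labeled sample is largest."""
    best, best_d = None, -1.0
    for i, x in enumerate(pool):
        if i in labeled_idx:
            continue
        d = min(math.dist(x, pool[j]) for j in labeled_idx)
        if d > best_d:
            best, best_d = i, d
    return best


def active_learning(pool, oracle, budget, seed_idx=0):
    """Grow the labeled set one query at a time until `budget` samples
    have been evaluated. Returns the queried indices and their labels."""
    labeled_idx = [seed_idx]
    labels = {seed_idx: oracle(pool[seed_idx])}
    while len(labeled_idx) < budget:
        i = select_next(pool, labeled_idx)
        labeled_idx.append(i)
        labels[i] = oracle(pool[i])  # the only place the oracle is called
    return labeled_idx, labels


def rmse(pool, oracle, labeled_idx, labels):
    """Root-mean-square error of the surrogate over the whole pool.
    Calling the oracle everywhere is only possible in a toy setting."""
    training = [(pool[i], labels[i]) for i in labeled_idx]
    sq = [(predict(x, training) - oracle(x)) ** 2 for x in pool]
    return math.sqrt(sum(sq) / len(sq))


rng = random.Random(0)
pool = [(rng.random(), rng.random()) for _ in range(100)]
budget = 5  # 5% of the pool, mirroring the fraction quoted in the abstract
queried, labels = active_learning(pool, property_oracle, budget)
print("queried indices:", queried)
print("pool RMSE:", rmse(pool, property_oracle, queried, labels))
```

The selection rule only looks at distances in feature space, so the choice of representation directly changes which samples get queried. Swapping `select_next` for an uncertainty-based rule, or `predict` for a different regressor, leaves the rest of the loop unchanged.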




Source journal

Digital discovery

CiteScore

2.80

Self-citation rate

0.00%

Number of publications