A study of the role of data and model uncertainty in active learning

Impact factor 3.1; Zone 3 (Materials Science); JCR Q2 (MATERIALS SCIENCE, MULTIDISCIPLINARY)
Yahao Li, Errui Jiang, Ziqi Ni, Wudi Li, Ming Huang, Fengyuan Zhao, Fengqi Liu, Yicong Ye, Shuxin Bai
Computational Materials Science, volume 247, Article 113512. Published 2024-11-07. Journal Article.
DOI: 10.1016/j.commatsci.2024.113512
https://www.sciencedirect.com/science/article/pii/S092702562400733X

Abstract

Uncertainty-based active learning strategies have proven highly effective for small-data research in the materials domain. This study examines separately how model uncertainty and data uncertainty affect the performance of active learning strategies, measured by the number of iterations required to identify the optimal samples. For model uncertainty, three acquisition functions are compared: the predicted value strategy (PV), the ranking of predicted value strategy (PR), and the expected improvement strategy (EI). Among these, the active learning model using PR requires the fewest iterations on average (1.75). For data uncertainty, we evaluate the number of active learning iterations in two settings: Gaussian process models that incorporate the uncertainty of the observations, and noise samples that account for the uncertainty of the input features. The results indicate that, when the uncertainty of the observations is considered in the model, the iteration counts of the three strategies converge to similar values at the optimal weighting (1.75 for EI, 1.21 for PV, and 1.18 for PR). In contrast, adding noise samples to the augmented dataset after the original samples severely degrades the efficiency of active learning recommendations. Our findings aim to offer guidance for exploring more favorable acquisition functions and methods for active learning strategies.
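
To illustrate how two of the acquisition functions named in the abstract can rank candidates differently, here is a minimal sketch. It assumes a maximization target and a Gaussian predictive distribution (mean and standard deviation) for each candidate, as a Gaussian process would supply, and it uses the standard textbook closed form of expected improvement, which may differ from the exact variant used in the paper. All function names and candidate values are hypothetical. PR is left out because the abstract does not define how the ranking is computed.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Textbook EI for maximization: E[max(f - best, 0)] with f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def pick_pv(means):
    """PV (assumed form): recommend the candidate with the highest predicted mean."""
    return max(range(len(means)), key=lambda i: means[i])


def pick_ei(means, stds, best):
    """EI: recommend the candidate with the highest expected improvement."""
    return max(
        range(len(means)),
        key=lambda i: expected_improvement(means[i], stds[i], best),
    )


# Hypothetical model predictions for three unmeasured candidates.
means = [1.0, 0.9, 0.5]   # predicted property values
stds = [0.01, 0.5, 1.0]   # predictive standard deviations
best = 1.0                # best value observed so far

print("PV recommends candidate", pick_pv(means))
print("EI recommends candidate", pick_ei(means, stds, best))
```

In this toy case PV favors the candidate with the highest predicted value, while EI favors a candidate with a lower mean but much larger predictive uncertainty. That difference is the exploitation versus exploration trade-off underlying the comparison of acquisition functions.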

Source journal
Computational Materials Science (Engineering and Technology; Materials Science, Multidisciplinary)
CiteScore: 6.50
Self-citation rate: 6.10%
Articles published: 665
Review time: 26 days
Journal description: The goal of Computational Materials Science is to report on results that provide new or unique insights into, or significantly expand our understanding of, the properties of materials or phenomena associated with their design, synthesis, processing, characterization, and utilization. To be relevant to the journal, the results should be applied or applicable to specific material systems that are discussed within the submission.