Uncertainty quantification driven machine learning for improving model accuracy in imbalanced regression tasks

IF 7.5 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Tuba Dolar, Jie Chen, Wei Chen
{"title":"Uncertainty quantification driven machine learning for improving model accuracy in imbalanced regression tasks","authors":"Tuba Dolar,&nbsp;Jie Chen,&nbsp;Wei Chen","doi":"10.1016/j.eswa.2024.125526","DOIUrl":null,"url":null,"abstract":"<div><div>Several factors are known to determine the quality of machine learning models, one of which is the dataset quality. One problem related to the quality of a dataset is the imbalance issue. An imbalanced dataset contains significantly more data points for certain values of the output variable which increases the overfitting risk and negatively affects the prediction accuracy. In this article, we propose using epistemic uncertainty quantification (UQ) of machine learning models to identify rare samples in imbalanced regression problems for balancing the dataset. The developed algorithm, uncertainty quantification-driven imbalanced regression (UQDIR), is guided by UQ to restructure the training set with an adequate weight function using existent samples, eliminating the need for new data collection. After identifying rare samples with UQ, the algorithm selects a sample from the training set, assigns a resampling weight using the new weight function, and finally resamples the selected sample according to its assigned weight. We test UQDIR on several benchmark datasets and different machine learning algorithms, then compare its performance with similar imbalanced regression methods. A metamaterial design problem application is also provided for demonstrating the effectiveness of the algorithm in real-world scenarios. We show that improving the quality of UQ metrics results in improved model accuracy.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"261 ","pages":"Article 125526"},"PeriodicalIF":7.5000,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417424023935","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Several factors are known to determine the quality of machine learning models, one of which is the dataset quality. One problem related to the quality of a dataset is the imbalance issue. An imbalanced dataset contains significantly more data points for certain values of the output variable which increases the overfitting risk and negatively affects the prediction accuracy. In this article, we propose using epistemic uncertainty quantification (UQ) of machine learning models to identify rare samples in imbalanced regression problems for balancing the dataset. The developed algorithm, uncertainty quantification-driven imbalanced regression (UQDIR), is guided by UQ to restructure the training set with an adequate weight function using existent samples, eliminating the need for new data collection. After identifying rare samples with UQ, the algorithm selects a sample from the training set, assigns a resampling weight using the new weight function, and finally resamples the selected sample according to its assigned weight. We test UQDIR on several benchmark datasets and different machine learning algorithms, then compare its performance with similar imbalanced regression methods. A metamaterial design problem application is also provided for demonstrating the effectiveness of the algorithm in real-world scenarios. We show that improving the quality of UQ metrics results in improved model accuracy.
不确定性量化驱动机器学习,提高不平衡回归任务中的模型准确性
众所周知,有几个因素可以决定机器学习模型的质量,其中之一就是数据集的质量。与数据集质量有关的一个问题是不平衡问题。不平衡数据集包含的输出变量某些值的数据点明显较多,这会增加过拟合风险,并对预测准确性产生负面影响。在本文中,我们建议使用机器学习模型的认识不确定性量化(UQ)来识别不平衡回归问题中的稀有样本,以平衡数据集。所开发的算法--不确定性量化驱动的不平衡回归(UQDIR)--在 UQ 的指导下,利用现有样本以适当的权重函数重组训练集,从而无需收集新数据。利用 UQ 识别稀有样本后,算法从训练集中选择一个样本,使用新的权重函数分配重新采样权重,最后根据分配的权重对所选样本进行重新采样。我们在多个基准数据集和不同的机器学习算法上测试了 UQDIR,然后将其性能与类似的不平衡回归方法进行了比较。我们还提供了一个超材料设计问题应用,以展示该算法在现实世界中的有效性。我们发现,提高 UQ 指标的质量可以提高模型的准确性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Expert Systems with Applications
Expert Systems with Applications 工程技术-工程:电子与电气
CiteScore
13.80
自引率
10.60%
发文量
2045
审稿时长
8.7 months
期刊介绍: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信