A novel and efficient risk minimisation-based missing value imputation algorithm

Yu-Lin He, Jia-Yin Yu, Xu Li, Philippe Fournier-Viger, Joshua Zhexue Huang
DOI: 10.1016/j.knosys.2024.112435
Published: 2024-08-28
URL: https://www.sciencedirect.com/science/article/pii/S0950705124010694

Abstract

Missing value imputation (MVI) is a key task in data science, where learning models must be built from incomplete data. In contrast to externally driven MVI algorithms, this study proposes a novel risk minimisation-based MVI algorithm (RM-MVI) that considers both the internal characteristics of the missing data and the external performance for specific classification applications. RM-MVI is designed for labelled data and proceeds in two stages: filling with structural risk minimisation (SRM) and refining with empirical risk minimisation (ERM). In the filling stage, an autoencoder with a single hidden layer is trained on the original dataset without missing values. Missing values are first initialised with random numbers, and the imputation values are then preliminarily optimised using a derived updating rule that minimises a structural risk-oriented objective function. In the refining stage, a neural-network-based classifier is trained to fine-tune these preliminary imputation values by reducing the empirical risk. Experiments were conducted on several benchmark datasets to validate the feasibility, rationality, and effectiveness of the proposed RM-MVI algorithm. The results show that (1) the optimisation of the imputation values under both SRM and ERM converges, so optimised imputation values can be obtained; (2) SRM ensures distribution consistency of the imputation values preliminarily optimised in the filling stage, while ERM fine-tunes them in the refining stage in a way that is more helpful for classifier training; and (3) RM-MVI yields considerably better MVI performance on the benchmark datasets than 11 well-known MVI algorithms, such as a 26% higher distribution consistency ratio and 2% to 5% higher testing accuracies for 6 classifiers on average. This demonstrates that RM-MVI is a viable approach for addressing MVI problems.
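
The two-stage idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the structural risk-oriented objective or the derived updating rule, so the sketch assumes plain autoencoder reconstruction error for the filling stage and uses a logistic-regression classifier as a stand-in for the neural-network-based classifier in the refining stage. All function names, hyperparameters, and the synthetic data are hypothetical.

```python
import numpy as np

def train_autoencoder(X, hidden=4, lr=0.1, epochs=2000, seed=0):
    """Single-hidden-layer autoencoder (tanh hidden, linear output), full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        R = H @ W2 + b2 - X                    # reconstruction residual
        dH = (R @ W2.T) * (1 - H ** 2)         # backprop through tanh
        W2 -= lr * H.T @ R / n
        b2 -= lr * R.mean(axis=0)
        W1 -= lr * X.T @ dH / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def fill(X, mask, params, lr=0.1, steps=500, seed=0):
    """Filling stage (assumed objective): update only the missing entries
    to reduce 0.5 * ||AE(x) - x||^2 under the fixed autoencoder."""
    W1, b1, W2, b2 = params
    rng = np.random.default_rng(seed)
    X = X.copy()
    X[mask] = rng.normal(size=mask.sum())      # random initialisation
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)
        R = H @ W2 + b2 - X
        grad = ((R @ W2.T) * (1 - H ** 2)) @ W1.T - R   # d(loss)/dx
        X[mask] -= lr * grad[mask]             # observed entries stay fixed
    return X

def refine(X, y, mask, lr=0.05, steps=200):
    """Refining stage (stand-in): fit a logistic-regression classifier and, in the
    same loop, move the missing entries along the negative cross-entropy gradient."""
    X = X.copy()
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - y                            # d(loss)/d(logit) per sample
        grad_x = np.outer(err, w)              # d(loss)/dx per sample
        w -= lr * X.T @ err / len(X)
        b -= lr * err.mean()
        X[mask] -= lr * grad_x[mask]
    return X, (w, b)

# Synthetic labelled data with roughly 10% of the entries missing (hypothetical example).
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 2))
X_true = np.tanh(Z @ rng.normal(size=(2, 6)))
y = (Z[:, 0] > 0).astype(float)
mask = rng.random(X_true.shape) < 0.1          # True marks a missing entry
X_obs = np.where(mask, np.nan, X_true)

complete_rows = ~mask.any(axis=1)              # rows without missing values
params = train_autoencoder(X_obs[complete_rows])
X_filled = fill(X_obs, mask, params)
X_refined, clf = refine(X_filled, y, mask)
```

In both stages only the masked positions are treated as free parameters, so observed values are never altered. Whether the paper updates the classifier and the imputation values jointly or in alternation is not stated in the abstract; the joint loop above is one possible reading.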
