Incomplete data classification via positive approximation based rough subspaces ensemble

IF 3.5 | Zone 3, Computer Science | JCR Q2: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yuanting Yan, Meili Yang, Zhong Zheng, Hao Ge, Yiwen Zhang, Yanping Zhang
Big Data Research, vol. 38, Article 100496. Published 2024-11-14. DOI: 10.1016/j.bdr.2024.100496
https://www.sciencedirect.com/science/article/pii/S2214579624000716
Citations: 0

Abstract

Ensemble techniques are a prevalent approach to classifying incomplete data: multiple classifiers are trained on diverse subsets of features. However, current ensemble-based methods overlook the redundancy within those feature subsets. Redundant features can hinder the learning of the underlying rules in the data, which makes it harder to train robust prediction models. In this paper, we propose a Reduct-Missing Pattern Fusion (RMPF) method to address this limitation. It combines the advantages of rough set theory with the effectiveness of missing patterns in classifying incomplete data. RMPF first employs a heuristic algorithm to generate a set of positive approximation-based attribute reducts. It then integrates the missing patterns with these reducts through a fusion strategy to minimize data redundancy. Finally, the optimized subsets are used to train a group of base classifiers, and a selective prediction procedure produces the ensemble prediction. Experimental results show that our method surpasses the compared state-of-the-art methods in both performance and robustness, with an especially significant advantage on data with high missing rates.
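The two ingredients named in the abstract, missing patterns and attribute reducts, can be illustrated with a minimal sketch. This is not the RMPF algorithm: the function names, the toy data, and the choice of a plain greedy positive-region reduct computed on complete rows only are all assumptions made for illustration. The abstract does not specify the heuristic, the fusion strategy, or the selective prediction step, so the latter two are left out.

```python
from collections import defaultdict

def missing_patterns(rows):
    """Group row indices by the set of feature indices that are observed (not None)."""
    patterns = defaultdict(list)
    for i, row in enumerate(rows):
        observed = frozenset(j for j, v in enumerate(row) if v is not None)
        patterns[observed].append(i)
    return dict(patterns)

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: the fraction of rows in the positive region,
    i.e. rows whose equivalence class under `attrs` carries a single label."""
    class_labels = defaultdict(set)
    class_sizes = defaultdict(int)
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in sorted(attrs))
        class_labels[key].add(y)
        class_sizes[key] += 1
    positive = sum(n for key, n in class_sizes.items() if len(class_labels[key]) == 1)
    return positive / len(rows)

def greedy_reduct(rows, labels, candidates):
    """Forward selection (an assumed heuristic): repeatedly add the attribute that
    raises the dependency degree most, until it matches the full candidate set."""
    target = dependency(rows, labels, candidates)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max((a for a in candidates if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct

# Hypothetical toy data: three features, None marks a missing value.
rows = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [1, None, 0],
    [None, 1, 1],
]
labels = [1, 1, 0, 0, 1, 0]

patterns = missing_patterns(rows)

# Assumption: the reduct is computed on fully observed rows only.
complete = [i for i, r in enumerate(rows) if None not in r]
X = [rows[i] for i in complete]
y = [labels[i] for i in complete]
reduct = greedy_reduct(X, y, [0, 1, 2])
```

In this toy data the label of every complete row equals feature 0, so the greedy search stops after selecting that single attribute, while the pattern grouping separates the two incomplete rows by which features they still carry. A method in the spirit of the abstract would then combine each pattern's observed features with such reducts before training one base classifier per resulting subset.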
Source journal
Big Data Research (Computer Science: Computer Science Applications)
CiteScore: 8.40
Self-citation rate: 3.00%
Publication count: 0
About the journal: The journal aims to promote and communicate advances in big data research by providing a fast, high-quality forum for researchers, practitioners and policy makers from the many different communities working on, and with, this topic. The journal will accept papers on foundational aspects of dealing with big data, as well as papers on specific platforms and technologies used to deal with big data. To promote data science and interdisciplinary collaboration between fields, and to showcase the benefits of data-driven research, papers demonstrating applications of big data in domains as diverse as geoscience, social web, finance, e-commerce, health care, environment and climate, physics and astronomy, chemistry, life sciences and drug discovery, digital libraries and scientific publications, security and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards and best practices.