Multi-source multi-label feature selection with missing features

IF 7.5 · CAS Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications · Pub Date: 2025-10-04 · DOI: 10.1016/j.eswa.2025.129879

Yabo Shi, Peipei Li, Xiulan Yuan, You Wu, Haiping Wang

Volume 298, Article 129879 · https://www.sciencedirect.com/science/article/pii/S0957417425034943


Abstract

Feature dimensionality reduction on Multi-Source Multi-Label (MSML) data is a critical and challenging task. Real-world applications routinely produce massive amounts of MSML data, which often contain many missing feature values in a high-dimensional feature space and exhibit severely skewed label distributions in the multi-label space; both problems make high-dimensional feature selection on MSML data harder. However, research on feature selection has focused either on multi-label data or on multi-source data, while little attention has been paid to MSML data, let alone MSML data with missing features. Motivated by this gap, we present a new feature selection method for MSML data with missing features, called MMFSMF. Specifically, to handle missing features, we first complete the feature matrix by constructing a feature correlation matrix during the modeling process of multi-label learning. Meanwhile, we use a multi-label oversampling mechanism to address the persistent problem of label skewness in multi-label data. Secondly, building on this preprocessing, we introduce a refined infinite feature selection algorithm that performs feature dimensionality reduction within each multi-label data source, taking both label correlations and label-specific features into account. Thirdly, to address feature redundancy across multiple data sources, we apply a new inter-source feature fusion method. Finally, experiments on nine synthetic MSML datasets with missing features demonstrate that MMFSMF outperforms all competing methods.
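The abstract does not spell out the refined variant, so the sketch below only illustrates the classic, unsupervised infinite feature selection (Inf-FS) idea that the second step builds on, as it is commonly described: features are nodes of a weighted graph, and each feature's score sums the weights of all paths of every length starting from it, using the closed form S = (I - rA)^{-1} - I and the score vector S·1. Everything specific here is an assumption for illustration: the function names are hypothetical, the edge weight (a mix of normalized standard deviation and 1 - |Pearson correlation|) and the default alpha = 0.5 are illustrative choices, and r = 0.9 / (max row sum of A) is used as a simple convergence-safe bound in place of a spectral-radius computation. It does not model label correlations, label-specific features, missing-value completion, oversampling, or inter-source fusion.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def _std(v: List[float]) -> float:
    m = sum(v) / len(v)
    return sqrt(sum((x - m) ** 2 for x in v) / len(v))


def _pearson(u: List[float], v: List[float]) -> float:
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / (su * sv)


def _solve(M: Matrix, b: List[float]) -> List[float]:
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [M[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            f = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def inf_fs_adjacency(X: Matrix, alpha: float = 0.5) -> Matrix:
    """Hypothetical feature graph: X is a list of sample rows."""
    d = len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    stds = [_std(c) for c in cols]
    max_std = max(stds) or 1.0
    sig = [s / max_std for s in stds]  # normalized spread in [0, 1]
    # Assumed edge weight (illustrative, not MMFSMF's): spread + non-redundancy.
    return [[alpha * max(sig[i], sig[j])
             + (1 - alpha) * (1 - abs(_pearson(cols[i], cols[j])))
             for j in range(d)] for i in range(d)]


def inf_fs_scores(X: Matrix, alpha: float = 0.5) -> List[float]:
    """Score each feature by the summed weight of all paths leaving it."""
    A = inf_fs_adjacency(X, alpha)
    d = len(A)
    # For a non-negative matrix, rho(A) <= max row sum, so r * rho(A) < 1
    # and sum_{l>=1} (rA)^l converges to (I - rA)^{-1} - I.
    r = 0.9 / max(sum(row) for row in A)
    M = [[(1.0 if i == j else 0.0) - r * A[i][j] for j in range(d)]
         for i in range(d)]
    # Scores c = S 1 = (I - rA)^{-1} 1 - 1: one linear solve, no full inverse.
    y = _solve(M, [1.0] * d)
    return [yi - 1.0 for yi in y]


def rank_features(X: Matrix, alpha: float = 0.5) -> List[int]:
    """Feature indices ordered from highest to lowest score."""
    scores = inf_fs_scores(X, alpha)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In use, one would call `rank_features(X)` on a single source's sample rows and keep the top k indices. Solving one linear system for S·1 avoids forming the full inverse, which matters when the feature count is large.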


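Source journal: Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months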

Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.