Yabo Shi, Peipei Li, Xiulan Yuan, You Wu, Haiping Wang
Expert Systems with Applications, Vol. 298, Article 129879 (published 2025-10-04). DOI: 10.1016/j.eswa.2025.129879
Multi-source multi-label feature selection with missing features
Feature dimensionality reduction on Multi-Source Multi-Label (MSML) data is a critical and challenging task. In practice, MSML data are routinely produced in large volumes, yet they often contain many missing feature values in a high-dimensional feature space and exhibit severely skewed label distributions in the multi-label space; both problems make high-dimensional feature selection on MSML data harder. Most feature selection research, however, targets either multi-label data or multi-source data, and little attention has been paid to MSML data, let alone MSML data with missing features. Motivated by this gap, we present MMFSMF, a new feature selection method for MSML data with missing features. Specifically, to handle missing features, we first fill in the feature matrix by constructing a feature correlation matrix while modeling the multi-label learning problem. Meanwhile, we use a multi-label oversampling mechanism to address the persistent problem of label skewness in multi-label data. Second, building on this preprocessing, we introduce a refined infinite feature selection algorithm that reduces feature dimensionality within each multi-label data source, taking both label correlations and label-specific features into account. Third, to address feature redundancy across data sources, we apply a new inter-source feature fusion method. Finally, experiments on nine synthetic MSML datasets with missing features demonstrate that MMFSMF outperforms all competing methods.
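The per-source reduction step builds on infinite feature selection (Inf-FS; Roffo et al., 2015). The sketch below covers only the original, unsupervised Inf-FS idea. It does not reproduce the authors' refined, label-aware variant, their missing-value completion, or their oversampling. Features are nodes in a graph, and each score sums the weights of all paths of every length that start at that feature, i.e. s = Σ_{l≥1} (rA)^l·1. Several choices here are assumptions: the function names `pearson` and `inf_fs_scores` are hypothetical, Pearson correlation stands in for the Spearman correlation of the original Inf-FS, self-loops are excluded, and r is set from the max-row-sum bound on the spectral radius rather than the exact spectral radius. The input matrix is assumed to be already complete.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return 0.0 if den == 0 else sum(a * b for a, b in zip(dx, dy)) / den


def inf_fs_scores(X, alpha=0.5, n_terms=200):
    """Simplified unsupervised Inf-FS scores (higher = more relevant).

    X: list of samples, each a list of feature values with NO missing entries
    (assumption: a completion step has already filled the matrix).
    """
    n_feat = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_feat)]

    # Spread term: per-feature standard deviation scaled to [0, 1].
    stds = [statistics.pstdev(c) for c in cols]
    max_std = max(stds) or 1.0
    spread = [s / max_std for s in stds]

    # Edge weight mixes spread and decorrelation; no self-loops (assumption).
    A = [[0.0] * n_feat for _ in range(n_feat)]
    for i in range(n_feat):
        for j in range(n_feat):
            if i != j:
                sigma = max(spread[i], spread[j])
                decorr = 1.0 - abs(pearson(cols[i], cols[j]))
                A[i][j] = alpha * sigma + (1 - alpha) * decorr

    # For a nonnegative matrix, the max row sum bounds the spectral radius,
    # so r * rho(A) <= 0.9 and the path series converges.
    rho_bound = max(sum(row) for row in A) or 1.0
    r = 0.9 / rho_bound

    # Accumulate s = sum_{l=1}^{n_terms} (rA)^l 1 by repeated multiplication.
    v = [1.0] * n_feat
    s = [0.0] * n_feat
    for _ in range(n_terms):
        v = [r * sum(A[i][k] * v[k] for k in range(n_feat)) for i in range(n_feat)]
        s = [a + b for a, b in zip(s, v)]
    return s


# Toy data: features 0 and 1 are perfectly correlated (redundant);
# feature 2 is uncorrelated with both and has the largest spread.
X = [[1, 2, 10], [2, 4, 0], [3, 6, 10], [4, 8, 0], [5, 10, 10]]
scores = inf_fs_scores(X)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

On this toy input, the redundant pair (features 0 and 1) receives equal and lower scores, and feature 2 ranks first. Summing the truncated series avoids an explicit matrix inverse. The closed form ((I − rA)^{-1} − I)·1 yields the same limit when r·ρ(A) < 1.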
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.