Multi-label feature selection for imbalanced data via KNN-based multi-label rough set theory
Authors: Weihua Xu, Yuzhe Li
Journal: Information Sciences, volume 715, Article 122220
Publication date: 2025-04-23
DOI: 10.1016/j.ins.2025.122220
URL: https://www.sciencedirect.com/science/article/pii/S0020025525003524
In multi-label feature selection, the intricacy of data structures and semantics has been escalating, rendering traditional single-label feature selection methodologies inadequate for contemporary demands. This manuscript introduces a neighborhood rough set model that integrates δ-neighborhood rough sets with k-nearest neighbor techniques, facilitating a transition from single-label to multi-label learning frameworks. The study examines the concept of attribute dependency in rough set theory and, based on it, proposes a novel importance function that quantifies the significance of features in multi-label decision-making contexts. Building on this theoretical foundation, the authors develop a feature selection algorithm specifically tailored for imbalanced datasets. Experiments on 12 datasets, with comparisons against 10 state-of-the-art methods, support the superior performance of the algorithm on imbalanced datasets. The research offers both a fresh theoretical perspective and practical implications, particularly for imbalanced datasets with multiple labels.
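The abstract does not give the model's formulas, so the following is only a rough sketch of the general idea, not the authors' method. It assumes the textbook neighborhood rough set dependency (the fraction of samples whose neighbors all agree with them on a label), assumes that the neighborhood of a sample is the intersection of its δ-ball and its k nearest neighbors, and assumes that the multi-label dependency is a plain average over labels. The function names `neighborhoods`, `dependency`, and `significance` are hypothetical, and nothing here handles label imbalance.

```python
import numpy as np

def neighborhoods(X, delta, k):
    """Boolean matrix N; N[i, j] is True if sample j is a neighbor of sample i.

    Assumed rule (not taken from the paper): j must lie within Euclidean
    distance delta of i AND be among the k nearest samples to i.
    A sample is always its own neighbor.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    within_delta = dist <= delta
    # ranks[i, j] = position of j when samples are sorted by distance from i
    ranks = np.argsort(np.argsort(dist, axis=1, kind="stable"), axis=1)
    among_knn = ranks <= k
    N = within_delta & among_knn
    np.fill_diagonal(N, True)
    return N

def dependency(X, Y, features, delta=0.2, k=3):
    """Average over labels of the fraction of label-consistent samples."""
    N = neighborhoods(X[:, features], delta, k)
    scores = []
    for l in range(Y.shape[1]):
        same = Y[:, l][:, None] == Y[:, l][None, :]
        # sample i is consistent if every neighbor shares its value for label l
        consistent = np.all(~N | same, axis=1)
        scores.append(consistent.mean())
    return float(np.mean(scores))

def significance(X, Y, selected, candidate, delta=0.2, k=3):
    """Gain in dependency from adding one candidate feature to the selected set."""
    return (dependency(X, Y, selected + [candidate], delta, k)
            - dependency(X, Y, selected, delta, k))

# Toy data: feature 0 separates label 0, feature 1 separates label 1.
X = np.array([[0.0, 0.0], [0.1, 0.9], [1.0, 0.1], [0.9, 1.0]])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(dependency(X, Y, [0]), dependency(X, Y, [0, 1]))
print(significance(X, Y, [0], 1))
```

On the toy data, either feature alone leaves one label unexplained, while the two together make every sample consistent on both labels. A greedy selector could repeatedly add the candidate with the largest `significance` until the gain falls below a threshold.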
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.