Feature selection based on rough diversity entropy
Authors: Xiongtao Zou, Jianhua Dai
Journal: Pattern Recognition, volume 170, Article 112032 (published 2025-07-03)
Journal metrics: impact factor 7.5; JCR Q1 (Computer Science, Artificial Intelligence)
DOI: 10.1016/j.patcog.2025.112032
URL: https://www.sciencedirect.com/science/article/pii/S0031320325006922
Citations: 0
Abstract
Information entropy is a powerful tool for measuring the uncertainty of information and is widely used in fields such as communication, data compression, data mining, and bioinformatics. However, classical information entropy has two shortcomings: it cannot accurately measure the uncertainty of knowledge in some cases, and its joint probabilities are usually difficult to calculate for high-dimensional data. Moreover, uncertainty measures are the foundation of feature selection in granular computing, so inaccurate measures may lead to poorly performing feature selection methods. To address these issues, we propose a novel uncertainty measure based on rough set theory, called rough diversity entropy, which measures the uncertainty of knowledge more accurately than classical information entropy. In this article, rough diversity entropy and its variants are first defined and their properties are studied. Next, a heuristic feature selection method based on these measures is proposed, and the corresponding algorithm is designed. Finally, a series of experiments is conducted to validate the effectiveness and rationality of the proposed method. The results show that the proposed method performs well compared with eight existing feature selection methods. Moreover, it improves the average accuracy on 15 datasets by 6.53% under four classifiers and achieves an average feature reduction rate of up to 99.81%. We believe that the proposed method is an effective feature selection approach for classification learning.
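The abstract does not give the definition of rough diversity entropy, so it cannot be reproduced here. As a rough illustration of the general idea only (an entropy-style uncertainty measure driving a heuristic, greedy feature search), the sketch below uses classical Shannon conditional entropy as a stand-in. The function names `entropy`, `conditional_entropy`, and `greedy_select` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(keys):
    """Shannon entropy (in bits) of the empirical distribution of `keys`."""
    counts = Counter(keys)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(rows, labels, features):
    """H(labels | features) = H(features, labels) - H(features).

    `rows` holds discrete feature tuples; `features` lists column indices.
    Classical measure used as a stand-in, NOT rough diversity entropy.
    """
    # Each distinct combination of feature values forms one block
    # of the partition induced by the chosen features.
    blocks = [tuple(row[f] for f in features) for row in rows]
    return entropy(zip(blocks, labels)) - entropy(blocks)


def greedy_select(rows, labels, tol=1e-12):
    """Forward selection: add the feature that most lowers H(labels | selected).

    Stops once no remaining feature reduces the uncertainty by more than `tol`.
    """
    remaining = list(range(len(rows[0])))
    selected = []
    current = conditional_entropy(rows, labels, selected)
    while remaining:
        best = min(
            remaining,
            key=lambda f: conditional_entropy(rows, labels, selected + [f]),
        )
        best_h = conditional_entropy(rows, labels, selected + [best])
        if current - best_h <= tol:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_h
    return selected
```

The joint counts over feature-value combinations in `conditional_entropy` are the quantity the abstract describes as hard to estimate for high-dimensional data: as more columns are selected, most blocks contain a single sample. Swapping in a different uncertainty measure would only require replacing `conditional_entropy`.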
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.