Feature selection based on rough diversity entropy

Impact Factor: 7.5 · JCR Q1, Computer Science (Region 1) · COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xiongtao Zou, Jianhua Dai
Journal: Pattern Recognition, Volume 170, Article 112032
DOI: 10.1016/j.patcog.2025.112032
Published: 2025-07-03
Citations: 0

Abstract

Information entropy, as a powerful tool for measuring the uncertainty of information, is widely used in fields such as communication, data compression, data mining, and bioinformatics. However, classical information entropy has two shortcomings: it cannot accurately measure the uncertainty of knowledge in some cases, and the joint probability it requires is usually difficult to compute for high-dimensional data. Moreover, uncertainty measures are the foundation of feature selection in granular computing, and inaccurate measures may lead to poor performance of feature selection methods. To address these issues, we propose a novel uncertainty measure based on rough set theory, called rough diversity entropy, which measures the uncertainty of knowledge more accurately than classical information entropy. In this article, rough diversity entropy and its variants are first defined, and their related properties are studied. Next, a heuristic feature selection method based on the defined measures is put forward, and the corresponding algorithm is designed. Finally, a series of experiments validates the effectiveness and rationality of the proposed method. The results show that it performs well against eight existing feature selection methods: it improves the average accuracy across 15 datasets by 6.53% under four classifiers and achieves an average feature reduction rate of up to 99.81%. We believe the proposed method is an effective feature selection approach for classification learning.
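The abstract does not define rough diversity entropy itself, but the general scheme it describes — an uncertainty measure over the equivalence partitions of a decision table, driving a greedy heuristic search — can be illustrated with the classical conditional-entropy criterion it aims to improve upon. The sketch below is an illustrative assumption, not the paper's algorithm: the helper names (`partition`, `conditional_entropy`, `greedy_select`) and the toy decision table are invented for this example.

```python
import math
from collections import Counter

def partition(rows, attrs):
    """Equivalence classes (lists of row indices) induced by the attribute subset attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(blocks.values())

def conditional_entropy(rows, attrs, label):
    """Classical conditional entropy H(label | attrs) over a decision table."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, attrs):
        m = len(block)
        counts = Counter(rows[i][label] for i in block)
        # Weight each block's internal label entropy by its relative size.
        h -= (m / n) * sum((c / m) * math.log2(c / m) for c in counts.values())
    return h

def greedy_select(rows, candidates, label, eps=1e-12):
    """Forward heuristic search: repeatedly add the attribute that most
    reduces H(label | selected); stop when no attribute improves it."""
    selected, remaining = [], list(candidates)
    current = conditional_entropy(rows, [], label)  # empty subset: one block, H(label)
    while remaining and current > eps:
        best = min(remaining, key=lambda a: conditional_entropy(rows, selected + [a], label))
        new = conditional_entropy(rows, selected + [best], label)
        if new >= current - eps:  # no reduction in uncertainty: stop
            break
        selected.append(best)
        remaining.remove(best)
        current = new
    return selected

# Toy decision table: "d" is the class label; "a" fully determines it, "b" is noise.
data = [
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "no"},
]
print(greedy_select(data, ["a", "b"], "d"))  # ['a']
```

The paper's contribution is to replace the `conditional_entropy` criterion with rough diversity entropy, which avoids the joint-probability computation and the measurement inaccuracies noted above; the surrounding greedy search structure is the standard heuristic pattern in rough-set feature selection.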
Source journal: Pattern Recognition (Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles per year: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.