Density cluster-based feature selection: An information theory approach
Jingya Dong, Shuai Tao, Chunhe Song, Peiming Ning, Tao Zhang
Engineering Applications of Artificial Intelligence, Volume 159, Article 111694 (published 2025-07-08)
DOI: 10.1016/j.engappai.2025.111694
URL: https://www.sciencedirect.com/science/article/pii/S0952197625016963
Impact factor: 7.5; JCR: Q1 (Automation & Control Systems)
Citation count: 0
Abstract
Feature selection plays a crucial role in data mining and machine learning. However, two challenges remain: (1) current methods cannot autonomously identify the optimal feature set and require manual parameter adjustment based on the learning algorithm; (2) widely used heuristic methods often struggle to guarantee that the objective function is maximized. To address these challenges, this paper proposes a density cluster-based feature selection (DCFS) method that leverages information theory and applies artificial intelligence (AI) in the clustering process. First, a novel initial feature selection method that maximizes feature-relevance and feature-difference is introduced to automate the selection of an initial feature subset. Second, a new density-centric automatic clustering (DAC) algorithm, an AI-based clustering approach, is proposed. This algorithm combines non-parametric density estimation, decision graph-based density center selection strategies, and adaptive domain search clustering methods to enhance the precision and robustness of clustering outcomes. Third, a feature space selection method based on maximizing feature relevance is established to construct a comprehensive feature subspace. This feature space selection method converts the maximization of the objective function into an automated density clustering process, enabling the automatic selection of optimal features. Extensive experiments across 14 datasets demonstrate the superior performance of the proposed DCFS in terms of effectiveness and robustness. To the best of our knowledge, this paper is the first work attempting automatic feature selection through clustering, thus pushing the frontier of feature selection algorithm development.
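The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' DCFS or DAC algorithm. It assumes discrete features, uses a hypothetical mutual-information distance between features, and substitutes the standard density-peaks decision score (density times distance to the nearest denser point) for the paper's center selection. The function names, the `sigma` kernel width, and the fixed `n_centers` argument are all assumptions made for illustration; the paper claims to choose the number of clusters automatically, which this sketch does not attempt.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X; Y) in bits, computed as H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def feature_distance(xs, ys):
    """Assumed distance: 1 - I(X;Y) / max(H(X), H(Y)); 0 means fully redundant."""
    h = max(entropy(xs), entropy(ys))
    return 0.0 if h == 0 else 1.0 - mutual_information(xs, ys) / h


def select_features(features, labels, n_centers, sigma=0.5):
    """Cluster features with a density-peaks style score, then keep the most
    label-relevant feature of each cluster. `features` maps name -> values."""
    names = list(features)
    dist = {a: {b: feature_distance(features[a], features[b]) for b in names}
            for a in names}
    # Non-parametric (Gaussian kernel) density of each feature.
    rho = {a: sum(math.exp(-(dist[a][b] / sigma) ** 2) for b in names if b != a)
           for a in names}
    # delta: distance to the nearest feature ranked denser (ties broken by order).
    order = sorted(names, key=lambda a: -rho[a])
    delta = {}
    for rank, a in enumerate(order):
        denser = order[:rank]
        delta[a] = (min(dist[a][b] for b in denser) if denser
                    else max(dist[a].values()))
    # Decision score rho * delta; the top n_centers become cluster centers.
    centers = sorted(names, key=lambda a: -rho[a] * delta[a])[:n_centers]
    clusters = {c: [c] for c in centers}
    for a in names:
        if a not in clusters:
            # Simplification: assign each remaining feature to its nearest center.
            clusters[min(centers, key=lambda c: dist[a][c])].append(a)
    # Keep the feature with the highest relevance I(feature; labels) per cluster.
    return [max(members, key=lambda a: mutual_information(features[a], labels))
            for members in clusters.values()]


# Toy data: two informative features, each with an exact duplicate.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * x + y for x, y in zip(a, b)]
features = {"a": a, "a_copy": list(a), "b": b, "b_copy": list(b)}
selected = select_features(features, labels, n_centers=2)
print(selected)
```

On this toy data each duplicate falls into the same cluster as its original, so one feature from each redundant pair is returned.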
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.