Density cluster-based feature selection: An information theory approach

Impact factor 7.5 | Zone 2, Computer Science | JCR Q1, Automation & Control Systems
Jingya Dong, Shuai Tao, Chunhe Song, Peiming Ning, Tao Zhang
DOI: 10.1016/j.engappai.2025.111694
Journal: Engineering Applications of Artificial Intelligence, volume 159, Article 111694
Published: 2025-07-08
Publication type: Journal Article
Open access: no
URL: https://www.sciencedirect.com/science/article/pii/S0952197625016963
Citations: 0

Abstract

Feature selection plays a crucial role in data mining and machine learning. However, two challenges remain: (1) current methods cannot autonomously identify the optimal feature set and require manual parameter adjustment based on the learning algorithm; (2) heuristic methods, which are widely used, often struggle to ensure that the objective function is maximized. To address these challenges, this paper proposes a density cluster-based feature selection (DCFS) method leveraging information theory, which involves the application of artificial intelligence (AI) in the clustering process. First, a novel initial feature selection method that maximizes feature-relevance and feature-difference is introduced to automate the selection of an initial feature subset. Second, a new density-centric automatic clustering (DAC) algorithm, an AI-based clustering approach, is proposed. This algorithm combines non-parametric density estimation, decision graph-based density center selection strategies, and adaptive domain search clustering methods to improve the precision and robustness of clustering outcomes. Third, a feature space selection method based on maximizing feature relevance is established to construct a comprehensive feature subspace. This feature space selection method converts the maximization of the objective function into an automated density clustering process, enabling the automatic selection of the optimal features. Extensive experiments on 14 datasets demonstrate the superior performance of the proposed DCFS in terms of effectiveness and robustness. To the best of our knowledge, this is the first work to attempt automatic feature selection through clustering, thus pushing the frontier of feature selection algorithm development.
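The abstract does not give the formulas behind DAC, so the following is only a rough illustration of what "decision graph-based density center selection" can look like. It uses the generic density peaks idea: each point gets a local density (here a Gaussian kernel estimate) and a distance to the nearest point of higher density, and points that score high on both are candidate centers. The function names `decision_graph` and `pick_centers`, the `bandwidth` parameter, and the toy data are assumptions made for this sketch and are not taken from the paper. The number of centers `k` is passed by hand here, whereas the paper describes an automatic procedure.

```python
import math


def decision_graph(points, bandwidth):
    """Return (rho, delta) for each point (generic density peaks quantities).

    rho[i]:   Gaussian-kernel local density of point i.
    delta[i]: distance from point i to the nearest point with higher density;
              for a point with no denser neighbor, its largest distance
              to any other point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(math.exp(-((dist[i][j] / bandwidth) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, k):
    """Indices of the k points with the largest rho * delta score."""
    scores = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data (hypothetical): two well-separated groups in the plane.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0),
]
rho, delta = decision_graph(points, bandwidth=0.2)
centers = pick_centers(rho, delta, k=2)
print(centers)
```

In this toy run the two selected indices are the middle points of the two groups, because only those combine a high density with a large distance to any denser point. In a feature selection setting the "points" would be features embedded by some information-theoretic measure, which this sketch does not attempt to reproduce.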
Source journal: Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 9.60
Self-citation rate: 10.00%
Publication volume: 505
Review time: 68 days
About the journal: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.