Towards Clustering of Incomplete Mixed-Attribute Data

Impact factor: 3.0 | Zone 4 (Computer Science) | JCR Q2 (Computer Science, Artificial Intelligence)

Expert Systems, vol. 42, no. 7. Pub Date: 2025-05-14. DOI: 10.1111/exsy.70074

Chuyao Zhang, Xinxi Chen, Zexi Tan, Fangqing Gu, Yuzhu Ji, Yiqun Zhang


Citations: 0

Abstract

Clustering analysis is one of the most important data mining and knowledge discovery tools in real applications. Because the widespread presence of missing values hampers clustering performance, missing-value imputation becomes a necessary data pre-processing step. However, for the common datasets composed of both numerical and categorical attributes (also known as mixed-attribute datasets), most existing imputation methods suffer from three limitations: (1) they are feasible for only one type of attribute; (2) they have difficulty accounting for the interdependence between different types of attributes; (3) they fall short in exploiting the information provided by incomplete mixed-value objects. As a result, the original data distribution can be poorly restored, which misleads the downstream clustering task. This paper therefore proposes a clustering-imputation co-learning method for incomplete mixed-attribute datasets to address these issues. The method integrates imputation and clustering into one learning process, emphasising the interrelationships between mixed attributes during imputation and exploiting the information of incomplete objects during clustering. The results indicate that appropriate recovery of the dataset and accurate clustering are better achieved in this cross-coupled manner. Experiments on various datasets validate the promising efficacy of the proposed method.
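
To make the idea of coupling imputation with clustering concrete, the following is a minimal sketch and not the method proposed in the paper, whose details the abstract does not give. It assumes a k-prototypes-style dissimilarity (squared Euclidean distance on numerical attributes plus a weighted count of categorical mismatches) and refills missing cells from the prototype of the cluster each object is currently assigned to. The names `co_learn` and `gamma`, the encoding of missing values (`np.nan` for numerical, `-1` for categorical), and the toy data are all illustrative assumptions. The sketch also assumes every attribute has at least one observed value, and it does not model the cross-type attribute interdependence that the paper emphasises.

```python
import numpy as np

def co_learn(X_num, X_cat, k, n_iter=10, gamma=1.0, seed=0):
    """Alternate clustering and cluster-wise imputation on mixed data.

    X_num: float array (n, p), np.nan marks a missing value (assumed encoding).
    X_cat: int array (n, q) of non-negative codes, -1 marks a missing value.
    """
    rng = np.random.default_rng(seed)
    miss_num = np.isnan(X_num)
    miss_cat = X_cat < 0
    num, cat = X_num.copy(), X_cat.copy()

    # Initial fill: global mean for numerical, global mode for categorical.
    col_mean = np.nanmean(X_num, axis=0)
    num[miss_num] = np.take(col_mean, np.where(miss_num)[1])
    for j in range(cat.shape[1]):
        observed = X_cat[~miss_cat[:, j], j]
        cat[miss_cat[:, j], j] = np.bincount(observed).argmax()

    # Initialise prototypes from k distinct randomly chosen objects.
    idx = rng.choice(len(num), size=k, replace=False)
    proto_num, proto_cat = num[idx].copy(), cat[idx].copy()

    for _ in range(n_iter):
        # Clustering step: squared Euclidean + gamma * number of mismatches.
        d_num = ((num[:, None, :] - proto_num[None]) ** 2).sum(axis=2)
        d_cat = (cat[:, None, :] != proto_cat[None]).sum(axis=2)
        labels = (d_num + gamma * d_cat).argmin(axis=1)

        for c in range(k):
            members = labels == c
            if not members.any():
                continue  # an empty cluster keeps its previous prototype
            # Prototype update: attribute-wise mean and mode.
            proto_num[c] = num[members].mean(axis=0)
            for j in range(cat.shape[1]):
                proto_cat[c, j] = np.bincount(cat[members, j]).argmax()
            # Imputation step: refill only the originally missing cells
            # with the prototype of the object's current cluster.
            for j in range(num.shape[1]):
                num[members & miss_num[:, j], j] = proto_num[c, j]
            for j in range(cat.shape[1]):
                cat[members & miss_cat[:, j], j] = proto_cat[c, j]
    return labels, num, cat

# Toy data: two numerical attributes and one categorical attribute.
X_num = np.array([[0.0, 0.1], [0.2, np.nan], [0.1, 0.0],
                  [5.0, 5.1], [np.nan, 5.0], [5.2, 4.9]])
X_cat = np.array([[0], [0], [-1], [1], [1], [-1]])
labels, num_filled, cat_filled = co_learn(X_num, X_cat, k=2)
```

Observed cells are never overwritten; only the cells that were missing in the input are revised as the cluster assignments change. Like any prototype-based procedure with random initialisation, the result can depend on `seed`, so the partition found on a given run is not guaranteed to be the best one.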


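Source journal

Expert Systems (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 7.40

Self-citation rate: 6.10%

Articles published: 266

Review time: 24 months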

About the journal: Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems, including expert systems, based thereon. Detailed scientific evaluation is an essential part of any paper. As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we are aiming at the new and growing markets for these technologies, such as Business, Economy, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emergent topics.