Towards Clustering of Incomplete Mixed-Attribute Data
Authors: Chuyao Zhang, Xinxi Chen, Zexi Tan, Fangqing Gu, Yuzhu Ji, Yiqun Zhang
Journal: Expert Systems, Volume 42, Issue 7 (published 2025-05-14)
DOI: 10.1111/exsy.70074
URL: https://onlinelibrary.wiley.com/doi/10.1111/exsy.70074
Type: Journal Article; JCR: Q2 (Computer Science, Artificial Intelligence); impact factor: 3.0
Citation count: 0
Abstract
Clustering analysis is one of the most important data mining and knowledge discovery tools in real-world applications. Because missing values are widespread and degrade clustering performance, missing-value imputation is a necessary data pre-processing step. However, for the common datasets composed of both numerical and categorical attributes (also known as mixed-attribute datasets), most existing imputation methods suffer from three limitations: (1) they are applicable to only one type of attribute; (2) they struggle to account for the interdependence between different types of attributes; and (3) they fall short in exploiting the information provided by incomplete mixed-valued objects. As a result, the original data distribution can be poorly restored, misleading downstream clustering tasks. This paper therefore proposes a clustering-imputation co-learning method for incomplete mixed-attribute datasets to address these issues. The method integrates imputation and clustering into a single learning process, emphasising the interrelationships between mixed attributes during imputation and exploiting the information of incomplete objects during clustering. It turns out that appropriate recovery of the dataset and accurate clustering can be better achieved in this cross-coupled manner. Experiments on various datasets validate the promising efficacy of the proposed method.
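The abstract describes the method only at a high level. As a rough illustration of what integrating imputation and clustering into one learning process can look like, the sketch below alternates a k-prototypes-style clustering step with a cluster-wise imputation step until the cluster assignments stop changing. This is not the authors' algorithm: the function names (`mixed_distance`, `prototype`, `cluster_and_impute`), the dissimilarity (squared Euclidean distance on numerical attributes plus `gamma` times the categorical mismatch count), the mean/mode imputation rule, and the farthest-first initialisation are all hypothetical choices made for illustration. Missing values are assumed to be encoded as `None`, numerical attributes are assumed to be on comparable scales, and every attribute is assumed to have at least one observed value.

```python
from collections import Counter


def mixed_distance(x, proto, num_idx, cat_idx, gamma=1.0):
    """Hypothetical k-prototypes-style dissimilarity: squared Euclidean distance
    on numerical attributes plus gamma times the categorical mismatch count."""
    d_num = sum((x[j] - proto[j]) ** 2 for j in num_idx)
    d_cat = sum(1 for j in cat_idx if x[j] != proto[j])
    return d_num + gamma * d_cat


def prototype(members, num_idx, cat_idx, n_attrs):
    """Cluster prototype: mean of each numerical attribute, mode of each categorical one."""
    proto = [None] * n_attrs
    for j in num_idx:
        proto[j] = sum(m[j] for m in members) / len(members)
    for j in cat_idx:
        proto[j] = Counter(m[j] for m in members).most_common(1)[0][0]
    return proto


def cluster_and_impute(data, num_idx, cat_idx, k, n_iter=20, gamma=1.0):
    """Alternate between clustering the currently completed data and
    re-imputing every missing entry (None) from its cluster's prototype."""
    n, n_attrs = len(data), len(data[0])
    missing = [(i, j) for i in range(n) for j in range(n_attrs) if data[i][j] is None]
    filled = [list(row) for row in data]
    # Initial fill from global statistics of the observed values
    # (assumes every attribute has at least one observed value).
    for j in range(n_attrs):
        observed = [row[j] for row in data if row[j] is not None]
        if j in num_idx:
            fill = sum(observed) / len(observed)
        else:
            fill = Counter(observed).most_common(1)[0][0]
        for i, jj in missing:
            if jj == j:
                filled[i][j] = fill
    # Deterministic farthest-first initialisation of the k prototypes.
    protos = [list(filled[0])]
    while len(protos) < k:
        far = max(range(n), key=lambda i: min(
            mixed_distance(filled[i], p, num_idx, cat_idx, gamma) for p in protos))
        protos.append(list(filled[far]))
    labels = None
    for _ in range(n_iter):
        # (a) Clustering step: assign each object to its nearest prototype.
        new_labels = [min(range(k), key=lambda c: mixed_distance(
            x, protos[c], num_idx, cat_idx, gamma)) for x in filled]
        for c in range(k):
            members = [filled[i] for i in range(n) if new_labels[i] == c]
            if members:  # keep the old prototype if a cluster empties
                protos[c] = prototype(members, num_idx, cat_idx, n_attrs)
        # (b) Imputation step: refill missing entries from the assigned cluster.
        for i, j in missing:
            filled[i][j] = protos[new_labels[i]][j]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, filled


# Toy data: two numerical attributes, one categorical attribute, None = missing.
data = [
    [0.0, 0.1, "a"],
    [0.1, 0.0, "a"],
    [None, 0.05, "a"],
    [5.0, 5.1, "b"],
    [5.1, None, "b"],
    [4.9, 5.0, None],
]
labels, completed = cluster_and_impute(data, num_idx=[0, 1], cat_idx=[2], k=2)
print(labels)           # [0, 0, 0, 1, 1, 1]
print(completed[5][2])  # b
```

In this naive scheme each missing entry is filled from its cluster prototype attribute by attribute, so the dependence between numerical and categorical attributes is captured only indirectly, through the shared cluster assignment; the paper's abstract indicates that its method models these interrelationships more explicitly.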
About the journal:
Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper.
As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we are aiming at the new and growing markets for these technologies, such as Business, Economy, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emergent topics.