Huilin Yang, Fusheng Yu, Witold Pedrycz (Life Fellow, IEEE), Zonglin Yang, Jiaqi Chang, Jiayin Wang
Title: An auto-weighted enhanced horizontal collaborative fuzzy clustering algorithm with knowledge adaption mechanism
Journal: International Journal of Approximate Reasoning, vol. 169, Article 109169
DOI: 10.1016/j.ijar.2024.109169
Publication date: 2024-03-12
URL: https://www.sciencedirect.com/science/article/pii/S0888613X24000562
Impact factor: 3.2 (JCR Q2, Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
In multi-source data clustering, a frequently encountered task is one in which only one of the datasets is directly available, for privacy or other reasons. The available dataset is called the local dataset, and the others are called external datasets. The horizontal collaborative fuzzy clustering (HCFC) model is a typical approach to such tasks. In HCFC, each external dataset is used not directly but through the knowledge mined from it. This knowledge, expressed as a knowledge partition matrix, is fused into the clustering process of the local dataset. A review of the existing HCFC models reveals three issues that need improvement. First, the existing HCFC models quantify the collaboration contribution of each piece of external knowledge with a single dataset-level hyperparameter and do not distinguish the contributions of individual objects within the same external dataset. This may lead to counterintuitive clustering results. To address this issue, this paper proposes an enhanced HCFC (EHCFC) algorithm that extends the collaboration from the dataset level to the object level and assigns different weights to objects based on the amount of information each object provides. EHCFC thereby achieves more flexible collaboration and more intuitive clustering results. Second, the collaboration mechanisms of the existing HCFC models require the partition matrices of the external datasets and the local dataset to have the same dimensionality, which prevents the HCFC algorithms from working in many real situations. To address this limitation, a knowledge adaption mechanism based on relative entropy and spectral clustering is proposed, resulting in a further refined algorithm, EHCFC-KA (EHCFC with knowledge adaption). The proposed knowledge adaption mechanism makes both the HCFC algorithms and the EHCFC algorithm applicable in more scenarios. Finally, we define two consistency-based indexes, which measure the consistency of the clustering result with the external knowledge, to evaluate the performance of collaborative clustering. Experiments on synthetic datasets and UCI public datasets demonstrate that the proposed EHCFC and EHCFC-KA algorithms outperform the existing HCFC algorithms and achieve significantly better, more intuitive collaborative clustering performance.
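To make the idea of object-level weighting concrete, the following is a minimal sketch and not the paper's actual formula, which the abstract does not state. It assumes, purely for illustration, that the "information amount" of an object can be approximated by one minus the normalized Shannon entropy of its membership row in an external knowledge partition matrix. The function name `object_weights` and the toy matrix `U_ext` are hypothetical.

```python
import math


def object_weights(partition, eps=1e-12):
    """Illustrative (assumed) entropy-based weights for one external dataset.

    partition[i][k] is the fuzzy membership of object i in cluster k;
    each row is expected to sum to 1.
    """
    n_clusters = len(partition[0])
    if n_clusters < 2:
        raise ValueError("need at least two clusters to normalize entropy")
    max_entropy = math.log(n_clusters)  # entropy of a uniform membership row
    weights = []
    for row in partition:
        # Shannon entropy of the row; zero memberships contribute nothing.
        entropy = -sum(u * math.log(u) for u in row if u > eps)
        # Crisp rows (entropy 0) get weight 1, uniform rows get weight 0.
        weights.append(1.0 - entropy / max_entropy)
    return weights


# Toy knowledge partition matrix: 3 objects, 2 clusters (hypothetical data).
U_ext = [
    [1.0, 0.0],  # crisp membership: most informative
    [0.9, 0.1],  # fairly confident
    [0.5, 0.5],  # uniform membership: uninformative
]
weights = object_weights(U_ext)
```

Under this assumption, an object whose external membership is nearly uniform contributes little to the collaboration, while an object with a crisp membership contributes fully. This is one plausible reading of moving from a single dataset-level hyperparameter to per-object weights.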
About the journal:
The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories, and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest.
Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency tolerant reasoning, elicitation techniques, philosophical foundations and psychological models of uncertain reasoning.
Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, multi-agent systems, etc.