An auto-weighted enhanced horizontal collaborative fuzzy clustering algorithm with knowledge adaption mechanism

IF 3.2, Zone 3 (Computer Science), Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

International Journal of Approximate Reasoning Pub Date : 2024-03-12 DOI:10.1016/j.ijar.2024.109169

Huilin Yang, Fusheng Yu, Witold Pedrycz (Life Fellow, IEEE), Zonglin Yang, Jiaqi Chang, Jiayin Wang

Volume 169, Article 109169. https://www.sciencedirect.com/science/article/pii/S0888613X24000562

Citations: 0

Abstract

Among multi-source data clustering tasks, a frequently encountered kind is one where only one of the multi-source datasets is available, for privacy and other reasons. The only available dataset is called the local dataset, and the others are called external datasets. The horizontal collaborative fuzzy clustering (HCFC) model is a typical model for such tasks. In HCFC, each external dataset is used not directly but through the knowledge mined from it. This knowledge, expressed as a knowledge partition matrix, is fused into the clustering process of the local dataset. Reviewing the existing HCFC models, we find three issues that need improvement. First, the existing HCFC models quantify the collaboration contribution of each piece of external knowledge by a dataset-level hyperparameter and do not distinguish the collaboration contributions of objects within the same external dataset. This may lead to counterintuitive clustering results. To address this issue, this paper proposes an enhanced HCFC (EHCFC) algorithm that extends the collaboration from the dataset level to the object level and assigns different weights to objects based on the amount of information they provide. EHCFC thereby achieves more flexible collaboration and more intuitive clustering results. Second, the collaboration mechanisms of the existing HCFC models require the partition matrices of the external datasets and the local dataset to have the same dimensionality, which prevents the HCFC algorithms from working in many real situations. To address this limitation, a knowledge adaption mechanism based on relative entropy and spectral clustering is proposed, resulting in a further refined algorithm, EHCFC-KA (EHCFC with knowledge adaption). The proposed mechanism makes both the HCFC algorithms and the EHCFC algorithm applicable in more scenarios. Finally, we define two consistency indexes (measuring the consistency of the clustering result with external knowledge) to evaluate the performance of collaborative clustering. Experiments on synthetic datasets and UCI public datasets demonstrate that the proposed EHCFC and EHCFC-KA algorithms outperform the existing HCFC algorithms and achieve significantly better and more intuitive collaborative clustering performance.
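The abstract gives no formulas, so the following is only an illustrative sketch of two ideas it mentions, not the paper's actual method. It assumes (hypothetically) that the "information amount" of an object can be measured as one minus the normalized Shannon entropy of its membership row, and that relative entropy (KL divergence) can compare clusters from partition matrices with different numbers of clusters over the same objects. The function names `object_weights` and `cluster_relative_entropy` are invented for this example, and the spectral clustering step is not shown.

```python
import numpy as np


def object_weights(U_ext, eps=1e-12):
    """Per-object weights from an external partition matrix.

    U_ext has shape (n_objects, n_clusters); each row holds fuzzy
    memberships summing to 1. ASSUMPTION: weight = 1 - normalized
    Shannon entropy, so crisp rows get weights near 1 and uniform
    rows get weights near 0.
    """
    U = np.clip(np.asarray(U_ext, dtype=float), eps, 1.0)
    n_clusters = U.shape[1]
    entropy = -(U * np.log(U)).sum(axis=1)
    return 1.0 - entropy / np.log(n_clusters)


def cluster_relative_entropy(U_a, U_b, eps=1e-12):
    """Pairwise KL divergence between clusters of two partition matrices.

    U_a has shape (n, c_a) and U_b has shape (n, c_b), both over the
    same n objects. ASSUMPTION: each column is normalized into a
    distribution over objects before comparison.
    Returns D with shape (c_a, c_b), D[i, j] = KL(column i of U_a ||
    column j of U_b).
    """
    P = np.clip(np.asarray(U_a, dtype=float), eps, None)
    Q = np.clip(np.asarray(U_b, dtype=float), eps, None)
    P = P / P.sum(axis=0, keepdims=True)
    Q = Q / Q.sum(axis=0, keepdims=True)
    log_ratio = np.log(P)[:, :, None] - np.log(Q)[:, None, :]
    return (P[:, :, None] * log_ratio).sum(axis=0)


if __name__ == "__main__":
    # External knowledge with 3 clusters, local partition with 2 clusters.
    U_external = np.array([
        [0.98, 0.01, 0.01],   # confident object
        [1 / 3, 1 / 3, 1 / 3],  # uninformative object
        [0.10, 0.80, 0.10],
        [0.05, 0.05, 0.90],
    ])
    U_local = np.array([
        [0.9, 0.1],
        [0.5, 0.5],
        [0.2, 0.8],
        [0.1, 0.9],
    ])
    print("object weights:", object_weights(U_external))
    print("cluster KL matrix:\n", cluster_relative_entropy(U_external, U_local))
```

Under these assumptions, a divergence matrix like the one printed above could serve as input for matching or merging external clusters before collaboration; how the paper actually combines relative entropy with spectral clustering is not stated in the abstract.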




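Source journal

International Journal of Approximate Reasoning (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 6.90
Self-citation rate: 12.80%
Articles published: 170
Review time: 67 days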

Journal description: The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories, and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest. Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency tolerant reasoning, elicitation techniques, philosophical foundations and psychological models of uncertain reasoning. Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, multi-agent systems, etc.