Three-way clustering based on the graph of local density trend

IF 3.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Approximate Reasoning Pub Date : 2025-03-20 DOI:10.1016/j.ijar.2025.109422

Haifeng Yang , Weiqi Wang , Jianghui Cai , Jie Wang , Yating Li , Yaling Xun , Xujun Zhao

Volume 182, Article 109422 · https://www.sciencedirect.com/science/article/pii/S0888613X25000635

Citations: 0

Abstract

Three-way clustering demonstrates unique advantages in dealing with the information ambiguity and unclear boundaries present in real-world datasets. The core and boundary regions of the data are identified as key features of cluster analysis. Typically, data are segmented into three regions using a set of predetermined global thresholds. However, this approach often overlooks the intrinsic distribution patterns within the dataset, and determining the thresholds a priori can be quite challenging. In this paper, we propose a three-way clustering method based on the graph of local density trend (3W-GLDT). Specifically, the algorithm first uses a density-decreasing strategy to build subgraphs and divide the core region data. Then, unreasonable connections are corrected using an isolation forest, which increases the number of core points and enlarges their distribution range. Next, a three-way allocation strategy is proposed, which considers both the degree of local aggregation of the subgraphs and the natural domain information of each data object when allocating points. Finally, the proposed algorithm is compared with 8 different clustering methods on 8 synthetic datasets and 10 UCI real datasets. The experimental results show that the 3W-GLDT algorithm achieves good clustering performance.
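To make the core/boundary idea in the abstract concrete, here is a minimal Python sketch. It is not the 3W-GLDT algorithm: the density definition (inverse mean distance to the k nearest neighbours), the rule that each point links to its densest neighbour, and the neighbour-agreement test for core membership are all assumptions chosen for illustration, and the isolation-forest correction step is omitted entirely. The names `k_nearest` and `three_way_sketch` are hypothetical.

```python
import math


def k_nearest(points, k):
    """Indices of the k nearest neighbours of every point (brute force)."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbours.append(others[:k])
    return neighbours


def three_way_sketch(points, k=3):
    """Return {peak_index: {"core": set, "fringe": set}}.

    Illustrative only; assumes 1 <= k < len(points).
    """
    neighbours = k_nearest(points, k)

    # Assumed local density: inverse of the mean distance to the k neighbours.
    density = []
    for i, nb in enumerate(neighbours):
        mean_dist = sum(math.dist(points[i], points[j]) for j in nb) / len(nb)
        density.append(1.0 / (mean_dist + 1e-12))

    # Each point links to its densest neighbour only if that neighbour is
    # strictly denser, so links always climb in density and cannot cycle.
    parent = []
    for i, nb in enumerate(neighbours):
        best = max(nb, key=lambda j: density[j])
        parent.append(best if density[best] > density[i] else i)

    def peak(i):
        # Follow links upward until a local density peak is reached.
        while parent[i] != i:
            i = parent[i]
        return i

    label = [peak(i) for i in range(len(points))]

    clusters = {}
    for i, nb in enumerate(neighbours):
        nearby = {label[i]} | {label[j] for j in nb}
        if len(nearby) == 1 or parent[i] == i:
            # Neighbourhood agrees (or the point is a peak): core region.
            clusters.setdefault(label[i], {"core": set(), "fringe": set()})["core"].add(i)
        else:
            # Neighbourhood spans several clusters: fringe of each of them.
            for c in nearby:
                clusters.setdefault(c, {"core": set(), "fringe": set()})["fringe"].add(i)
    return clusters
```

Called on two small, well-separated groups with a single point midway between them, the sketch places the midway point in the fringe of both clusters rather than forcing it into one, which is the behaviour three-way clustering is meant to allow. No global threshold is involved; the only parameter is the neighbourhood size `k`.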

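Source journal

International Journal of Approximate Reasoning (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 6.90
Self-citation rate: 12.80%
Articles published: 170
Review time: 67 days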

About the journal: The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest. Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency-tolerant reasoning, elicitation techniques, philosophical foundations and psychological models of uncertain reasoning. Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, multi-agent systems, etc.