Three-way space structure and clustering of categorical data

IF 3.0 · CAS Region 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

International Journal of Approximate Reasoning · Pub Date: 2025-05-05 · DOI: 10.1016/j.ijar.2025.109457

Ruxiao Zhang , Hongying Zhang , Yuhua Qian

Volume 184, Article 109457 · https://www.sciencedirect.com/science/article/pii/S0888613X25000982

Citations: 0

Abstract

Measures of similarity and dissimilarity play a pivotal role in discovering the structural characteristics of categorical data, which is crucial in various fields and applications because many real-world problems involve qualitative information. However, co-occurrence probability, a popular similarity measure widely used to construct a linear representation space for analyzing categorical data, has been verified to be inefficient in describing the global similarity of objects. In this paper, drawing inspiration from three-way concept theory, a novel three-way representation scheme is developed through the tripartite definition of Common Feature Values (CV), Non-common Feature Values (NV), and Respective Feature Values (RV), which enables comprehensive characterization of intrinsic feature relationships while maintaining interpretative granularity. These three concepts correspond to the positive, negative, and boundary domains of the feature universe, respectively, and map a set of categorical objects into three Euclidean spaces. Utilizing the three-way categorical data representation scheme, we develop a unified framework for three-way space structure-based categorical clustering algorithms (TWSBC). Finally, to verify the performance of the TWSBC algorithm, which combines the representation scheme with the k-means algorithm, we employ the SBC-type algorithm as a reference. Extensive experiments show that our proposed TWSBC-type algorithms distinctly outperform SBC-type methods in terms of space correlation and clustering performance.
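The abstract does not give the formal definitions of CV, NV, and RV, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' TWSBC algorithm. It assumes that, for a pair of objects, CV counts the feature values both objects take, NV counts the values neither takes, and RV counts the values exactly one of them takes. Each object is then described by its counts against every object in the data set, which gives three n × n spaces. Concatenating the three spaces before k-means, and the helper names `one_hot`, `three_way_spaces`, and `kmeans`, are likewise illustrative choices rather than anything stated in the source.

```python
import numpy as np


def one_hot(data):
    """Encode every (feature, value) pair as one binary column."""
    data = np.asarray(data, dtype=object)
    columns = []
    for j in range(data.shape[1]):
        for value in sorted(set(data[:, j])):
            columns.append(data[:, j] == value)
    return np.stack(columns, axis=1).astype(int)


def three_way_spaces(data):
    """Return three n x n count matrices under the ASSUMED definitions:
    cv[i, j]: feature values taken by both object i and object j
    nv[i, j]: feature values taken by neither object
    rv[i, j]: feature values taken by exactly one of the two
    """
    b = one_hot(data)
    nb = 1 - b
    cv = b @ b.T
    nv = nb @ nb.T
    rv = b @ nb.T + nb @ b.T
    return cv, nv, rv


def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means with random initial centers (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = x[labels == c].mean(axis=0)
    return labels


# Toy data: three objects described by two categorical features.
data = [["red", "small"], ["red", "large"], ["blue", "large"]]
cv, nv, rv = three_way_spaces(data)

# Assumption: stack the three spaces side by side and cluster the rows.
features = np.hstack([cv, nv, rv])
labels = kmeans(features, k=2)
```

Under these assumed definitions the three counts for any pair always sum to the total number of distinct feature values, so the three spaces partition the feature universe for each pair, in the spirit of the positive, negative, and boundary domains the abstract mentions.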


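Source journal

International Journal of Approximate Reasoning (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 6.90

Self-citation rate: 12.80%

Articles published: 170

Review time: 67 days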

About the journal: The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest. Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency-tolerant reasoning, elicitation techniques, philosophical foundations and psychological models of uncertain reasoning. Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, multi-agent systems, etc.