Three-way space structure and clustering of categorical data
Authors: Ruxiao Zhang, Hongying Zhang, Yuhua Qian
Journal: International Journal of Approximate Reasoning, vol. 184, Article 109457 (published 2025-05-05)
DOI: 10.1016/j.ijar.2025.109457
URL: https://www.sciencedirect.com/science/article/pii/S0888613X25000982
Measures of similarity and dissimilarity play a pivotal role in discovering the structural characteristics of categorical data, a task that matters across many fields because many real-world problems involve qualitative information. However, co-occurrence probability, a popular similarity measure widely used to construct a linear representation space for analyzing categorical data, has been shown to be inefficient at describing the global similarity of objects. In this paper, drawing inspiration from three-way concept theory, we develop a novel three-way representation scheme based on a tripartite definition of Common Feature Values (CV), Non-common Feature Values (NV), and Respective Feature Values (RV), which characterizes intrinsic feature relationships comprehensively while preserving interpretable granularity. These three concepts correspond to the positive, negative, and boundary domains of the feature universe, respectively, and map a set of categorical objects into three Euclidean spaces. Using this representation scheme, we develop a unified framework for three-way space-structure-based categorical clustering algorithms (TWSBC). Finally, to verify the performance of the TWSBC algorithm, which combines the representation scheme with the k-means algorithm, we use the SBC-type algorithm as a reference. Extensive experiments show that the proposed TWSBC-type algorithms distinctly outperform SBC-type methods in terms of space correlation and clustering performance.
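The abstract does not give formal definitions of CV, NV, and RV, so the following is only an illustrative sketch under assumed definitions, not the authors' method. For a pair of objects it assumes that CV counts the feature values both objects hold (positive domain), NV counts the values in the feature universe that neither holds (negative domain), and RV counts the values held by exactly one of them (boundary domain). Each object is then represented by its normalized counts against every object in the data set, which yields three n-dimensional spaces. The function names `three_way_counts` and `three_way_spaces` are hypothetical.

```python
from itertools import product


def three_way_counts(x, y, domains):
    """Return (|CV|, |NV|, |RV|) for two categorical objects.

    Assumed definitions (not taken from the paper):
      CV: feature values held by both objects,
      NV: feature values in the domain held by neither object,
      RV: feature values held by exactly one of the two objects.
    """
    cv = nv = rv = 0
    for xi, yi, domain in zip(x, y, domains):
        for value in domain:
            held = (xi == value) + (yi == value)
            if held == 2:
                cv += 1
            elif held == 0:
                nv += 1
            else:
                rv += 1
    return cv, nv, rv


def three_way_spaces(data):
    """Map n categorical objects into three n-dimensional spaces.

    Row i of each returned matrix is the representation of object i:
    its normalized CV, NV, or RV count against every object j.
    """
    # Feature universe: the observed values of each feature.
    domains = [sorted(set(column)) for column in zip(*data)]
    total = sum(len(domain) for domain in domains)
    n = len(data)
    cv_space = [[0.0] * n for _ in range(n)]
    nv_space = [[0.0] * n for _ in range(n)]
    rv_space = [[0.0] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        cv, nv, rv = three_way_counts(data[i], data[j], domains)
        cv_space[i][j] = cv / total
        nv_space[i][j] = nv / total
        rv_space[i][j] = rv / total
    return cv_space, nv_space, rv_space


# Toy data set (made up for illustration): (color, size).
data = [("red", "small"), ("red", "large"), ("blue", "large")]
cv_space, nv_space, rv_space = three_way_spaces(data)
print(cv_space[0])  # [0.5, 0.25, 0.0]
```

The rows of each matrix are ordinary real-valued vectors, so any k-means implementation could be run on them. How TWSBC combines the three spaces is not stated in the abstract and is left open here.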
About the journal:
The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest.
Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency-tolerant reasoning, elicitation techniques, and philosophical foundations and psychological models of uncertain reasoning.
Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, multi-agent systems, etc.