调整模糊指数扩展的随机模型

IF 1.3 4区计算机科学 Q2 STATISTICS & PROBABILITY

Advances in Data Analysis and Classification Pub Date : 2025-02-13 DOI:10.1007/s11634-025-00625-w

Ryan DeWolfe, Jeffrey L. Andrews

{"title":"调整模糊指数扩展的随机模型","authors":"Ryan DeWolfe, Jeffrey L. Andrews","doi":"10.1007/s11634-025-00625-w","DOIUrl":null,"url":null,"abstract":"<div><p>The adjusted Rand index (ARI) is a widely used method for comparing hard clusterings, but requires a choice of random model that is often left implicit. Several recent works have extended the Rand index to fuzzy clusterings and adjusted for chance agreement with the permutation model, but the assumptions of this random model are difficult to justify for fuzzy clusterings. Previous work on random models for hard clusterings has shown that different random models can impact similarity rankings, so matching the assumptions of the random model to the algorithm is essential. We propose a single framework computing the ARI with three new random models that are intuitive and explainable for both hard and fuzzy clusterings. The theory and assumptions of the proposed models are contrasted with the existing permutation model, and computations on synthetic and benchmark data show that each model has distinct behaviour, meaning accurate model selection is important for the reliability of results.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 classification and related methods”","pages":"361 - 385"},"PeriodicalIF":1.3000,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Random models for adjusting fuzzy rand index extensions\",\"authors\":\"Ryan DeWolfe, Jeffrey L. Andrews\",\"doi\":\"10.1007/s11634-025-00625-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The adjusted Rand index (ARI) is a widely used method for comparing hard clusterings, but requires a choice of random model that is often left implicit. Several recent works have extended the Rand index to fuzzy clusterings and adjusted for chance agreement with the permutation model, but the assumptions of this random model are difficult to justify for fuzzy clusterings. Previous work on random models for hard clusterings has shown that different random models can impact similarity rankings, so matching the assumptions of the random model to the algorithm is essential. We propose a single framework computing the ARI with three new random models that are intuitive and explainable for both hard and fuzzy clusterings. The theory and assumptions of the proposed models are contrasted with the existing permutation model, and computations on synthetic and benchmark data show that each model has distinct behaviour, meaning accurate model selection is important for the reliability of results.</p></div>\",\"PeriodicalId\":49270,\"journal\":{\"name\":\"Advances in Data Analysis and Classification\",\"volume\":\"19 classification and related methods”\",\"pages\":\"361 - 385\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2025-02-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advances in Data Analysis and Classification\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s11634-025-00625-w\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in Data Analysis and Classification","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s11634-025-00625-w","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}

引用次数: 0

摘要

调整后的兰德指数（ARI）是一种广泛使用的比较硬聚类的方法，但需要选择随机模型，这通常是隐式的。最近的一些工作已经将兰德指数扩展到模糊聚类，并调整了与排列模型的偶然一致性，但这种随机模型的假设很难证明模糊聚类的合理性。先前关于硬聚类随机模型的研究表明，不同的随机模型会影响相似度排名，因此将随机模型的假设与算法相匹配是至关重要的。我们提出了一个计算ARI的单一框架，其中包含三个新的随机模型，这些模型对硬聚类和模糊聚类都是直观和可解释的。将所提模型的理论和假设与现有的置换模型进行了对比，在综合数据和基准数据上的计算表明，每个模型都有不同的行为，这意味着准确的模型选择对结果的可靠性至关重要。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Random models for adjusting fuzzy rand index extensions

查看原文本刊更多论文

Random models for adjusting fuzzy rand index extensions

The adjusted Rand index (ARI) is a widely used method for comparing hard clusterings, but requires a choice of random model that is often left implicit. Several recent works have extended the Rand index to fuzzy clusterings and adjusted for chance agreement with the permutation model, but the assumptions of this random model are difficult to justify for fuzzy clusterings. Previous work on random models for hard clusterings has shown that different random models can impact similarity rankings, so matching the assumptions of the random model to the algorithm is essential. We propose a single framework computing the ARI with three new random models that are intuitive and explainable for both hard and fuzzy clusterings. The theory and assumptions of the proposed models are contrasted with the existing permutation model, and computations on synthetic and benchmark data show that each model has distinct behaviour, meaning accurate model selection is important for the reliability of results.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Advances in Data Analysis and Classification STATISTICS & PROBABILITY-

CiteScore

3.40

自引率

6.20%

发文量

审稿时长

>12 weeks

期刊介绍： The international journal Advances in Data Analysis and Classification (ADAC) is designed as a forum for high standard publications on research and applications concerning the extraction of knowable aspects from many types of data. It publishes articles on such topics as structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering, and pattern recognition methods; strategies for modeling complex data and mining large data sets; methods for the extraction of knowledge from data, and applications of advanced methods in specific domains of practice. Articles illustrate how new domain-specific knowledge can be made available from data by skillful use of data analysis methods. The journal also publishes survey papers that outline, and illuminate the basic ideas and techniques of special approaches.