Random models for adjusting fuzzy rand index extensions
Authors: Ryan DeWolfe, Jeffrey L. Andrews
Journal: Advances in Data Analysis and Classification, volume 19, pages 361-385
Published: 2025-02-13
DOI: 10.1007/s11634-025-00625-w
URL: https://link.springer.com/article/10.1007/s11634-025-00625-w
Impact factor: 1.3; JCR: Q2 (Statistics & Probability)
Citation count: 0
Abstract
The adjusted Rand index (ARI) is a widely used method for comparing hard clusterings, but requires a choice of random model that is often left implicit. Several recent works have extended the Rand index to fuzzy clusterings and adjusted for chance agreement with the permutation model, but the assumptions of this random model are difficult to justify for fuzzy clusterings. Previous work on random models for hard clusterings has shown that different random models can impact similarity rankings, so matching the assumptions of the random model to the algorithm is essential. We propose a single framework computing the ARI with three new random models that are intuitive and explainable for both hard and fuzzy clusterings. The theory and assumptions of the proposed models are contrasted with the existing permutation model, and computations on synthetic and benchmark data show that each model has distinct behaviour, meaning accurate model selection is important for the reliability of results.
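For orientation, the following is a minimal sketch of the classical ARI for hard clusterings under the permutation model, the baseline the abstract contrasts against. It is not the framework or any of the three random models proposed in the paper, and it does not handle fuzzy memberships. The function name `adjusted_rand_index` and the label-list input format are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI of two hard clusterings under the permutation model.

    Illustrative helper (hypothetical name, not from the paper): each
    argument is a sequence giving the cluster label of every object.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")

    total_pairs = comb(n, 2)

    # Pairs of objects placed together in both clusterings
    # (sum over the cells of the contingency table).
    together_both = sum(
        comb(count, 2) for count in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs placed together within each clustering separately.
    together_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    together_b = sum(comb(count, 2) for count in Counter(labels_b).values())

    # Expected agreement when cluster sizes are held fixed and objects
    # are assigned by a uniformly random permutation.
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2

    if maximum == expected:
        # Degenerate case: both clusterings are all singletons or both
        # are a single cluster, so they are identical.
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

With this sketch, two clusterings that differ only by a renaming of labels, such as `[0, 0, 1, 1]` and `[1, 1, 0, 0]`, score 1.0, while `[0, 0, 1, 1]` against `[0, 1, 0, 1]` scores about -0.5, which is below the chance level of 0. Swapping in a different random model changes only the `expected` term, which is why the choice of model can alter similarity rankings.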
About the journal:
The international journal Advances in Data Analysis and Classification (ADAC) is designed as a forum for high-standard publications on research and applications concerning the extraction of knowable aspects from many types of data. It publishes articles on such topics as structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering, and pattern recognition methods; strategies for modeling complex data and mining large data sets; methods for the extraction of knowledge from data; and applications of advanced methods in specific domains of practice. Articles illustrate how new domain-specific knowledge can be made available from data by skillful use of data analysis methods. The journal also publishes survey papers that outline and illuminate the basic ideas and techniques of special approaches.