Instability of clustering metrics in overlapping community detection algorithms
Diego Kiedanski, P. Rodríguez-Bocca
2021 XLVII Latin American Computing Conference (CLEI), pp. 1-11, published 2021-10-25
DOI: 10.1109/CLEI53233.2021.9640094 (https://doi.org/10.1109/CLEI53233.2021.9640094)
Citation count: 1
Abstract
In this paper, we study the impact of data complexity and data quality on the overlapping community detection problem. We show that community detection algorithms are highly unstable when the data are incomplete or erroneous, and that this result holds across all the evaluated performance metrics. We verify it with three quality metrics (F1, NMI, and Omega), which apply when the ground-truth community structure is known, on four popular and representative detection algorithms: Order Statistics Local Optimization Method (OSLOM), Greedy Clique Expansion (GCE), Speaker-listener Label Propagation Algorithm (SLPA), and Cluster Affiliation Model for Big Networks (BIG-CLAM). The evaluation covers a set of real instances that arise from detecting which courses belong to the different careers (degrees) of an engineering university, and large benchmark sets of synthetic instances frequently used in the literature.
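To make the kind of comparison behind the F1 metric concrete, the following is a minimal sketch of an average F1 score between a detected cover and a ground-truth cover of overlapping communities. The abstract does not state which F1 variant the authors use, so the symmetric best-match definition below is an assumption, and the function names (f1, best_match_f1, average_f1) and the toy node sets are hypothetical illustrations rather than the paper's code or data.

```python
from typing import List, Set


def f1(a: Set[int], b: Set[int]) -> float:
    """F1 score between two node sets: harmonic mean of precision and recall."""
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(a)
    recall = overlap / len(b)
    return 2 * precision * recall / (precision + recall)


def best_match_f1(source: List[Set[int]], target: List[Set[int]]) -> float:
    """Mean, over communities in source, of the best F1 against any community in target."""
    if not source or not target:
        return 0.0
    return sum(max(f1(s, t) for t in target) for s in source) / len(source)


def average_f1(detected: List[Set[int]], truth: List[Set[int]]) -> float:
    """Symmetric average F1 (assumed variant): average of both matching directions."""
    return 0.5 * (best_match_f1(detected, truth) + best_match_f1(truth, detected))


# Toy ground truth with one overlapping node (node 4 belongs to both communities).
truth = [{1, 2, 3, 4}, {4, 5, 6, 7}]

# Hypothetical detection result in which node 4 was missed in the first community.
detected = [{1, 2, 3}, {4, 5, 6, 7}]

perfect_score = average_f1(truth, truth)
perturbed_score = average_f1(detected, truth)
print(perfect_score, perturbed_score)
```

Because node membership is compared as sets, a node may appear in several communities without special handling, which is what makes this style of metric usable for overlapping covers. NMI and Omega need their own overlapping-aware definitions and are not sketched here.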