Title: Assessing the suitability of network community detection to available meta-data using rank stability
Authors: Ryan Hartman, Josemar Faustino, Diego Pinheiro, R. Menezes
Venue: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics
Published: 2017-08-23
DOI: 10.1145/3106426.3106493 (https://doi.org/10.1145/3106426.3106493)
Citations: 9
Abstract
Over the last two decades, structural analysis of data has become widespread. The field, generally called Network Science, seeks to understand complex phenomena by looking for properties that emerge from the relationships between pieces of data, rather than by the traditional mining of the data itself. A commonly used structural analysis in networks is community detection: finding subgraphs whose internal connections are denser than their connections to the rest of the network. Many techniques have been proposed to find communities, along with benchmarks to evaluate the algorithms' ability to find these substructures. Until recently, the literature has mostly neglected the fact that these communities often represent a common characteristic of their elements. For instance, in a social network, communities could represent people who follow the same sport, people from the same classroom, or authors working in the same field of study, to name a few. The problem is therefore one of selecting a community detection technique as a function of the ground truth provided by the available meta-data. In this work, we propose the use of rank stability (entropy of ranks) to assess, from the perspective of meta-data, the communities identified by different techniques. We validate our approach on a large-scale data set of on-line social interactions across multiple community detection techniques.
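
The abstract does not spell out how the ranks are built, so the following is only an illustrative sketch of one possible reading of "entropy of ranks", not the authors' procedure. It assumes that each detected community is given as a list of meta-data labels (one per member), that labels are ranked by frequency within each community, and that a label's stability is the Shannon entropy of the ranks it receives across communities (lower entropy meaning a more stable rank). The function names `shannon_entropy`, `category_ranks`, and `rank_stability` are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def category_ranks(communities):
    """Rank meta-data labels by frequency inside each community.

    `communities` is a list of lists of labels (assumed input format).
    Returns a dict mapping each label to the list of ranks it obtained
    (1 = most frequent) in the communities where it appears.
    """
    ranks = {}
    for members in communities:
        counts = Counter(members)
        # Descending frequency; ties broken alphabetically for determinism.
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
        for position, label in enumerate(ordered, start=1):
            ranks.setdefault(label, []).append(position)
    return ranks


def rank_stability(communities):
    """Entropy of each label's ranks across communities (assumed definition)."""
    return {
        label: shannon_entropy(label_ranks)
        for label, label_ranks in category_ranks(communities).items()
    }


# Toy partition: three communities whose members carry a "sport" label.
example = [
    ["soccer", "soccer", "tennis"],
    ["soccer", "soccer", "soccer", "tennis"],
    ["tennis", "tennis", "soccer"],
]
# "soccer" is ranked 1, 1, 2 and "tennis" is ranked 2, 2, 1.
stability = rank_stability(example)
```

Under this reading, running `rank_stability` on the partition produced by each candidate technique and comparing the resulting entropies would give one way to judge which technique aligns most consistently with the available meta-data.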