An evolutionary schema for mining skyline clusters of attributed graph data
Authors: Wajdi Dhifli, Noemie Oliveira Da Costa, M. Elati
Venue: 2017 IEEE Congress on Evolutionary Computation (CEC)
Published: 2017-06-05
DOI: 10.1109/CEC.2017.7969559 (https://doi.org/10.1109/CEC.2017.7969559)
Citation count: 1
Abstract
Graph clustering is one of the most important research topics in graph mining and network analysis. With the abundance of data in many real-world applications, graph nodes and edges can be annotated with multiple sets of attributes derived from heterogeneous data sources. Taking these attributes into account during graph clustering can help generate clusters that have a balanced and cohesive intra-cluster structure and whose nodes share homogeneous properties. In this paper, we propose a genetic algorithm-based graph clustering approach for mining skyline clusters over large attributed graphs based on the dominance relationship. Each skyline solution is optimized with respect to multiple fitness functions simultaneously, where each function is defined either over the graph topology or over a particular set of attributes derived from one of the data sources. We experimentally evaluate our approach on a large real-world protein-protein interaction network of the human interactome, enriched with large sets of heterogeneous cancer-associated attributes. The results show the efficiency of our approach and show that integrating node attributes from multiple data sources yields a more robust graph clustering than considering the graph topology alone.
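To make the dominance relationship concrete, the following is a minimal Python sketch of skyline (Pareto) filtering over candidate clusters. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the fitness functions, so the scores, the cluster names, and the convention that every objective is maximized are all hypothetical. The function names `dominates` and `skyline` are likewise introduced here for illustration only.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector `a` dominates `b`.

    Assumption: every objective is maximized. `a` dominates `b` when it is
    at least as good on every objective and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def skyline(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(t, s) for other, t in scores.items() if other != name
        )
    ]


# Hypothetical candidate clusters, each scored on three made-up objectives:
# (topology score, homogeneity on attribute set 1, homogeneity on attribute set 2)
candidates = {
    "c1": (0.9, 0.2, 0.5),
    "c2": (0.6, 0.7, 0.6),
    "c3": (0.5, 0.6, 0.6),  # no better than c2 anywhere, worse on two objectives
    "c4": (0.3, 0.9, 0.1),
}

print(skyline(candidates))  # c3 is filtered out; c1, c2 and c4 remain
```

In a genetic algorithm, a filter of this kind would typically be applied to the population at each generation so that only non-dominated solutions are retained. How the paper combines it with selection, crossover, and mutation is not stated in the abstract.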