Jaccard-constrained dense subgraph discovery
Authors: Chamalee Wickrama Arachchi, Nikolaj Tatti
Journal: Machine Learning
Publication date: 2024-07-23
DOI: 10.1007/s10994-024-06595-y (https://doi.org/10.1007/s10994-024-06595-y)
Citation count: 0
Abstract
Finding dense subgraphs is a core problem in graph mining, with applications in many domains. Many real-world networks also vary over time, so the dataset can be represented as a sequence of graph snapshots. It is therefore natural to ask for dense subgraphs in a temporal network that are allowed to vary over time to a certain degree. In this paper, we search for dense subgraphs that have large pairwise Jaccard similarity coefficients. More formally, given a set of graph snapshots and an input parameter \(\alpha\), we find a collection of dense subgraphs with pairwise Jaccard index at least \(\alpha\) such that the sum of the densities of the induced subgraphs is maximized. We prove that this problem is NP-hard and present a greedy, iterative algorithm that runs in \(\mathcal{O}(nk^2 + m)\) time per iteration, where \(k\) is the length of the graph sequence and \(n\) and \(m\) denote the number of vertices and the total number of edges, respectively. We also consider an alternative problem in which subgraphs with large pairwise Jaccard indices are rewarded by incorporating the indices directly into the objective function. More formally, given a set of graph snapshots and a weight \(\lambda\), we find a collection of dense subgraphs such that the sum of the densities of the induced subgraphs plus the sum of the Jaccard indices, weighted by \(\lambda\), is maximized. We prove that this problem is also NP-hard. To discover dense subgraphs with good objective value, we present an iterative algorithm that runs in \(\mathcal{O}(n^2k^2 + m \log n + k^3 n)\) time per iteration, and a greedy algorithm that runs in \(\mathcal{O}(n^2k^2 + m \log n + k^3 n)\) time. We show experimentally that our algorithms are efficient, that they can find the ground truth in synthetic datasets, and that they provide good results on real-world datasets. Finally, we present two case studies that demonstrate the usefulness of the problem.
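The two objectives described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it only evaluates a given candidate solution. Several details are assumptions, not statements from the abstract: the density of a vertex set is taken as the number of induced edges divided by the number of vertices, one vertex set is chosen per snapshot, the Jaccard sum runs over unordered pairs of sets, and all function names (`density`, `jaccard`, `satisfies_constraint`, `weighted_objective`) are hypothetical.

```python
from itertools import combinations


def density(edges, subset):
    """Induced edge count divided by |subset| (assumed density definition)."""
    if not subset:
        return 0.0
    induced = sum(1 for u, v in edges if u in subset and v in subset)
    return induced / len(subset)


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def total_density(snapshots, subsets):
    """Sum of densities, pairing the i-th snapshot with the i-th vertex set."""
    return sum(density(edges, s) for edges, s in zip(snapshots, subsets))


def satisfies_constraint(subsets, alpha):
    """Constrained problem: every pair of sets has Jaccard index >= alpha."""
    return all(jaccard(a, b) >= alpha for a, b in combinations(subsets, 2))


def weighted_objective(snapshots, subsets, lam):
    """Alternative problem: densities plus lam times the sum of pairwise Jaccard indices."""
    jaccard_sum = sum(jaccard(a, b) for a, b in combinations(subsets, 2))
    return total_density(snapshots, subsets) + lam * jaccard_sum


# Toy input: two snapshots (edge lists) and one candidate vertex set per snapshot.
snapshots = [
    [(1, 2), (2, 3), (1, 3), (3, 4)],
    [(1, 2), (2, 3), (1, 3), (1, 4), (2, 4), (3, 4)],
]
subsets = [{1, 2, 3}, {1, 2, 3, 4}]

print(total_density(snapshots, subsets))
print(satisfies_constraint(subsets, alpha=0.7))
print(weighted_objective(snapshots, subsets, lam=2.0))
```

On this toy input the two sets have densities 1.0 and 1.5 and a Jaccard index of 0.75, so the candidate is feasible for \(\alpha = 0.7\) but not for \(\alpha = 0.8\). Under the assumptions above, the weighted objective with \(\lambda = 2\) is 2.5 + 2 × 0.75 = 4.0.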
About the journal:
Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.