On Breaking Truss-Based and Core-Based Communities

IF 4 3区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Michelle Sweering
{"title":"On Breaking Truss-Based and Core-Based Communities","authors":"Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Michelle Sweering","doi":"10.1145/3644077","DOIUrl":null,"url":null,"abstract":"<p>We introduce the general problem of identifying a smallest <i>edge</i> subset of a given graph whose deletion makes the graph community-free. We consider this problem under two community notions which have attracted significant attention: <i>k</i>-truss and <i>k</i>-core. We also introduce a problem variant where the identified subset contains edges incident to a given set of nodes and ensures that these nodes are not contained in any community; <i>k</i>-truss or <i>k</i>-core, in our case. These problems are directly applicable in social networks: the identified edges can be <i>hidden</i> by users or <i>sanitized</i> from the output graph; or in communication networks: the identified edges correspond to <i>vital</i> network connections. We present a series of theoretical and practical results. On the theoretical side, we show through non-trivial reductions that the problems we introduce are NP-hard and, in fact, hard to approximate. For the <i>k</i>-truss based problems, we also show exact exponential-time algorithms, as well as a non-trivial lower bound on the size of an optimal solution. On the practical side, we develop a series of heuristics which are sped up by efficient data structures that we propose for updating the truss or core decomposition under edge deletions. In addition, we develop an algorithm to compute the lower bound. Extensive experiments on 11 real-world and synthetic graphs show that our heuristics are effective, outperforming natural baselines, and also efficient (up to two orders of magnitude faster than a natural baseline) thanks to our data structures. Furthermore, we present a case study on a co-authorship network and experiments showing that the removal of edges identified by our heuristics does not substantially affect the clustering structure of the input graph. </p><p>This work extends a KDD 2021 paper, providing new theoretical results as well as introducing core-based problems and algorithms.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"9 1","pages":""},"PeriodicalIF":4.0000,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Knowledge Discovery from Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3644077","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

We introduce the general problem of identifying a smallest edge subset of a given graph whose deletion makes the graph community-free. We consider this problem under two community notions which have attracted significant attention: k-truss and k-core. We also introduce a problem variant where the identified subset contains edges incident to a given set of nodes and ensures that these nodes are not contained in any community; k-truss or k-core, in our case. These problems are directly applicable in social networks: the identified edges can be hidden by users or sanitized from the output graph; or in communication networks: the identified edges correspond to vital network connections. We present a series of theoretical and practical results. On the theoretical side, we show through non-trivial reductions that the problems we introduce are NP-hard and, in fact, hard to approximate. For the k-truss based problems, we also show exact exponential-time algorithms, as well as a non-trivial lower bound on the size of an optimal solution. On the practical side, we develop a series of heuristics which are sped up by efficient data structures that we propose for updating the truss or core decomposition under edge deletions. In addition, we develop an algorithm to compute the lower bound. Extensive experiments on 11 real-world and synthetic graphs show that our heuristics are effective, outperforming natural baselines, and also efficient (up to two orders of magnitude faster than a natural baseline) thanks to our data structures. Furthermore, we present a case study on a co-authorship network and experiments showing that the removal of edges identified by our heuristics does not substantially affect the clustering structure of the input graph.

This work extends a KDD 2021 paper, providing new theoretical results as well as introducing core-based problems and algorithms.

关于打破桁架式群落和核心式群落
我们提出了一个一般性问题,即找出给定图中最小的边子集,删除该边子集后,该图就不存在群落。我们在两个备受关注的群体概念下考虑这个问题:k-桁架和 k-核心。我们还引入了一个问题变体,即确定的子集包含给定节点集的边,并确保这些节点不包含在任何社群中;在我们的案例中,是 k-truss 或 k-core。这些问题可直接应用于社交网络:已识别的边可被用户隐藏或从输出图中删除;或应用于通信网络:已识别的边对应于重要的网络连接。我们展示了一系列理论和实践成果。在理论方面,我们通过非难性还原证明,我们引入的问题是 NP 难问题,事实上很难近似。对于基于 k 桁架的问题,我们还展示了精确的指数时间算法,以及最优解大小的非难下限。在实际应用方面,我们开发了一系列启发式算法,这些算法通过我们提出的高效数据结构得以加速,用于在边删除的情况下更新桁架或核心分解。此外,我们还开发了一种计算下限的算法。在 11 个真实图和合成图上进行的大量实验表明,我们的启发式方法是有效的,其性能优于自然基线,而且由于我们的数据结构,其效率也很高(比自然基线快两个数量级)。此外,我们还介绍了一个关于共同作者网络的案例研究,实验表明,去除我们的启发式方法识别出的边不会对输入图的聚类结构产生实质性影响。这项工作扩展了 KDD 2021 论文,提供了新的理论结果,并介绍了基于核心的问题和算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
ACM Transactions on Knowledge Discovery from Data
ACM Transactions on Knowledge Discovery from Data COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore
6.70
自引率
5.60%
发文量
172
审稿时长
3 months
期刊介绍: TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信