Graph sketches: sparsification, spanners, and subgraphs

Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Pub Date: 2012-05-21. DOI: 10.1145/2213556.2213560

K. Ahn, S. Guha, A. McGregor

Pages: 5-14

Citations: 296

Abstract

When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses are sketches, i.e., those based on linear projections of the data. These are applicable in many models including various parallel, stream, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements.

In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of dense subgraphs. Our main result is a sketch-based sparsifier construction: we show that $\tilde{O}(n\varepsilon^{-2})$ random linear projections of a graph on $n$ nodes suffice to $(1+\varepsilon)$-approximate all cut values. Similarly, we show that $\tilde{O}(\varepsilon^{-2})$ linear projections suffice for (additively) approximating the fraction of induced subgraphs that match a given pattern such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches where each projection may be chosen dependent on the outcome of earlier sketches. All of the above results immediately give rise to data stream algorithms that also apply to dynamic graph streams where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertions and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.
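The abstract's claim that linear sketches handle edge deletions and distributed streams follows from linearity alone, and a small example may make that concrete. The Python sketch below is only an illustration under stated assumptions: it projects the edge-indicator vector of a graph onto pseudo-random ±1 rows. It is not the paper's sparsifier, spanner, or subgraph construction, and the names `LinearGraphSketch` and `edge_index` are hypothetical.

```python
import random


def edge_index(u, v, n):
    """Map an undirected edge {u, v} on nodes 0..n-1 to one coordinate
    of the (conceptual) edge-indicator vector. Hypothetical helper."""
    if u > v:
        u, v = v, u
    return u * n + v


class LinearGraphSketch:
    """k linear measurements of a graph's edge-indicator vector.

    Assumption: each measurement row has pseudo-random +1/-1 entries
    derived from a shared seed, so separate sites that agree on
    (n, k, seed) implicitly use the same projection matrix.
    """

    def __init__(self, n, k, seed=0):
        self.n, self.k, self.seed = n, k, seed
        self.values = [0] * k

    def _sign(self, row, idx):
        # Deterministic +1/-1 entry of the projection matrix at (row, idx).
        rng = random.Random(f"{self.seed}:{row}:{idx}")
        return 1 if rng.random() < 0.5 else -1

    def update(self, u, v, delta):
        # Adding delta to one coordinate of the edge vector changes each
        # measurement by delta times the matching matrix entry.
        idx = edge_index(u, v, self.n)
        for row in range(self.k):
            self.values[row] += delta * self._sign(row, idx)

    def insert(self, u, v):
        self.update(u, v, +1)

    def delete(self, u, v):
        self.update(u, v, -1)

    def merge(self, other):
        # Linearity: the sketch of a sum of edge vectors is the sum of sketches.
        if (self.n, self.k, self.seed) != (other.n, other.k, other.seed):
            raise ValueError("sketches must share n, k, and seed")
        out = LinearGraphSketch(self.n, self.k, self.seed)
        out.values = [a + b for a, b in zip(self.values, other.values)]
        return out


# A dynamic stream: insert three edges, then delete one of them.
stream = LinearGraphSketch(n=5, k=8, seed=1)
stream.insert(0, 1)
stream.insert(1, 2)
stream.insert(2, 3)
stream.delete(1, 2)

# Sketching the final graph directly yields the same measurements.
final = LinearGraphSketch(n=5, k=8, seed=1)
final.insert(0, 1)
final.insert(2, 3)

# Two sites each see part of the stream; merging recovers the same sketch.
site_a = LinearGraphSketch(n=5, k=8, seed=1)
site_a.insert(0, 1)
site_a.insert(1, 2)
site_b = LinearGraphSketch(n=5, k=8, seed=1)
site_b.insert(2, 3)
site_b.delete(1, 2)
merged = site_a.merge(site_b)
```

Here `stream.values`, `final.values`, and `merged.values` coincide, because a deletion subtracts exactly what the matching insertion added and because sketches built with the same seed add coordinate-wise. Recovering cut values, distances, or pattern counts from such measurements is the substance of the paper and is not attempted here.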



