Proceedings of the 2016 International Conference on Management of Data最新文献

筛选
英文 中文
Distributed Set Reachability 分布式集可达性
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2915226
Sairam Gurajada, M. Theobald
{"title":"Distributed Set Reachability","authors":"Sairam Gurajada, M. Theobald","doi":"10.1145/2882903.2915226","DOIUrl":"https://doi.org/10.1145/2882903.2915226","url":null,"abstract":"In this paper, we focus on the efficient and scalable processing of set-reachability queries over a distributed, directed data graph. A \"set-reachability query\" is a generalized form of a reachability query, in which we consider two sets S and T of source and target vertices, respectively, to be given as the query. The result of a set-reachability query are all pairs of source and target vertices (s, t), with s -- S and t #8712; T, where s is reachable to t (denoted as S ↝ T). In case the data graph is partitioned into multiple, edge- and vertex-disjoint subgraphs (e.g., when distributed across multiple compute nodes in a cluster), we refer to the resulting set-reachability problem as \"distributed set reachability\". The key goal in processing a distributed set-reachability query over a partitioned data graph both efficiently and in a scalable manner is (1) to avoid redundant computations within the local compute nodes as much as possible, (2) to partially evaluate the local components of a set-reachability query S ↝ T among all compute nodes in parallel, and (3) to minimize both the size and number of messages exchanged among the compute nodes. Distributed set reachability has a plethora of applications in graph analytics and for query processing. The current W3C recommendation for SPARQL 1.1, for example, introduces a notion of \"labeled property paths\" which resolves to processing a form of generalized graph-pattern queries with set-reachability predicates. Moreover, analyzing dependencies among \"social-network communities\" inherently involves reachability checks between large sets of source and target vertices. Our experiments confirm very significant performance gains of our approach in comparison to state-of-the-art graph engines such as Giraph++, and over a variety of graph collections with up to 1.4 billion edges.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82330803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
Sharing-Aware Outlier Analytics over High-Volume Data Streams 大容量数据流上的共享感知异常值分析
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2882920
Lei Cao, Jiayuan Wang, Elke A. Rundensteiner
{"title":"Sharing-Aware Outlier Analytics over High-Volume Data Streams","authors":"Lei Cao, Jiayuan Wang, Elke A. Rundensteiner","doi":"10.1145/2882903.2882920","DOIUrl":"https://doi.org/10.1145/2882903.2882920","url":null,"abstract":"Real-time analytics of anomalous phenomena on streaming data typically relies on processing a large variety of continuous outlier detection requests, each configured with different parameter settings. The processing of such complex outlier analytics workloads is resource consuming due to the algorithmic complexity of the outlier mining process. In this work we propose a sharing-aware multi-query execution strategy for outlier detection on data streams called SOP. A key insight of SOP is to transform the problem of handling a multi-query outlier analytics workload into a single-query skyline computation problem. We prove that the output of the skyline computation process corresponds to the minimal information needed for determining the outlier status of any point in the stream. Based on this new formulation, we design a customized skyline algorithm called K-SKY that leverages the domination relationships among the streaming data points to minimize the number of data points that must be evaluated for supporting multi-query outlier detection. Based on this K-SKY algorithm, our SOP solution achieves minimal utilization of both computational and memory resources for the processing of these complex outlier analytics workload. Our experimental study demonstrates that SOP consistently outperforms the state-of-art solutions by three orders of magnitude in CPU time, while only consuming 5% of their memory footprint - a clear win-win. Furthermore, SOP is shown to scale to large workloads composed of thousands of parameterized queries.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86695054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 27
Publishing Attributed Social Graphs with Formal Privacy Guarantees 发布具有正式隐私保证的属性社交图
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2915215
Zach Jorgensen, Ting Yu, Graham Cormode
{"title":"Publishing Attributed Social Graphs with Formal Privacy Guarantees","authors":"Zach Jorgensen, Ting Yu, Graham Cormode","doi":"10.1145/2882903.2915215","DOIUrl":"https://doi.org/10.1145/2882903.2915215","url":null,"abstract":"Many data analysis tasks rely on the abstraction of a graph to represent relations between entities, with attributes on the nodes and edges. Since the relationships encoded are often sensitive, we seek effective ways to release representative graphs which nevertheless protect the privacy of the data subjects. Prior work on this topic has focused primarily on the graph structure in isolation, and has not provided ways to handle richer graphs with correlated attributes. We introduce an approach to release such graphs under the strong guarantee of differential privacy. We adapt existing graph models, and introduce a new one, and show how to augment them with meaningful privacy. This provides a complete workflow, where the input is a sensitive graph, and the output is a realistic synthetic graph. Our experimental study demonstrates that our process produces useful, accurate attributed graphs.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86275621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 63
FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory FPTree:一种用于存储类内存的混合SCM-DRAM持久和并发b树
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2915251
Ismail Oukid, Johan Lasperas, A. Nica, Thomas Willhalm, Wolfgang Lehner
{"title":"FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory","authors":"Ismail Oukid, Johan Lasperas, A. Nica, Thomas Willhalm, Wolfgang Lehner","doi":"10.1145/2882903.2915251","DOIUrl":"https://doi.org/10.1145/2882903.2915251","url":null,"abstract":"The advent of Storage Class Memory (SCM) is driving a rethink of storage systems towards a single-level architecture where memory and storage are merged. In this context, several works have investigated how to design persistent trees in SCM as a fundamental building block for these novel systems. However, these trees are significantly slower than DRAM-based counterparts since trees are latency-sensitive and SCM exhibits higher latencies than DRAM. In this paper we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named Fingerprinting Persistent Tree (FPTree) that achieves similar performance to DRAM-based counterparts. In this novel design, leaf nodes are persisted in SCM while inner nodes are placed in DRAM and rebuilt upon recovery. The FPTree uses Fingerprinting, a technique that limits the expected number of in-leaf probed keys to one. In addition, we propose a hybrid concurrency scheme for the FPTree that is partially based on Hardware Transactional Memory. We conduct a thorough performance evaluation and show that the FPTree outperforms state-of-the-art persistent trees with different SCM latencies by up to a factor of 8.2. Moreover, we show that the FPTree scales very well on a machine with 88 logical cores. Finally, we integrate the evaluated trees in memcached and a prototype database. We show that the FPTree incurs an almost negligible performance overhead over using fully transient data structures, while significantly outperforming other persistent trees.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89903421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 291
SparkR: Scaling R Programs with Spark SparkR:用Spark扩展R程序
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2903740
S. Venkataraman, Zongheng Yang, Davies Liu, Eric Liang, H. Falaki, Xiangrui Meng, Reynold Xin, A. Ghodsi, M. Franklin, I. Stoica, M. Zaharia
{"title":"SparkR: Scaling R Programs with Spark","authors":"S. Venkataraman, Zongheng Yang, Davies Liu, Eric Liang, H. Falaki, Xiangrui Meng, Reynold Xin, A. Ghodsi, M. Franklin, I. Stoica, M. Zaharia","doi":"10.1145/2882903.2903740","DOIUrl":"https://doi.org/10.1145/2882903.2903740","url":null,"abstract":"R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks. However, interactive data analysis in R is usually limited as the R runtime is single threaded and can only process data sets that fit in a single machine's memory. We present SparkR, an R package that provides a frontend to Apache Spark and uses Spark's distributed computation engine to enable large scale data analysis from the R shell. We describe the main design goals of SparkR, discuss how the high-level DataFrame API enables scalable computation and present some of the key details of our implementation.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"70 8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79213287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 67
Robust and Noise Resistant Wrapper Induction 坚固和抗噪声封装感应
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2915214
Tim Furche, Jinsong Guo, S. Maneth, C. Schallhart
{"title":"Robust and Noise Resistant Wrapper Induction","authors":"Tim Furche, Jinsong Guo, S. Maneth, C. Schallhart","doi":"10.1145/2882903.2915214","DOIUrl":"https://doi.org/10.1145/2882903.2915214","url":null,"abstract":"Wrapper induction is the problem of automatically inferring a query from annotated web pages of the same template. This query should not only select the annotated content accurately but also other content following the same template. Beyond accurately matching the template, we consider two additional requirements: (1) wrappers should be robust against a large class of changes to the web pages, and (2) the induction process should be noise resistant, i.e., tolerate slightly erroneous (e.g., machine generated) samples. Key to our approach is a query language that is powerful enough to permit accurate selection, but limited enough to force noisy samples to be generalized into wrappers that select the likely intended items. We introduce such a language as subset of XPATH and show that even for such a restricted language, inducing optimal queries according to a suitable scoring is infeasible. Nevertheless, our wrapper induction framework infers highly robust and noise resistant queries. We evaluate the queries on snapshots from web pages that change over time as provided by the Internet Archive, and show that the induced queries are as robust as the human-made queries. The queries often survive hundreds sometimes thousands of days, with many changes to the relative position of the selected nodes (including changes on template level). This is due to the few and discriminative anchor (intermediately selected) nodes of the generated queries. The queries are highly resistant against positive noise (up to 50%) and negative noise (up to 20%).","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"87 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73623955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Closing the functional and Performance Gap between SQL and NoSQL 缩小SQL和NoSQL在功能和性能上的差距
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2903731
Z. Liu, B. Hammerschmidt, Douglas Mcmahon, Y. Liu, Hui J. Chang
{"title":"Closing the functional and Performance Gap between SQL and NoSQL","authors":"Z. Liu, B. Hammerschmidt, Douglas Mcmahon, Y. Liu, Hui J. Chang","doi":"10.1145/2882903.2903731","DOIUrl":"https://doi.org/10.1145/2882903.2903731","url":null,"abstract":"Oracle release 12cR1 supports JSON data management that enables users to store, index and query JSON data along with relational data. The integration of the JSON data model into the RDBMS allows a new paradigm of data management where data is storable, indexable and queryable without upfront schema definition. We call this new paradigm Flexible Schema Data Management (FSDM). In this paper, we present enhancements to Oracle's JSON data management in the upcoming 12cR2 release. We present JSON DataGuide, an auto-computed dynamic soft schema for JSON collections that closes the functional gap between the fixed-schema SQL world and the schema-less NoSQL world. We present a self-contained query friendly binary format for encoding JSON (OSON) to close the query performance gap between schema-encoded relational data and schema free JSON textual data. The addition of these new features makes the Oracle RDBMS well suited to both fixedschema SQL and flexible-schema NoSQL use cases, and allows users to freely mix the two paradigms in a single data management system.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"349 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75490597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 38
Efficient Subgraph Matching by Postponing Cartesian Products 延迟笛卡尔积的高效子图匹配
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2915236
Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, W. Zhang
{"title":"Efficient Subgraph Matching by Postponing Cartesian Products","authors":"Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, W. Zhang","doi":"10.1145/2882903.2915236","DOIUrl":"https://doi.org/10.1145/2882903.2915236","url":null,"abstract":"In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G. The existing algorithms for subgraph matching follow Ullmann's backtracking approach; that is, iteratively map query vertices to data vertices by following a matching order of query vertices. It has been shown that the matching order of query vertices is a very important aspect to the efficiency of a subgraph matching algorithm. Recently, many advanced techniques, such as enforcing connectivity and merging similar vertices in query or data graphs, have been proposed to provide an effective matching order with the aim to reduce unpromising intermediate results especially the ones caused by redundant Cartesian products. In this paper, for the first time we address the issue of unpromising results by Cartesian products from \"dissimilar\" vertices. We propose a new framework by postponing the Cartesian products based on the structure of a query to minimize the redundant Cartesian products. Our second contribution is proposing a new path-based auxiliary data structure, with the size O(|E(G)| x |V(q)|), to generate a matching order and conduct subgraph matching, which significantly reduces the exponential size O(|V(G)||V(q)|-1) of the existing path-based auxiliary data structure, where V (G) and E (G) are the vertex and edge sets of a data graph G, respectively, and V (q) is the vertex set of a query $q$. Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms by up to $3$ orders of magnitude.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"516 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77120933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 205
Privacy Preserving Subgraph Matching on Large Graphs in Cloud 云中大图的保密性子图匹配
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2882956
Zhao Chang, Lei Zou, Feifei Li
{"title":"Privacy Preserving Subgraph Matching on Large Graphs in Cloud","authors":"Zhao Chang, Lei Zou, Feifei Li","doi":"10.1145/2882903.2882956","DOIUrl":"https://doi.org/10.1145/2882903.2882956","url":null,"abstract":"The wide presence of large graph data and the increasing popularity of storing data in the cloud drive the needs for graph query processing on a remote cloud. But a fundamental challenge is to process user queries without compromising sensitive information. This work focuses on privacy preserving subgraph matching in a cloud server. The goal is to minimize the overhead on both cloud and client sides for subgraph matching, without compromising users' sensitive information. To that end, we transform an original graph $G$ into a privacy preserving graph Gk, which meets the requirement of an existing privacy model known as k-automorphism. By making use of the symmetry in a k-automorphic graph, a subgraph matching query can be efficiently answered using a graph Go, a small subset of Gk. This approach saves both space and query cost in the cloud server. We also anonymize the query graphs to protect their label information using label generalization technique. To reduce the search space for a subgraph matching query, we propose a cost model to select the more effective label combinations. The effectiveness and efficiency of our method are demonstrated through extensive experimental results on real datasets.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88812737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 32
Learning Linear Regression Models over Factorized Joins 学习因式连接上的线性回归模型
Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2882939
Maximilian Schleich, Dan Olteanu, Radu Ciucanu
{"title":"Learning Linear Regression Models over Factorized Joins","authors":"Maximilian Schleich, Dan Olteanu, Radu Ciucanu","doi":"10.1145/2882903.2882939","DOIUrl":"https://doi.org/10.1145/2882903.2882939","url":null,"abstract":"We investigate the problem of building least squares regression models over training datasets defined by arbitrary join queries on database tables. Our key observation is that joins entail a high degree of redundancy in both computation and data representation, which is not required for the end-to-end solution to learning over joins. We propose a new paradigm for computing batch gradient descent that exploits the factorized computation and representation of the training datasets, a rewriting of the regression objective function that decouples the computation of cofactors of model parameters from their convergence, and the commutativity of cofactor computation with relational union and projection. We introduce three flavors of this approach: F/FDB computes the cofactors in one pass over the materialized factorized join; Favoids this materialization and intermixes cofactor and join computation; F/SQL expresses this mixture as one SQL query. Our approach has the complexity of join factorization, which can be exponentially lower than of standard joins. Experiments with commercial, public, and synthetic datasets show that it outperforms MADlib, Python StatsModels, and R, by up to three orders of magnitude.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90280606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 164
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信