Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data: Latest Papers, Page 10

Unified Spatial Analytics from Heterogeneous Sources with Amazon Redshift

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3384704

Nemanja Borić, Hinnerk Gildhoff, M. Karavelas, I. Pandis, Ioanna Tsalouchidou

Citations: 6

Truss-based Community Search over Large Directed Graphs

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3380587

Qing Liu, Minjun Zhao, Xin Huang, Jianliang Xu, Yunjun Gao

Abstract: Community search enables personalized community discovery and has wide applications in large real-world graphs. While community search has been extensively studied for undirected graphs, the problem for directed graphs has received attention only recently. However, existing studies suffer from several drawbacks, e.g., vertices with varied in-degrees and out-degrees cannot be included in a community at the same time. To address these limitations, in this paper, we systematically study the problem of community search over large directed graphs. We start by presenting a novel community model, called D-truss, based on two distinct types of directed triangles, i.e., the flow triangle and the cycle triangle. The D-truss model brings nice structural and computational properties and has many advantages in comparison with the existing models. With this new model, we then formulate the D-truss community search problem, which is proved to be NP-hard. In view of its hardness, we propose two efficient 2-approximation algorithms, named Global and Local, that run in polynomial time with a quality guarantee. To further improve the efficiency of the algorithms, we devise an indexing method based on D-truss decomposition. Consequently, D-truss community search can be solved upon the D-truss index without time-consuming accesses to the original graph. Experimental studies on real-world graphs with ground-truth communities validate the quality of the solutions we obtain and the efficiency of the proposed algorithms.

Citations: 50
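
The D-truss abstract above names two kinds of directed triangle without defining them. The sketch below assumes the usual reading: a cycle triangle is a directed 3-cycle (x→y, y→z, z→x) and a flow triangle is the acyclic case (x→y, y→z, x→z). The function name `classify_triangles` and the toy graph are hypothetical and only illustrate the distinction; this is not the paper's D-truss decomposition.

```python
from itertools import combinations, permutations


def classify_triangles(edges):
    """Return (cycle_triangles, flow_triangles) for a small directed graph.

    Assumed definitions (not taken from the abstract):
      cycle triangle: x->y, y->z, z->x
      flow triangle:  x->y, y->z, x->z  (x is the source, z the sink)
    """
    edge_set = set(edges)
    vertices = sorted({v for edge in edges for v in edge})
    cycle, flow = [], []
    for a, b, c in combinations(vertices, 3):
        # A vertex triple admits two cyclic orientations.
        for x, y, z in ((a, b, c), (a, c, b)):
            if {(x, y), (y, z), (z, x)} <= edge_set:
                cycle.append((x, y, z))
        # Try every assignment of (source, middle, sink).
        for x, y, z in permutations((a, b, c)):
            if {(x, y), (y, z), (x, z)} <= edge_set:
                flow.append((x, y, z))
    return cycle, flow


# Toy graph: a->b->c->a is a cycle; a->b, b->d, a->d is a flow triangle.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("b", "d")]
cycle_triangles, flow_triangles = classify_triangles(toy_edges)
print(cycle_triangles, flow_triangles)
```

The brute-force enumeration is cubic in the number of vertices, so it is only suitable for illustrating the two triangle types on tiny graphs.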

Aggregation Support for Modern Graph Analytics in TigerGraph

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3386144

Alin Deutsch, Yu Xu, Mingxi Wu, Victor E. Lee

Citations: 21

PROUD: PaRallel OUtlier Detection for Streams

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3384688

Theodoros Toliopoulos, Christos Bellas, A. Gounaris, A. Papadopoulos

Citations: 10

Aggify: Lifting the Curse of Cursor Loops using Custom Aggregates

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3389736

Surabhi Gupta, S. Purandare, Karthik Ramachandra

Abstract: Loops that iterate over SQL query results are quite common, both in application programs that run outside the DBMS and in user-defined functions (UDFs) and stored procedures that run within the DBMS. It can be argued that set-oriented operations are more efficient and should be preferred over iteration; but from real-world use cases, it is clear that loops over query results are inevitable in many situations, and are preferred by many users. Such loops, known as cursor loops, come with huge trade-offs and overheads w.r.t. performance, resource consumption and concurrency. We present Aggify, a technique for optimizing loops over query results that overcomes these overheads. It achieves this by automatically generating custom aggregates that are equivalent in semantics to the loop. Aggify then eliminates the loop entirely by rewriting the query to use this generated aggregate. This technique has several advantages, such as: (i) pipelining of entire cursor loop operations instead of materialization, (ii) pushing down loop computation from the application layer into the DBMS, closer to the data, (iii) leveraging existing work on optimization of aggregate functions, resulting in efficient query plans. We describe the technique underlying Aggify, and present our experimental evaluation over benchmarks as well as real workloads that demonstrate the significant benefits of this technique.

Citations: 15
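
The idea in the Aggify abstract above, replacing a loop over query results with a custom aggregate that has the same semantics, can be illustrated with a small hand-written example. The sketch below uses Python's standard `sqlite3` module; the `offers` table, the `cheapest_supplier` aggregate, and both function names are hypothetical. Aggify generates such aggregates automatically, whereas this one is written by hand, so treat it as an illustration of the rewrite target rather than of the technique itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE offers (supplier TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO offers VALUES (?, ?)",
    [("acme", 12.5), ("globex", 9.75), ("initech", 11.0)],
)


def cheapest_with_cursor_loop(connection):
    """Cursor-loop version: rows are pulled into the application one by one."""
    best_supplier, best_cost = None, None
    for supplier, cost in connection.execute("SELECT supplier, cost FROM offers"):
        if best_cost is None or cost < best_cost:
            best_supplier, best_cost = supplier, cost
    return best_supplier


class CheapestSupplier:
    """Custom aggregate whose step() holds the former loop body."""

    def __init__(self):
        # Loop-carried variables become aggregate state.
        self.best_supplier, self.best_cost = None, None

    def step(self, supplier, cost):
        if self.best_cost is None or cost < self.best_cost:
            self.best_supplier, self.best_cost = supplier, cost

    def finalize(self):
        return self.best_supplier


conn.create_aggregate("cheapest_supplier", 2, CheapestSupplier)


def cheapest_with_aggregate(connection):
    """Aggregate version: the loop runs inside the database engine."""
    row = connection.execute(
        "SELECT cheapest_supplier(supplier, cost) FROM offers"
    ).fetchone()
    return row[0]


print(cheapest_with_cursor_loop(conn), cheapest_with_aggregate(conn))
```

The example deliberately uses an order-independent computation with distinct costs, because SQLite does not promise the order in which rows reach an aggregate; a loop whose result depends on iteration order would need extra care.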

Automating Exploratory Data Analysis via Machine Learning: An Overview

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3383126

T. Milo, Amit Somech

Abstract: Exploratory Data Analysis (EDA) is an important initial step for any knowledge discovery process, in which data scientists interactively explore unfamiliar datasets by issuing a sequence of analysis operations (e.g., filter, aggregation, and visualization). Since EDA has long been known as a difficult task, requiring profound analytical skills, experience, and domain knowledge, a plethora of systems have been devised over the last decade in order to facilitate EDA. In particular, advancements in machine learning research have created exciting opportunities, not only to better facilitate EDA but also to fully automate the process. In this tutorial, we review recent lines of work for automating EDA. We start from recommender systems for suggesting a single exploratory action, go through kNN-based classifiers and active-learning methods for predicting users' interestingness preferences, and finally cover fully automating EDA using state-of-the-art methods such as deep reinforcement learning and sequence-to-sequence models. We conclude the tutorial with a discussion on the main challenges and open questions to be dealt with in order to ultimately reduce the manual effort required for EDA.

Citations: 45

Hub Labeling for Shortest Path Counting

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3389737

Yikai Zhang, J. Yu

Citations: 9

The Challenge of Building Effective, Enterprise-scale Data Lakes

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3393816

Awez Syed

Citations: 2

Long-lived Transactions Made Less Harmful

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3389714

Jong-Bin Kim, H. Cho, Kihwang Kim, Jaeseon Yu, Sooyong Kang, Hyungsoo Jung

Abstract: Many systems use snapshot isolation, or something similar, as the default, and multi-version concurrency control (MVCC) remains essential to offering such point-in-time consistency. One major issue in MVCC is the timely removal of unnecessary versions of data items, especially in the presence of long-lived transactions (LLTs). We have observed that the latest versions of MySQL and PostgreSQL are still vulnerable to LLTs. Our analysis of existing proposals suggests that new solutions to this matter must provide rigorous rules for completely identifying unnecessary versions, and elaborate designs for version cleaning lest old versions required for LLTs should suspend garbage collection. In this paper, we formalize such rules into our version pruning theorem and version classification, which together form the theoretical foundations for our new version management system, vDriver, that bases its record versioning on a new principle: Single In-row Remaining Off-row (SIRO) versioning. We implemented a prototype of vDriver and integrated it with MySQL-8.0 and PostgreSQL-12.0. The experimental evaluation demonstrated that the engines with vDriver continue to perform the reclamation of dead versions in the face of LLTs while retaining transaction throughput with reduced space consumption.

Citations: 9
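
The abstract above says that a long-lived transaction can suspend garbage collection unless unnecessary versions are identified precisely. The sketch below illustrates that general point only; it is not the paper's version pruning theorem or the SIRO design. It assumes each version is visible to snapshots in the half-open interval [begin_ts, end_ts), and all names (`Version`, `keep_by_oldest_snapshot`, `keep_by_interval`) are hypothetical.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Version:
    value: str
    begin_ts: int    # commit timestamp of the transaction that wrote it
    end_ts: float    # commit timestamp of the overwriting transaction, INF if current


def keep_by_oldest_snapshot(versions, active_snapshots):
    """Coarse rule: keep everything that ended after the oldest active snapshot."""
    oldest = min(active_snapshots, default=INF)
    return [v for v in versions if v.end_ts == INF or v.end_ts > oldest]


def keep_by_interval(versions, active_snapshots):
    """Finer rule: keep a version only if it is current or some snapshot can see it."""
    return [
        v
        for v in versions
        if v.end_ts == INF
        or any(v.begin_ts <= s < v.end_ts for s in active_snapshots)
    ]


# One record with four versions; a long-lived transaction holds snapshot 3
# while a recent transaction holds snapshot 16.
chain = [
    Version("v1", 1, 5),
    Version("v2", 5, 10),
    Version("v3", 10, 15),
    Version("v4", 15, INF),
]
snapshots = [3, 16]

coarse = [v.value for v in keep_by_oldest_snapshot(chain, snapshots)]
fine = [v.value for v in keep_by_interval(chain, snapshots)]
print(coarse, fine)
```

Under the coarse rule the old snapshot pins every later version, while the interval check lets the two middle versions be reclaimed even though the long-lived transaction is still running.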

A Method for Optimizing Opaque Filter Queries

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data Pub Date : 2020-05-29 DOI: 10.1145/3318464.3389766

Wenjia He, Michael R. Anderson, M. Strome, Michael J. Cafarella

Abstract: An important class of database queries in machine learning and data science workloads is the opaque filter query: a query with a selection predicate that is implemented with a UDF, with semantics that are unknown to the query optimizer. Typical examples include a CNN-style trained image classifier or a textual sentiment classifier. Because the optimizer does not know the predicate's semantics, it cannot employ standard optimizations, yielding long query times. We propose voodoo indexing, a two-phase method for optimizing opaque filter queries. Before any query arrives, the method builds a hierarchical "query-independent" index of the database contents, which groups together similar objects. At query time, the method builds a map of how much each group satisfies the predicate, while also exploiting the map to accelerate execution. Unlike past methods, voodoo indexing does not require insight into predicate semantics, works on any data type, and does not require in-query model training. We describe both standalone and SparkSQL-specific implementations, plus experiments on both image and text data, on more than 100 distinct opaque predicates. We show voodoo indexing can yield up to an 88% improvement over standard scan behavior, and a 79% improvement over the previous best method adapted from research literature.

Citations: 8
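
The two-phase idea in the abstract above (group similar objects ahead of time, then learn at query time which groups tend to satisfy the opaque predicate) can be sketched in a few lines. The code below is a loose illustration under several assumptions that do not come from the abstract: the index is a flat grouping rather than a hierarchy, each group is probed with a fixed-size sample, and the query goal is to find `k` matching items with as few predicate calls as possible. All function names are hypothetical.

```python
from collections import defaultdict


def build_index(items, group_key):
    """Offline phase: group similar items without knowing any predicate."""
    groups = defaultdict(list)
    for item in items:
        groups[group_key(item)].append(item)
    return dict(groups)


def filter_with_index(index, predicate, k, sample_size=2):
    """Query phase: probe each group, then visit groups by estimated hit rate."""
    calls, hits = 0, []
    hit_rate, remaining = {}, {}
    for group, members in index.items():
        sample, rest = members[:sample_size], members[sample_size:]
        matched = [m for m in sample if predicate(m)]
        calls += len(sample)
        hits.extend(matched)
        hit_rate[group] = len(matched) / len(sample)
        remaining[group] = rest
    # Exploit the map: most promising groups first.
    for group in sorted(hit_rate, key=hit_rate.get, reverse=True):
        for member in remaining[group]:
            if len(hits) >= k:
                return hits[:k], calls
            calls += 1
            if predicate(member):
                hits.append(member)
    return hits[:k], calls


def filter_with_scan(items, predicate, k):
    """Baseline: evaluate the predicate in storage order until k matches."""
    calls, hits = 0, []
    for item in items:
        if len(hits) >= k:
            break
        calls += 1
        if predicate(item):
            hits.append(item)
    return hits, calls


data = list(range(100))
index = build_index(data, group_key=lambda n: n // 10)   # ten buckets of ten
opaque = lambda n: n >= 90                                # stands in for a costly UDF

indexed_hits, indexed_calls = filter_with_index(index, opaque, k=5)
scan_hits, scan_calls = filter_with_scan(data, opaque, k=5)
print(indexed_calls, scan_calls)
```

The benefit in this toy run comes entirely from the grouping key correlating with the predicate; when similar items do not behave similarly under the predicate, the sampling calls are pure overhead.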