SIGMOD Rec. latest articles, page 7

Optimizing Tree Patterns for Querying Graph- and Tree-Structured Data

SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093759

Wojciech Czerwinski, W. Martens, Matthias Niewerth, P. Parys

Citations: 7

Technical Perspective: Reflections on Extending SQL using Constraints

SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093760

S. Chaudhuri

Citations: 0

A Scalable Execution Engine for Package Queries

SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093761

Matteo Brucato, A. Abouzeid, A. Meliou

Abstract: Many modern applications and real-world problems involve the design of item collections, or packages: from planning your daily meals all the way to mapping the universe. Despite the pervasive need for packages, traditional data management does not offer support for their definition and computation. This is because traditional database queries follow a powerful, but very simple model: a query defines constraints that each tuple in the result must satisfy. However, a system tasked with the design of packages cannot consider items independently; rather, the system needs to determine if a set of items collectively satisfy given criteria. In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. First, we design PaQL, a SQL-based query language that supports the declarative specification of package queries. Second, we present a fundamental strategy for evaluating package queries that combines the capabilities of databases and constraint optimization solvers. The core of our approach is a set of translation rules that transform a package query to an integer linear program. Third, we introduce an offline data partitioning strategy allowing query evaluation to scale to large data sizes. Fourth, we introduce SKETCHREFINE, an efficient and scalable algorithm for package evaluation, which offers strong approximation guarantees. Finally, we present extensive experiments over real-world data. Our results demonstrate that SKETCHREFINE is effective at deriving high-quality package results, and achieves runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.

Pages: 24-31

Citations: 4
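
The package-query abstract above contrasts per-tuple constraints with constraints that a whole set of items must satisfy together. The sketch below illustrates that distinction with a brute-force search over subsets. It is not PaQL, not the paper's ILP translation, and not SKETCHREFINE; the `MEALS` table, the function name `best_package`, and its parameters are all hypothetical and exist only for illustration.

```python
from itertools import combinations

# Hypothetical item table: (name, calories, protein_grams).
MEALS = [
    ("oatmeal", 300, 10),
    ("salad", 250, 8),
    ("chicken", 500, 45),
    ("pasta", 700, 20),
    ("yogurt", 150, 12),
]


def best_package(items, size, min_cal, max_cal):
    """Return the subset of exactly `size` items whose total calories fall in
    [min_cal, max_cal] and whose total protein is largest, or None if no
    subset qualifies. The constraint applies to the package, not to items."""
    best, best_protein = None, -1
    for package in combinations(items, size):
        total_cal = sum(cal for _, cal, _ in package)
        if not (min_cal <= total_cal <= max_cal):
            continue  # the package as a whole violates the calorie window
        total_protein = sum(protein for _, _, protein in package)
        if total_protein > best_protein:
            best, best_protein = package, total_protein
    return best
```

Calling `best_package(MEALS, 3, 900, 1200)` examines all ten three-meal combinations. The enumeration grows combinatorially with the table size, which is the kind of cost that motivates handing the problem to a solver or an approximation scheme.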

Technical Perspective: Scaling Machine Learning via Compressed Linear Algebra

SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093764

Z. Ives

Abstract: Demand for more powerful “big data analytics” solutions has spurred a great deal of interest in the core programming models, abstractions, and platforms for next-generation systems. For these problems, a complete solution would address data wrangling and processing, and support analytics over data of any modality or scale. It would support a wide array of machine learning algorithms, but also provide primitives for building new ones. It should be customizable, scale to vast volumes of data, and map to modern multicore, GPU, co-processor, and compute cluster hardware. In pursuit of these goals, novel techniques and solutions are being developed by machine learning researchers (e.g., high-performance libraries like Theano [6], runtime systems like GraphLab [5]), in the database and distributed systems research communities (e.g., distributed data analytics engines like Spark [7] and Flink [3]), and in industry by major technology players (e.g., Google’s TensorFlow [1] and IBM/Apache’s SystemML [4]). These libraries and platforms support multiple development languages, provide abstract datatypes for machine learning over data, and include compilers and runtime systems optimized for distributed execution on modern hardware. The database community excels in developing techniques for cost-estimating and optimizing declarative programs, and in exploiting data independence to optimize data placement and layout for performance. Elgohary et al.’s work on “Scaling Machine Learning via Compressed Linear Algebra,” which appeared in the Proceedings of the VLDB Endowment [2], was conducted within IBM and Apache’s SystemML declarative machine learning project. It shows just how effective such database techniques can be in a machine learning setting. The authors observe that the core data objects in machine learning (feature matrices, weight vectors) tend to have repeated values as well as regular structure, and may be quite large. Machine learning tasks over such data are composed from lower-level linear algebra operations. Such operations generally involve repeated floating-point computation that today is bandwidth-limited by the ability of the CPU to traverse large matrices in RAM. The authors’ solution is to develop a compressed representation for matrices, as well as compressed linear algebra operations that work directly over the compressed matrix data. Together, these reduce the bandwidth required to perform the same computations, thus speeding performance dramatically. The paper cleverly adapts ideas first developed in relational database systems (column-oriented compression, sampling-based cost estimation, trading between compression speed and compression rate) to build an elegant implementation. The paper makes a number of key contributions. First, the authors identify a set of linear algebra primitives shared by multiple distributed machine learning platforms and algorithms. Second, they develop compression techniques not only for single columns in a

Pages: 41

Citations: 0
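
The abstract above describes linear algebra operations that run directly over a compressed matrix whose columns contain repeated values. The sketch below shows one simple way that idea can work: each column is stored as a map from distinct nonzero value to the rows holding it, and a matrix-vector product is computed without decompressing. This is a simplified illustration under that assumed encoding, not the formats or algorithms of the paper or of SystemML; all function names are hypothetical.

```python
from collections import defaultdict


def compress_column(column):
    """Map each distinct nonzero value to the list of row indices holding it."""
    groups = defaultdict(list)
    for row, value in enumerate(column):
        if value != 0:
            groups[value].append(row)
    return dict(groups)


def compress_matrix(rows):
    """Compress a row-major list-of-lists matrix column by column."""
    n_cols = len(rows[0])
    return [compress_column([r[j] for r in rows]) for j in range(n_cols)]


def compressed_matvec(columns, n_rows, v):
    """Compute X @ v over the compressed columns.

    Each (column, distinct value) pair needs a single multiplication; the
    product is then added to every row where that value occurs."""
    y = [0.0] * n_rows
    for j, groups in enumerate(columns):
        for value, row_ids in groups.items():
            contribution = value * v[j]
            for row in row_ids:
                y[row] += contribution
    return y
```

For a matrix `X = [[1, 0], [1, 2], [3, 2]]`, `compress_matrix(X)` stores column 0 as `{1: [0, 1], 3: [2]}` and column 1 as `{2: [1, 2]}`, so the product with a vector needs three multiplications instead of six. The saving grows with the amount of repetition in each column.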

Updates to the TODS Editorial Board

SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093767

Christian S. Jensen

Abstract: It is of paramount importance for a scholarly journal such as ACM Transactions on Database Systems to have a strong editorial board of respected, world-class scholars. The editorial board plays a fundamental role in attracting the best submissions, in ensuring insightful and timely handling of submissions, in maintaining the high scientific standards of the journal, and in maintaining the reputation of the journal. Indeed, the journal’s associate editors, along with the reviewers and authors they work with, are the primary reason that TODS is a world-class journal. As of January 1, 2017, three Associate Editors (Divyakant Agrawal, Sihem Amer-Yahia, and Paolo Ciaccia) ended their terms, each having served on the editorial board for roughly six years. In addition, they will stay on until they complete their current loads. Paolo, Divy, and Sihem have provided very substantial, high-caliber service to the journal and the database community. Specifically, they have lent their extensive experience, deep insight, and sound technical judgment to the journal. I have never seen them compromise on quality when handling submissions. Surely, they have had many other demands on their time, many of which are better paid, during these past six years. We are all fortunate that they have donated their time and unique expertise to the journal and our community during half a dozen years. They deserve our recognition for their commitment to the scientific enterprise. Also as of January 1, 2017, three new Associate Editors joined the editorial board:

Pages: 50

Citations: 0

Technical Perspective: Juggling Functions Inside a Database

SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093756

Dan Olteanu

Abstract: The paper entitled “Juggling Functions Inside a Database” gives a brief overview of FAQ, a framework for computational problems expressed as Functional Aggregate Queries. This work falls into my bucket of select database research contributions that go significantly beyond the state of the art along several dimensions. First, it provides an elegant and declarative formalism for a host of ubiquitous computational problems across Computer Science and at the right level of abstraction that exposes structural properties of the problem instances and allows for fine-grained complexity analysis. Second, it is technically deep, proposing an algorithmic solution that achieves lower than or the same complexity as specialized approaches in their respective domain. Third, it is implemented in a commercial database system with scores of real-world applications. Fourth, it is currently applied to in-database analytics and I expect more applications will manifest themselves in the near future. By unifying many problems under the same formalism, FAQ bears the promise of accelerating research: Scalable data management solutions developed by our community for aggregates over joins, e.g., incremental view maintenance, index data structures, or distributed processing, may become general-purpose solutions for problems outside databases. I will next expand on some of its contributions.

Pages: 5

Citations: 0
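
The abstract above mentions aggregates over joins as a recurring problem shape. One familiar instance is counting the results of a two-way join, which can be computed either by enumerating the join or by aggregating each relation per join key first and multiplying the partial counts. The sketch below shows both; it is a textbook illustration of pushing an aggregate past a join, not the FAQ framework's algorithm, and the function names are hypothetical.

```python
from collections import Counter


def count_join_naive(R, S):
    """Count matching (r, s) pairs by enumerating the join R(a, b), S(b, c)."""
    return sum(1 for (_, b1) in R for (b2, _) in S if b1 == b2)


def count_join_pushed(R, S):
    """Same count, but aggregate each relation per join key b first, then
    sum the products of the per-key counts."""
    r_counts = Counter(b for _, b in R)
    s_counts = Counter(b for b, _ in S)
    return sum(n * s_counts[b] for b, n in r_counts.items())
```

The naive version touches every pair of tuples, while the pushed version makes one pass over each relation plus one pass over the distinct keys of `R`. Both return the same number because the count of join results factorizes per join key.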

Wander Join and XDB: Online Aggregation via Random Walks

SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093763

Feifei Li, Bin Wu, K. Yi, Zhuoyue Zhao

Citations: 29

Technical Perspective: Optimizing Tree Patterns for Querying Graph- and Tree-Structured Data

SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093758

B. Kimelfeld

Citations: 0

Report on the First International Workshop on Reproducible Open Science

SIGMOD Rec. Pub Date : 2017-05-11 DOI: 10.1145/3092931.3092942

P. Manghi, Jochen Schirrwagen, Óscar Corcho, Amir Aryani

Abstract: In the last decade, information and communication technology (ICT) advances have deeply affected the scientific process, which increasingly produces and relies on digital research products, such as publications, datasets, experiments, websites, software, blogs, etc. Accordingly, scientific communication has started mutating in order to adapt its mission (and business models) to such new scientific paradigms and benefit from the unprecedented Open Science opportunities that may arise from them: reproducibility, i.e., the ability of repeating a digital experiment and reusing its constituent products; and transparent evaluation, i.e., the ability of (i) effectively evaluating scientific experiments by means of reproducibility and (ii) assigning fine-grained scientific reward, based on effective authorship across the overall scientific process. Scientists, research institutions, and funders are pushing for innovative Open Science scholarly communication workflows (i.e., submission, peer-review, access, reuse, citation, and scientific reward), marrying a holistic approach where publishing includes in principle any digital product resulting from a research activity that is relevant to the evaluation and reproducibility of the activity or part of it. Defining, taking up, and supporting Open Science publishing workflows become urgent challenges, to be addressed by ICT solutions capable of fostering and driving radical changes in the way science is developed and disseminated. The goal of the first International Workshop on Reproducible Open Science was to provide a forum for constructively exploring foundational, orga-

Pages: 49-52

Citations: 0

A Guide to Formal Analysis of Join Processing in Massively Parallel Systems

SIGMOD Rec. Pub Date : 2017-05-11 DOI: 10.1145/3092931.3092934

Paraschos Koutris, Dan Suciu

Abstract: Over the last decade, there has been an enormous increase in the volume of data that is being stored, processed and analyzed. In order to improve the performance of query processing on such amounts of data, many modern data management systems (e.g. Spark [23, 28], Hadoop [13, 9, 24], and others [19, 14]) have resorted to the power of parallelism to speed up computation. Parallelism enables the distribution of computation for data-intensive tasks into hundreds, or even thousands of machines, and thus significantly reduces the completion time for several crucial data processing tasks. In this paper, we present a survey on recent results [18, 4, 5, 17] that study the computational complexity of multiway join processing in such massively parallel systems. Our goal is twofold. First, we introduce a simple theoretical model, called the MPC (Massively Parallel Computation) model, that allows us to rigorously analyze the computational complexity of various parallel algorithms for query processing. Second, using the MPC model as a theoretical tool, we show how we can design novel algorithms and techniques for multiway join processing, and how we can prove their optimality through tight lower bounds. Our analysis provides a deeper understanding of how much synchronization, communication and data load is required when we compute a multiway join query, and informs of what is possible to achieve under specific system constraints.

Pages: 18-27

Citations: 9
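
The abstract above speaks of measuring synchronization, communication, and data load when a join runs on many machines. As a rough illustration of what "load" means in such a setting, the sketch below simulates a single communication round in which `p` servers receive tuples routed by a hash of the join key and then join locally. It is a plain hash-partitioned binary join chosen for brevity, not one of the surveyed multiway algorithms, and the function name and return shape are hypothetical.

```python
def mpc_hash_join(R, S, p):
    """Simulate a one-round parallel join of R(x, y) and S(y, z) on p servers.

    Returns (sorted join results, maximum number of tuples any one server
    received), the latter being a simple stand-in for per-server load."""
    servers = [{"R": [], "S": []} for _ in range(p)]

    # Communication round: route every tuple by the hash of its join key y.
    for x, y in R:
        servers[hash(y) % p]["R"].append((x, y))
    for y, z in S:
        servers[hash(y) % p]["S"].append((y, z))

    # Local computation: each server joins only the tuples it received.
    out = []
    for srv in servers:
        by_key = {}
        for y, z in srv["S"]:
            by_key.setdefault(y, []).append(z)
        for x, y in srv["R"]:
            for z in by_key.get(y, []):
                out.append((x, y, z))

    max_load = max(len(s["R"]) + len(s["S"]) for s in servers)
    return sorted(out), max_load
```

With small integer join keys the routing is deterministic, so `mpc_hash_join(R, S, 2)` can be traced by hand. If many tuples share one key, they all land on the same server and the maximum load rises accordingly.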