SIGMOD Rec. Latest Articles

Optimizing Tree Patterns for Querying Graph- and Tree-Structured Data
SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093759
Wojciech Czerwinski, W. Martens, Matthias Niewerth, P. Parys
{"title":"Optimizing Tree Patterns for Querying Graph- and Tree-Structured Data","authors":"Wojciech Czerwinski, W. Martens, Matthias Niewerth, P. Parys","doi":"10.1145/3093754.3093759","DOIUrl":"https://doi.org/10.1145/3093754.3093759","url":null,"abstract":"Many of today's graph query languages are based on graph pattern matching. We investigate optimization for tree-shaped patterns with transitive closure. Such patterns are quite expressive, yet can be evaluated efficiently. The minimization problem aims at reducing the number of nodes in patterns and goes back to the early 2000's. We provide an example showing that, in contrast to earlier claims, tree patterns cannot be minimized by deleting nodes only. The example resolves the M =? NR problem, which asks if a tree pattern is minimal if and only if it is nonredundant. The example can be adapted to also understand the complexity of minimization, which was another question that was open since the early research on the problem. Interestingly, the latter result also shows that, unless standard complexity assumptions are false, more general approaches for minimizing tree patterns are also bound to fail in some cases.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"3 1","pages":"15-22"},"PeriodicalIF":0.0,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75474532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Technical Perspective: Reflections on Extending SQL using Constraints
SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093760
S. Chaudhuri
{"title":"Technical Perspective: Reflections on Extending SQL using Constraints","authors":"S. Chaudhuri","doi":"10.1145/3093754.3093760","DOIUrl":"https://doi.org/10.1145/3093754.3093760","url":null,"abstract":"(a) Application developers needed a programmatic way to invoke relational query functionality from within their applications. The most primitive and most prevalent form of such integration uses ODBC or JDBC APIs. While they provide connectivity to database objects, the application programmer still must manage two separate type systems and programming models. LINQ (Language Integrated Query) is an elegant example of integration where query expressions are introduced as first-class citizens in the programming languages. Object-relational mapping tools allow the application programmer to continue working in their object-oriented programming paradigm even though they may be storing and retrieving relational database objects.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"26 1","pages":"23"},"PeriodicalIF":0.0,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83738888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Scalable Execution Engine for Package Queries
SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093761
Matteo Brucato, A. Abouzeid, A. Meliou
{"title":"A Scalable Execution Engine for Package Queries","authors":"Matteo Brucato, A. Abouzeid, A. Meliou","doi":"10.1145/3093754.3093761","DOIUrl":"https://doi.org/10.1145/3093754.3093761","url":null,"abstract":"Many modern applications and real-world problems involve the design of item collections, or packages: from planning your daily meals all the way to mapping the universe. Despite the pervasive need for packages, traditional data management does not offer support for their definition and computation. This is because traditional database queries follow a powerful, but very simple model: a query defines constraints that each tuple in the result must satisfy. However, a system tasked with the design of packages cannot consider items independently; rather, the system needs to determine if a set of items collectively satisfy given criteria. In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. First, we design PaQL, a SQL-based query language that supports the declarative specification of package queries. Second, we present a fundamental strategy for evaluating package queries that combines the capabilities of databases and constraint optimization solvers. The core of our approach is a set of translation rules that transform a package query to an integer linear program. Third, we introduce an offline data partitioning strategy allowing query evaluation to scale to large data sizes. Fourth, we introduce SKETCHREFINE, an efficient and scalable algorithm for package evaluation, which offers strong approximation guarantees. Finally, we present extensive experiments over real-world data. 
Our results demonstrate that SKETCHREFINE is effective at deriving high-quality package results, and achieves runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"20 1","pages":"24-31"},"PeriodicalIF":0.0,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83169806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Technical Perspective: Scaling Machine Learning via Compressed Linear Algebra
SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093764
Z. Ives
{"title":"Technical Perspective: Scaling Machine Learning via Compressed Linear Algebra","authors":"Z. Ives","doi":"10.1145/3093754.3093764","DOIUrl":"https://doi.org/10.1145/3093754.3093764","url":null,"abstract":"Demand for more powerful “big data analytics” solutions has spurred a great deal of interest in the core programming models, abstractions, and platforms for next-generation systems. For these problems, a complete solution would address data wrangling and processing, and support analytics over data of any modality or scale. It would support a wide array of machine learning algorithms, but also provide primitives for building new ones. It should be customizable, scale to vast volumes of data, and map to modern multicore, GPU, co-processor, and compute cluster hardware. In pursuit of these goals, novel techniques and solutions are being developed by machine learning researchers (e.g., high-performance libraries like Theano [6], runtime systems like GraphLab [5]), in the database and distributed systems research communities (e.g., distributed data analytics engines like Spark [7] and Flink [3]), and in industry by major technology players (e.g., Google’s TensorFlow [1] and IBM/Apache’s SystemML [4]). These libraries and platforms support multiple development languages, provide abstract datatypes for machine learning over data, and include compilers and runtime systems optimized for distributed execution on modern hardware. The database community excels in developing techniques for cost-estimating and optimizing declarative programs, and in exploiting data independence to optimize data placement and layout for performance. Elgohary et al.’s work on “Scaling Machine Learning via Compressed Linear Algebra,” which appeared in the Proceedings of the VLDB Endowment [2], was conducted within IBM and Apache’s SystemML declarative machine learning project. It shows just how effective such database techniques can be in a machine learning setting. 
The authors observe that the core data objects in machine learning – feature matrices, weight vectors – tend to have repeated values as well as regular structure, and may be quite large. Machine learning tasks over such data are composed from lower-level linear algebra operations. Such operations generally involve repeated floating-point computations that today are bandwidth-limited by the ability of the CPU to traverse large matrices in RAM. The authors’ solution is to develop a compressed representation for matrices, as well as compressed linear algebra operations that work directly over the compressed matrix data. Together, these reduce the bandwidth required to perform the same computations, thus speeding performance dramatically. The paper cleverly adapts ideas first developed in relational database systems — column-oriented compression, sampling-based cost estimation, trading between compression speed and compression rate — to build an elegant implementation. The paper makes a number of key contributions. First, the authors identify a set of linear algebra primitives shared by multiple distributed machine learning platforms and algorithms. Second, they develop compression techniques not only for single columns in a","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"69 1","pages":"41"},"PeriodicalIF":0.0,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83974143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
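Illustrative sketch (not taken from the paper): the abstract above describes linear algebra operations that run directly on compressed matrices. The Python below assumes a far simpler encoding than the one the authors built. Each column is stored as a map from each distinct nonzero value to the rows that hold it, so a matrix-vector product needs one multiplication per distinct value instead of one per cell. The helper names compress_columns, compressed_matvec and dense_matvec are hypothetical and are not SystemML APIs.

```python
from collections import defaultdict


def compress_columns(matrix):
    """Encode each column as {distinct nonzero value: [row indices]}."""
    n_cols = len(matrix[0]) if matrix else 0
    columns = []
    for j in range(n_cols):
        offsets = defaultdict(list)
        for i, row in enumerate(matrix):
            if row[j] != 0:
                offsets[row[j]].append(i)
        columns.append(dict(offsets))
    return columns


def compressed_matvec(columns, n_rows, v):
    """Compute y = X v by scanning the compressed columns only."""
    y = [0.0] * n_rows
    for j, offsets in enumerate(columns):
        for value, rows in offsets.items():
            contribution = value * v[j]  # one multiply per distinct value
            for i in rows:
                y[i] += contribution
    return y


def dense_matvec(matrix, v):
    """Reference implementation on the uncompressed matrix."""
    return [sum(x * w for x, w in zip(row, v)) for row in matrix]
```

The benefit of such a layout depends on columns having few distinct values, which is the property of feature matrices that the abstract points to.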
Updates to the TODS Editorial Board
SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093767
Christian S. Jensen
{"title":"Updates to the TODS Editorial Board","authors":"Christian S. Jensen","doi":"10.1145/3093754.3093767","DOIUrl":"https://doi.org/10.1145/3093754.3093767","url":null,"abstract":"It is of paramount importance for a scholarly journal such as ACM Transactions on Database Systems to have a strong editorial board of respected, world-class scholars. The editorial board plays a fundamental role in attracting the best submissions, in ensuring insightful and timely handling of submissions, in maintaining the high scientific standards of the journal, and in maintaining the reputation of the journal. Indeed, the journal’s associate editors, along with the reviewers and authors they work with, are the primary reason that TODS is a world-class journal. As of January 1, 2017, three Associate Editors—Divyakant Agrawal, Sihem Amer-Yahia, and Paolo Ciaccia—ended their terms, each having served on the editorial board for roughly six years. In addition, they will stay on until they complete their current loads. Paolo, Divy, and Sihem have provided very substantial, high-caliber service to the journal and the database community. Specifically, they have lent their extensive experience, deep insight, and sound technical judgment to the journal. I have never seen them compromise on quality when handling submissions. Surely, they have had many other demands on their time, many of which are better paid, during these past six years. We are all fortunate that they have donated their time and unique expertise to the journal and our community during half a dozen years. They deserve our recognition for their commitment to the scientific enterprise. 
Also as of January 1, 2017, three new Associate Editors joined the editorial board:","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"70 1","pages":"50"},"PeriodicalIF":0.0,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73717430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Technical Perspective: Juggling Functions Inside a Database
SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093756
Dan Olteanu
{"title":"Technical Perspective: Juggling Functions Inside a Database","authors":"Dan Olteanu","doi":"10.1145/3093754.3093756","DOIUrl":"https://doi.org/10.1145/3093754.3093756","url":null,"abstract":"The paper entitled “Juggling Functions Inside a Database” gives a brief overview of FAQ, a framework for computational problems expressed as Functional Aggregate Queries. This work falls into my bucket of select database research contributions that go significantly beyond the state of the art along several dimensions. First, it provides an elegant and declarative formalism for a host of ubiquitous computational problems across Computer Science and at the right level of abstraction that exposes structural properties of the problem instances and allows for fine-grained complexity analysis. Second, it is technically deep, proposing an algorithmic solution that achieves lower than or the same complexity as specialized approaches in their respective domain. Third, it is implemented in a commercial database system with scores of real-world applications. Fourth, it is currently applied to in-database analytics and I expect more applications will manifest themselves in the near future. By unifying many problems under the same formalism, FAQ bears the promise of accelerating research: Scalable data management solutions developed by our community for aggregates over joins, e.g., incremental view maintenance, index data structures, or distributed processing, may become general-purpose solutions for problems outside databases. 
I will next expand on some of its contributions.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"50 1","pages":"5"},"PeriodicalIF":0.0,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89290855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Wander Join and XDB: Online Aggregation via Random Walks
SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093763
Feifei Li, Bin Wu, K. Yi, Zhuoyue Zhao
{"title":"Wander Join and XDB: Online Aggregation via Random Walks","authors":"Feifei Li, Bin Wu, K. Yi, Zhuoyue Zhao","doi":"10.1145/3093754.3093763","DOIUrl":"https://doi.org/10.1145/3093754.3093763","url":null,"abstract":"Joins are expensive, and online aggregation is an effective approach to explore the tradeoff between query efficiency and accuracy in a continuous, online fashion. However, the state-of-the-art approach, in both internal and external memory, is based on ripple join, which is still very expensive and needs strong assumptions (e.g., the tuples in a table are stored in random order). This paper proposes a new approach, the wander join algorithm, to the online aggregation problem by performing random walks over the underlying join graph. We also design an optimizer that chooses the optimal plan for conducting the random walks without having to collect any statistics a priori. Selection predicates and group-by clauses can be handled as well. We have developed an online engine called XDB by integrating wander join in the latest version of PostgreSQL. Extensive experiments using the TPC-H benchmark have shown the superior performance of wander join. The XDB implementation has demonstrated its practicality in a full-fledged database system.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"37 1","pages":"33-40"},"PeriodicalIF":0.0,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80297974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
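Illustrative sketch (not the XDB implementation): the abstract above describes estimating aggregates by random walks over the join graph. The Python below assumes the simplest possible case, a two-table join R(a, b) ⋈ S(b, c) with an in-memory index on S.b, and estimates SUM(S.c) over the join. Each walk picks a uniform tuple from R, then a uniform matching tuple from S, and divides the sampled value by the probability of that walk (a Horvitz-Thompson style estimator). All function names here are hypothetical.

```python
import random
from collections import defaultdict


def build_index(rows, key_pos):
    """Group rows by join-key value so a walk can pick a random match."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_pos]].append(row)
    return index


def walk_estimate(r_rows, s_index, rng):
    """One random walk R -> S, returning an unbiased estimate of SUM(S.c)."""
    r = rng.choice(r_rows)                # uniform tuple of R, prob 1/|R|
    matches = s_index.get(r[1], [])
    if not matches:                       # failed walk contributes 0
        return 0.0
    s = rng.choice(matches)               # uniform match, prob 1/len(matches)
    return s[1] * len(r_rows) * len(matches)  # value / walk probability


def online_sum(r_rows, s_rows, walks, seed=0):
    """Average many independent walks; more walks tighten the estimate."""
    rng = random.Random(seed)
    s_index = build_index(s_rows, 0)
    total = 0.0
    for _ in range(walks):
        total += walk_estimate(r_rows, s_index, rng)
    return total / walks


def exact_sum(r_rows, s_rows):
    """Full join, used only to check the estimator on small inputs."""
    s_index = build_index(s_rows, 0)
    return sum(s[1] for r in r_rows for s in s_index.get(r[1], []))
```

In an online setting the running average would be reported continuously together with a confidence interval; that part, and the plan optimizer the abstract mentions, are omitted here.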
Technical Perspective: Optimizing Tree Patterns for Querying Graph- and Tree-Structured Data
SIGMOD Rec. Pub Date : 2017-05-12 DOI: 10.1145/3093754.3093758
B. Kimelfeld
{"title":"Technical Perspective: Optimizing Tree Patterns for Querying Graph- and Tree-Structured Data","authors":"B. Kimelfeld","doi":"10.1145/3093754.3093758","DOIUrl":"https://doi.org/10.1145/3093754.3093758","url":null,"abstract":"From the early days of databases, practitioners and researchers have pursued techniques for rewriting queries into equivalent ones that are easier to evaluate. The following paper closes a fundamental gap that we have had in our understanding of this challenge in the context of tree patterns. Such patterns are common and basic components of query languages for graph and tree data such as SPARQL, Cypher and XQuery. The authors study the question of whether the given tree pattern can be replaced with a smaller one, the question of whether it involves redundant conditions, and most importantly, the relationship between these two questions.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"14 1","pages":"14"},"PeriodicalIF":0.0,"publicationDate":"2017-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86351704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Report on the First International Workshop on Reproducible Open Science
SIGMOD Rec. Pub Date : 2017-05-11 DOI: 10.1145/3092931.3092942
P. Manghi, Jochen Schirrwagen, Óscar Corcho, Amir Aryani
{"title":"Report on the First International Workshop on Reproducible Open Science","authors":"P. Manghi, Jochen Schirrwagen, Óscar Corcho, Amir Aryani","doi":"10.1145/3092931.3092942","DOIUrl":"https://doi.org/10.1145/3092931.3092942","url":null,"abstract":"In the last decade, information and communication technology (ICT) advances have deeply affected the scientific process, which increasingly produces and relies on digital research products, such as publications, datasets, experiments, websites, software, blogs, etc. Accordingly, scientific communication has started mutating in order to adapt its mission (and business models) to such new scientific paradigms and benefit from the unprecedented Open Science opportunities that may arise from them: reproducibility, i.e., the ability of repeating a digital experiment and reusing its constituent products; and transparent evaluation, i.e., the ability of (i) effectively evaluating scientific experiments by means of reproducibility and (ii) assigning fine-grained scientific reward, based on effective authorship across the overall scientific process. Scientists, research institutions, and funders are pushing for innovative Open Science scholarly communication workflows (i.e., submission, peer-review, access, reuse, citation, and scientific reward), marrying a holistic approach where publishing includes in principle any digital product resulting from a research activity that is relevant to the evaluation and reproducibility of the activity or part of it. Defining, taking up, and supporting Open Science publishing workflows become urgent challenges, to be addressed by ICT solutions capable of fostering and driving radical changes in the way science is developed and disseminated. 
The goal of the first International Workshop on Reproducible Open Science was to provide a forum for constructively exploring foundational, orga-","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"29 1","pages":"49-52"},"PeriodicalIF":0.0,"publicationDate":"2017-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89490003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Guide to Formal Analysis of Join Processing in Massively Parallel Systems
SIGMOD Rec. Pub Date : 2017-05-11 DOI: 10.1145/3092931.3092934
Paraschos Koutris, Dan Suciu
{"title":"A Guide to Formal Analysis of Join Processing in Massively Parallel Systems","authors":"Paraschos Koutris, Dan Suciu","doi":"10.1145/3092931.3092934","DOIUrl":"https://doi.org/10.1145/3092931.3092934","url":null,"abstract":"Over the last decade, there has been an enormous increase in the volume of data that is being stored, processed and analyzed. In order to improve the performance of query processing on such amounts of data, many modern data management systems (e.g. Spark [23, 28], Hadoop [13, 9, 24], and others [19, 14]) have resorted to the power of parallelism to speed up computation. Parallelism enables the distribution of computation for data-intensive tasks into hundreds, or even thousands of machines, and thus significantly reduces the completion time for several crucial data processing tasks. In this paper, we present a survey on recent results [18, 4, 5, 17] that study the computational complexity of multiway join processing in such massively parallel systems. Our goal is twofold. First, we introduce a simple theoretical model, called the MPC (Massively Parallel Computation) model, that allows us to rigorously analyze the computational complexity of various parallel algorithms for query processing. Second, using the MPC model as a theoretical tool, we show how we can design novel algorithms and techniques for multiway join processing, and how we can prove their optimality through tight lower bounds. 
Our analysis provides a deeper understanding of how much synchronization, communication and data load is required when we compute a multiway join query, and informs of what is possible to achieve under specific system constraints.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"24 1","pages":"18-27"},"PeriodicalIF":0.0,"publicationDate":"2017-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90755633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9