SIGMOD Rec. Pub Date: 2018-09-10 DOI: 10.1145/3277006.3277016
Z. Ives
{"title":"Technical Perspective: Natural Language Explanations for Query Results","authors":"Z. Ives","doi":"10.1145/3277006.3277016","DOIUrl":"https://doi.org/10.1145/3277006.3277016","url":null,"abstract":"Motivated by conversational agents such as Siri, Cortana, the Google Assistant, and Alexa, there has been a surge of interest in spoken as well as textual natural language interfaces. To this point, such systems have relied on innovations in speech recognition (such as recurrent neural networks, LSTMs, and so on) and on specially encoding specific question-answering strategies via “skills.” A “natural” question for the SIGMOD community is how best to connect natural language interface systems to DBMSs, ideally in a way that generalizes to any database schema or instance. In fact, the problem of providing a natural language interface to a database system (i.e., mapping from a spoken or textual question to a structured query) dates back at least to the 1980s [4]. Such efforts had middling success due to issues of accuracy, so the problems were later revisited in the 2000s with an eye towards restricting the space of options in order to improve precision [6]. Nonetheless, such systems did not gain much traction, again due to the challenges of ensuring accuracy for a given database when the user might ask an ambiguous question. Recent work by Li and Jagadish [5], called NaLIR, proposed an interactive communicator within the query system, which presents to the user a query tree explaining what the system was going to do, so that the user could correct any mistakes. This was helpful in improving reliability, but it required that the user understand tree-structured representations of queries. 
In “Natural Language Explanations for Query Results,” Deutch and his co-authors suggest that a more effective means of helping the user understand and correct results might be through provenance information — i.e., giving an explanation for each answer of how and why it exists. Their approach adapts the NaLIR system and nicely leverages the recent body of work on provenance semirings [3, 2, 1]. The provenance semiring model has an important property that equivalent query plans (as produced by a query optimizer) will have equivalent provenance expressions. The innovations in this paper are in three areas. First, the authors use the structure of the natural language query itself (and the mappings to structured queries, and then later, from queries to provenance) to present the provenance in a form that matches the natural language query — and thus the user’s expectations. Second, they reduce the size (and repetition) of the provenance via factoring. Finally, they incorporate aggregate results (e.g., counts) in place of certain details. The paper does a great job of clearly identifying and articulating what makes the provenance problem different for natural language query systems, and presenting elegant technical solutions to these new challenges.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"31 1","pages":"41"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82459393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-09-10 DOI: 10.1145/3277006.3277010
Z. Ives
{"title":"Technical Perspective: From Think Parallel to Think Sequential","authors":"Z. Ives","doi":"10.1145/3277006.3277010","DOIUrl":"https://doi.org/10.1145/3277006.3277010","url":null,"abstract":"In recent years, the database and distributed systems communities have built a wide variety of runtime systems and programming models for large-scale computing over graphs. Such “big graph processing systems” [1, 2, 4, 5, 7] support highly scalable parallel execution of graph algorithms, e.g., computing shortest paths, graph centrality, connected components, or perhaps even graph clusters. As described in the excellent survey by Yan et al. [6], most big graph processing systems require the programmer to adopt a vertex-centric or block-centric programming model. For the former, code only “sees” the state at one vertex, receives messages from other vertices, and can send messages to other vertices. Under the latter, code manages a set of vertices within a subgraph (“block”) and can communicate with the code managing other blocks. In “From Think Parallel to Think Sequential,” Fan and colleagues argue that vertex- and block-centric programming models are not natural for programmers trained to think sequentially. Instead, they argue that a more intuitive programming model can be developed out of several very simple primitives that can be composed to do incremental computation (as has also been studied in more general “big data” systems [4, 3]). The authors propose four elegant building blocks: (1) a partial evaluation function, (2) an incremental update handling function, (3) mechanisms for updating and sharing parameters in global fashion, and (4) an aggregate function for when multiple workers are updating the same parameter. They build the GRAPE (GRAPh Engine) system, which implements this programming model, and they show that it provides excellent performance for a variety of graph algorithms. 
The paper presents a compelling case that, at least for certain classes of algorithms, the simple primitives may be both more natural and more amenable to optimization than standard vertex-centric approaches.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"1 1","pages":"14"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91235320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-09-10 DOI: 10.1145/3277006.3277011
W. Fan, Yang Cao, Jingbo Xu, Wenyuan Yu, Yinghui Wu, Chao Tian, Jiaxin Jiang, Bohan Zhang
{"title":"From Think Parallel to Think Sequential","authors":"W. Fan, Yang Cao, Jingbo Xu, Wenyuan Yu, Yinghui Wu, Chao Tian, Jiaxin Jiang, Bohan Zhang","doi":"10.1145/3277006.3277011","DOIUrl":"https://doi.org/10.1145/3277006.3277011","url":null,"abstract":"This paper presents GRAPE, a parallel GRAPh Engine for graph computations. GRAPE differs from previous graph systems in its ability to parallelize existing sequential graph algorithms as a whole, without the need for recasting the entire algorithms into a new model. Underlying GRAPE are a simple programming model, and a principled approach based on fixpoint computation with partial evaluation and incremental computation. Under a monotonic condition, GRAPE guarantees to converge at correct answers as long as the sequential algorithms are correct. We show how our familiar sequential graph algorithms can be parallelized by GRAPE. In addition to the ease of programming, we experimentally verify that GRAPE achieves comparable performance to the state-of-the-art graph systems, using real-life and synthetic graphs.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"46 1","pages":"15-22"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90731770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
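The fixpoint idea in the abstract above (partial evaluation on each fragment, then incremental rounds until nothing changes) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not GRAPE's actual API: the names `propagate` and `connected_components`, the edge-list fragment format, and the choice of connected components with `min` as the aggregate are all hypothetical and chosen for illustration. The incremental step here simply reruns the sequential routine from the updated labels, whereas a real incremental algorithm would touch only the affected area.

```python
def propagate(edges, label):
    """Ordinary sequential routine: push the minimum label along edges until stable."""
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            m = min(label[u], label[v])
            for x in (u, v):
                if label[x] > m:
                    label[x] = m
                    changed = True


def connected_components(fragments):
    """Simulate a partial-evaluation / incremental fixpoint over graph fragments.

    fragments: list of edge lists (hypothetical format). A vertex that occurs
    in more than one fragment plays the role of a border vertex.
    Returns a dict mapping each vertex to the smallest vertex id in its component.
    """
    # Partial evaluation: run the sequential routine on each fragment alone.
    local = []
    for edges in fragments:
        label = {x: x for edge in edges for x in edge}
        propagate(edges, label)
        local.append(label)

    while True:
        # Aggregate: when several fragments hold a copy of a vertex, keep the minimum.
        # (For simplicity this scans all vertices, not only border vertices.)
        best = {}
        for label in local:
            for x, value in label.items():
                best[x] = min(value, best.get(x, value))

        # Incremental step: fragments with stale copies adopt the new values
        # and continue from their current state. Labels only ever decrease,
        # which is the monotonic condition that makes the loop terminate.
        updated = False
        for edges, label in zip(fragments, local):
            stale = [x for x in label if best[x] < label[x]]
            if stale:
                for x in stale:
                    label[x] = best[x]
                propagate(edges, label)
                updated = True
        if not updated:
            return best


result = connected_components([[(1, 2), (2, 3)], [(3, 4), (5, 6)], [(6, 7)]])
print(result)
```

In this example vertices 3 and 6 are shared between fragments, so their labels act as the globally shared parameters; the loop stops once every copy agrees and no fragment can lower a label further.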
SIGMOD Rec. Pub Date: 2018-02-22 DOI: 10.1145/3186549.3186559
S. Sadiq, T. Dasu, X. Dong, J. Freire, I. Ilyas, S. Link, Renée J. Miller, Felix Naumann, Xiaofang Zhou, D. Srivastava
{"title":"Data Quality: The Role of Empiricism","authors":"S. Sadiq, T. Dasu, X. Dong, J. Freire, I. Ilyas, S. Link, Renée J. Miller, Felix Naumann, Xiaofang Zhou, D. Srivastava","doi":"10.1145/3186549.3186559","DOIUrl":"https://doi.org/10.1145/3186549.3186559","url":null,"abstract":"We outline a call to action for promoting empiricism in data quality research. The action points result from an analysis of the landscape of data quality research. The landscape exhibits two dimensions of empiricism in data quality research relating to type of metrics and scope of method. Our study indicates the presence of a data continuum ranging from real to synthetic data, which has implications for how data quality methods are evaluated. The dimensions of empiricism and their inter-relationships provide a means of positioning data quality research, and help expose limitations, gaps and opportunities.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"19 1","pages":"35-43"},"PeriodicalIF":0.0,"publicationDate":"2018-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77842336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-02-22 DOI: 10.1145/3186549.3186562
Niket Tandon, A. Varde, Gerard de Melo
{"title":"Commonsense Knowledge in Machine Intelligence","authors":"Niket Tandon, A. Varde, Gerard de Melo","doi":"10.1145/3186549.3186562","DOIUrl":"https://doi.org/10.1145/3186549.3186562","url":null,"abstract":"There is growing conviction that the future of computing depends on our ability to exploit big data on the Web to enhance intelligent systems. This includes encyclopedic knowledge for factual details, common sense for human-like reasoning and natural language generation for smarter communication. With recent chatbots conceivably on the verge of passing the Turing Test, there are calls for more common sense oriented alternatives, e.g., the Winograd Schema Challenge. The Aristo QA system demonstrates the lack of common sense in current systems in answering fourth-grade science exam questions. On the language generation front, despite the progress in deep learning, current models are easily confused by subtle distinctions that may require linguistic common sense, e.g., quick food vs. fast food. These issues bear on tasks such as machine translation and should be addressed using common sense acquired from text. Mining common sense from massive amounts of data and applying it in intelligent systems, in several respects, appears to be the next frontier in computing. Our brief overview of the state of Commonsense Knowledge (CSK) in Machine Intelligence provides insights into CSK acquisition, CSK in natural language, applications of CSK and discussion of open issues. 
This paper provides a report of a tutorial at a recent conference with a brief survey of topics.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"3 1","pages":"49-52"},"PeriodicalIF":0.0,"publicationDate":"2018-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78735500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-02-22 DOI: 10.1145/3186549.3186555
Vasilis Spyropoulos, Y. Kotidis
{"title":"Digree: Building A Distributed Graph Processing Engine out of Single-node Graph Database Installations","authors":"Vasilis Spyropoulos, Y. Kotidis","doi":"10.1145/3186549.3186555","DOIUrl":"https://doi.org/10.1145/3186549.3186555","url":null,"abstract":"In this work we present Digree, a system prototype that enables distributed execution of graph pattern matching queries in a cloud of interconnected graph databases. We explain how a graph query can be decomposed into independent sub-patterns that are processed in parallel by the distributed independent graph database systems and how the results are finally synthesized at a master node. We experimentally compare a prototype of our system against a popular big data engine and show that Digree provides significantly faster query execution.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"16 1","pages":"22-27"},"PeriodicalIF":0.0,"publicationDate":"2018-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81461829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-02-22 DOI: 10.1145/3186549.3186557
M. Winslett, V. Braganholo
{"title":"Dan Suciu Speaks Out on Research, Shyness and Being a Scientist","authors":"M. Winslett, V. Braganholo","doi":"10.1145/3186549.3186557","DOIUrl":"https://doi.org/10.1145/3186549.3186557","url":null,"abstract":"Welcome to ACM SIGMOD Record's series of interviews with distinguished members of the database community. I'm Marianne Winslett, and today we are in Snowbird, Utah, USA, site of the 2014 SIGMOD and PODS conference. I have here with me Dan Suciu, who is a professor at the University of Washington. Dan has two Test of Time Awards from PODS as well as Best Paper Awards from SIGMOD and ICDT. Dan's Ph.D. is from the University of Pennsylvania","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"27 1","pages":"28-34"},"PeriodicalIF":0.0,"publicationDate":"2018-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83466539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2017-12-01 DOI: 10.1145/3186549.3186551
P. Senellart
{"title":"Provenance and Probabilities in Relational Databases","authors":"P. Senellart","doi":"10.1145/3186549.3186551","DOIUrl":"https://doi.org/10.1145/3186549.3186551","url":null,"abstract":"We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation. Finally, we explain how provenance is practically used for probabilistic query evaluation in probabilistic databases.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"39 1","pages":"5-15"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76160243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
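To make the step from Boolean provenance to provenance semirings concrete, the following sketch evaluates one join-and-project query over relations whose tuples carry semiring annotations: joining multiplies annotations and merging duplicate output tuples adds them. It is an illustration under assumptions, not code from the paper: the `Semiring` class, the dict-based relation encoding, the helper `tok`, and the query `Q(a) :- R(a, b), S(b)` are all hypothetical. Swapping the semiring changes what the annotations mean (truth, multiplicity, or witness sets) without changing the query code.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]    # combines alternative derivations
    times: Callable[[Any, Any], Any]   # combines jointly used tuples


BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
# Why-provenance: a set of witnesses, each witness being a set of tuple ids.
WHY = Semiring(
    frozenset(),
    frozenset([frozenset()]),
    lambda a, b: a | b,
    lambda a, b: frozenset(x | y for x in a for y in b),
)


def tok(name):
    """Annotation for a base tuple in the WHY semiring (hypothetical helper)."""
    return frozenset([frozenset([name])])


def join_project(R, S, K):
    """Evaluate Q(a) :- R(a, b), S(b) over K-annotated relations (dict: tuple -> annotation)."""
    out = {}
    for (a, b), r_ann in R.items():
        for (b2,), s_ann in S.items():
            if b == b2:
                out[(a,)] = K.plus(out.get((a,), K.zero), K.times(r_ann, s_ann))
    return out


# Bag semantics: annotations are multiplicities.
counts = join_project(
    {("x", 1): 2, ("x", 2): 1, ("y", 2): 1},
    {(1,): 1, (2,): 3},
    COUNTING,
)

# Why-provenance: annotations record which base tuples justify each answer.
why = join_project(
    {("x", 1): tok("r1"), ("x", 2): tok("r2"), ("y", 2): tok("r3")},
    {(1,): tok("s1"), (2,): tok("s2")},
    WHY,
)
print(counts)
print(why)
```

With the counting semiring the answer `x` has multiplicity 2·1 + 1·3 = 5, while the why-provenance run reports the two alternative witnesses `{r1, s1}` and `{r2, s2}` for the same answer.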
SIGMOD Rec. Pub Date: 2017-10-31 DOI: 10.1145/3156655.3156661
Sérgio Esteves, N. Janssens, B. Theeten, L. Veiga
{"title":"Empowering Stream Processing through Edge Clouds","authors":"Sérgio Esteves, N. Janssens, B. Theeten, L. Veiga","doi":"10.1145/3156655.3156661","DOIUrl":"https://doi.org/10.1145/3156655.3156661","url":null,"abstract":"CHive is a new streaming analytics platform to run distributed SQL-style queries on edge clouds. However, CHive is currently tightly coupled to a specific stream processing system (SPS), Apache Storm. In this paper we address the decoupling of the CHive query planner and optimizer from the runtime environment, and also extend the latter to support pluggable runtimes through a common API. As runtimes, we currently support Apache Spark and Flink streaming. The fundamental contribution of this paper is to assess the cost of employing interstream parallelism in SPS. Experimental evaluation indicates that we can enable popular SPS to be distributed on edge clouds with stable overhead in terms of throughput","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"41 1","pages":"23-28"},"PeriodicalIF":0.0,"publicationDate":"2017-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80914859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2017-10-31 DOI: 10.1145/3156655.3156657
P. Guagliardo, L. Libkin
{"title":"Correctness of SQL Queries on Databases with Nulls","authors":"P. Guagliardo, L. Libkin","doi":"10.1145/3156655.3156657","DOIUrl":"https://doi.org/10.1145/3156655.3156657","url":null,"abstract":"Multiple issues with SQL's handling of nulls have been well documented. Having efficiency as its main goal, SQL disregards the standard notion of correctness on incomplete databases -- certain answers -- due to its high complexity. As a result, the evaluation of SQL queries on databases with nulls may produce answers that are just plain wrong. However, SQL evaluation can be modified, at least for relational algebra queries, to approximate certain answers, i.e., return only correct answers. We examine recently proposed approximation schemes for certain answers and analyze their complexity, both theoretical bounds and real-life behavior","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"236 1","pages":"5-16"},"PeriodicalIF":0.0,"publicationDate":"2017-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89693791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
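The gap between SQL evaluation and certain answers can be reproduced with a two-table example. The sketch below uses Python's standard `sqlite3` module; the tables `R` and `S`, the difference query written with `NOT EXISTS`, and the two-value domain used to enumerate interpretations of the null are assumptions made for illustration, not material from the paper. It compares what SQL returns on the database containing a null with the intersection of the answers over the completed databases.

```python
import sqlite3

QUERY = "SELECT a FROM R WHERE NOT EXISTS (SELECT 1 FROM S WHERE S.a = R.a)"


def run(s_value):
    """Evaluate R - S (written with NOT EXISTS) where R = {1} and S = {s_value}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE R(a)")
    con.execute("CREATE TABLE S(a)")
    con.execute("INSERT INTO R VALUES (1)")
    con.execute("INSERT INTO S VALUES (?)", (s_value,))  # None is stored as NULL
    rows = con.execute(QUERY).fetchall()
    con.close()
    return {row[0] for row in rows}


# SQL on the incomplete database: S.a = R.a is unknown for the NULL row,
# so the subquery is empty and the tuple 1 is returned.
sql_answer = run(None)

# Certain answers: keep only tuples returned for every interpretation of the
# null. The small domain (1, 2) is an assumption; it suffices here because the
# interpretation NULL = 1 already removes 1 from the answer.
certain = set.intersection(*(run(v) for v in (1, 2)))

print("SQL answer:    ", sql_answer)
print("Certain answer:", certain)
```

Here SQL returns the tuple 1 even though it is not a certain answer, since the null in `S` might stand for 1. Enumerating interpretations as done above is only feasible for toy inputs, which is why efficient approximation schemes are of interest.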