SIGMOD Rec. Pub Date: 2019-02-27 DOI: 10.1145/3316416.3316423
M. Winslett, V. Braganholo
{"title":"Peter Bailis Speaks Out on building tools users want to use","authors":"M. Winslett, V. Braganholo","doi":"10.1145/3316416.3316423","DOIUrl":"https://doi.org/10.1145/3316416.3316423","url":null,"abstract":"Welcome to this installment of ACM SIGMOD Record's series of interviews with distinguished members of the database community. I'm Marianne Winslett and today we're at the 2017 SIGMOD and PODS Conference in Chicago. I have here with me Peter Bailis, who's a professor at Stanford University. Peter won the 2017 ACM SIGMOD Jim Gray Dissertation award for his thesis entitled \"Coordination Avoidance in Distributed Databases.\" Peter's advisors were Joseph Hellerstein, Ion Stoica, and Ali Ghodsi at Berkeley.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"36 1","pages":"29-31"},"PeriodicalIF":0.0,"publicationDate":"2019-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86340416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-02-27 DOI: 10.1145/3316416.3316425
L. Singh, A. Deshpande, Wenchao Zhou, A. Banerjee, Alex J. Bowers, Sorelle A. Friedler, H. Jagadish, G. Karypis, Z. Obradovic, A. Vullikanti, W. Zuo
{"title":"NSF BIGDATA PI Meeting - Domain-Specific Research Directions and Data Sets","authors":"L. Singh, A. Deshpande, Wenchao Zhou, A. Banerjee, Alex J. Bowers, Sorelle A. Friedler, H. Jagadish, G. Karypis, Z. Obradovic, A. Vullikanti, W. Zuo","doi":"10.1145/3316416.3316425","DOIUrl":"https://doi.org/10.1145/3316416.3316425","url":null,"abstract":"In March 2017, PIs and co-PIs funded through the NSF BIGDATA program were brought together along with selected industry and government invitees to discuss current research, identify current challenges, discuss promising future directions, foster new collaborations, and share accomplishments, at BDPI-2017. Given that two recent NITRD [2] and NSF [1] meeting reports contained a set of recommendations, grand challenges, and high impact priorities for Big Data, the organizers of this meeting shifted the focus of the breakout sessions to discuss problems and available data sets that exist in five application domains - policy, health, education, economy & finance, and environment & energy. These domains were selected based on a survey of the PIs/co-PIs and should not be interpreted as being more important than others. Slides that were presented by the different breakout group leaders are available at https://www.bi.vt.edu/nsf-big-data/. We hope this report will serve as a blueprint for promising big data research in five application domains.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"80 1","pages":"32-35"},"PeriodicalIF":0.0,"publicationDate":"2019-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80451006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-02-27 DOI: 10.1145/3316416.3316426
S. Sakr, T. Rabl, Martin Hirzel, Paris Carbone, M. Strohbach
{"title":"Dagstuhl Seminar on Big Stream Processing","authors":"S. Sakr, T. Rabl, Martin Hirzel, Paris Carbone, M. Strohbach","doi":"10.1145/3316416.3316426","DOIUrl":"https://doi.org/10.1145/3316416.3316426","url":null,"abstract":"Stream processing can generate insights from big data in real time as it is being produced. This paper reports findings from a 2017 seminar on big stream processing, focusing on applications, systems, and languages.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"126 1","pages":"36-39"},"PeriodicalIF":0.0,"publicationDate":"2019-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91523500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-02-20 DOI: 10.1145/3377330.3377334
Francesco Pierri, S. Ceri
{"title":"False News On Social Media: A Data-Driven Survey","authors":"Francesco Pierri, S. Ceri","doi":"10.1145/3377330.3377334","DOIUrl":"https://doi.org/10.1145/3377330.3377334","url":null,"abstract":"In the past few years, the research community has dedicated growing interest to the issue of false news circulating on social networks. The widespread attention on detecting and characterizing deceptive information has been motivated by considerable political and social backlashes in the real world. As a matter of fact, social media platforms exhibit peculiar characteristics, with respect to traditional news outlets, which have been particularly favorable to the proliferation of false news. They also present unique challenges for all kinds of potential interventions on the subject. As this issue becomes of global concern, it is also gaining more attention in academia. The aim of this survey is to offer a comprehensive study on the recent advances in terms of detection, characterization and mitigation of false news that propagate on social media, as well as the challenges and the open questions that await future research in the field. We use a data-driven approach, focusing on a classification of the features that are used in each study to characterize false information and on the datasets used for instructing classification methods. At the end of the survey, we highlight emerging approaches that look most promising for addressing false news.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"90 1","pages":"18-27"},"PeriodicalIF":0.0,"publicationDate":"2019-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75173052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-01-01 DOI: 10.1145/3377391.3377396
José María Cavero Barca, Belén Vela, Paloma Cáceres
{"title":"Evaluation of an Implementation of Cross-Row Constraints Using Materialized Views","authors":"José María Cavero Barca, Belén Vela, Paloma Cáceres","doi":"10.1145/3377391.3377396","DOIUrl":"https://doi.org/10.1145/3377391.3377396","url":null,"abstract":"","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"24 1","pages":"23-28"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86311372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-09-10 DOI: 10.1145/3277006.3277014
W. Tan
{"title":"Technical Perspective: Toward Building Entity Matching Management Systems","authors":"W. Tan","doi":"10.1145/3277006.3277014","DOIUrl":"https://doi.org/10.1145/3277006.3277014","url":null,"abstract":"Entity matching, also known as entity resolution or reference reconciliation, is the task of identifying when two (different) representations refer to the same real-world entity. Overcoming the entity matching problem is often a key step in today’s data preparation and integration pipeline before useful data can be produced for analysis. For example, to understand how many potential new customers there may be, a company may wish to integrate an internal repository of customer profiles with an externally sourced dataset that contains profiles of users (e.g., Twitter data). A successful entity matching process would need to discern when two heterogeneous customer profiles may actually refer to the same customer and, conversely, when two seemingly identical customer profiles may actually not be the same customer. For example, it is not obvious whether or not these two records:","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"25 1","pages":"32"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82497674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-09-10 DOI: 10.1145/3277006.3277012
Y. Papakonstantinou
{"title":"Technical Perspective: Supporting Linear Algebra Operations in SQL","authors":"Y. Papakonstantinou","doi":"10.1145/3277006.3277012","DOIUrl":"https://doi.org/10.1145/3277006.3277012","url":null,"abstract":"Linear algebra operations are at the core of Machine Learning. Multiple specialized systems have emerged for the scalable, distributed execution of matrix and vector operations. The relationship of such computations to data management and databases, however, brings friction. It is well known that a great deal of human time and machine time is being spent nowadays on fetching data out of the database and performing a computation on a specialized system. One answer to the issue is that we truly need a new kind of non-SQL database that is tuned to these computations. The creators of SimSQL opted for the decidedly incremental approach. Can we make a very small set of changes to the relational model and RDBMS software to render them suitable for executing linear algebra in the database? We have come across the \"brand new system\" versus \"incremental to relational\" question many times in the database field. E.g., do we need brand new query languages and query processors for data cubes? Or do we need to have our query processors pay attention to specific cases that are especially common in data analytics queries over stars and snowflakes? Do semistructured query languages need to depart from SQL or is it enough to be incremental to SQL? Same for query processors. Repeat the questions for graph data and RDF data. In many cases, new custom systems emerged only to figure out later that we could/should have tackled the problem incrementally. That’s the trap that the authors of this paper avoid. This is not to say that radical changes and extensions should be forbidden. Rather it says that we should closely scrutinize the necessity of the changes, do them when needed and keep them minimal. The authors identify the right opportunities. Here is a non-exhaustive list:","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"7 1","pages":"23"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78645095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-09-10 DOI: 10.1145/3277006.3277008
W. Tan
{"title":"Technical Perspective: A Relational Framework for Classifier Engineering","authors":"W. Tan","doi":"10.1145/3277006.3277008","DOIUrl":"https://doi.org/10.1145/3277006.3277008","url":null,"abstract":"A fundamental step in developing machine-learning solutions is that of feature engineering. Feature engineering refers to the process of generating a representation from data (called features) that can be fed as inputs to machine-learning models. The results of feature engineering thus have direct impact on the performance of machine-learning models. In developing machine-learning solutions, a large amount of time is typically devoted to feature engineering, which determines the right features to capture for improving the performance of the models. In this paper, the authors describe a framework for feature engineering for programming machine-learning solutions over a database, assuming the model inputs numerical parameters that may be tuned by fitting to training examples. The focus of the paper is on a widely used class of machine-learning models, called classifiers, which are used to predict an unknown category of a given entity based on the properties of that entity. The running example of the paper considers the problem where a credit card company wishes to identify whether an incoming credit card transaction is a legitimate or fraudulent transaction (e.g., made with a stolen credit card). The credit card company may leverage historical transactions with both legitimate and fraudulent transactions as training data to train a classifier. The features that are extracted from the data may include properties that concern the state and country where a transaction was made compared to the state and country of billing address of the owner, the amount billed in the transaction, the history of transactions and so on. Their framework assumes an underlying entity schema, which is a relation schema with a distinguished relation symbol. The distinguished relation represents the set of real-world objects that the classifier makes predictions upon. For the credit card example, since the classifier will ultimately be applied on transactions to determine the legitimacy of transactions, a natural candidate for the distinguished entity relation in the credit card example is the transaction relation which stores all transactions that occurred. The remaining relation schema will include additional information about the transaction, such as the country and state where each transaction took place, the card and amount involved, and information about the billing address of the credit card. Feature engineering is modeled as the process where an analyst specifies a sequence of feature queries in some language. For example, a feature query may select all transactions that took place in the same country and state of the owner’s billing address, and another feature query may select all the ones that took place in the same country (but not necessarily the same state) of the owner. In their framework, a classifier is a function that maps a vector of numbers, where the numbers encode the features, into a boolean answer. To train a classifier, it is therefore necessary to conve","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"27 1","pages":"5"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85077980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-09-10 DOI: 10.1145/3277006.3277009
B. Kimelfeld, C. Ré
{"title":"A Relational Framework for Classifier Engineering","authors":"B. Kimelfeld, C. Ré","doi":"10.1145/3277006.3277009","DOIUrl":"https://doi.org/10.1145/3277006.3277009","url":null,"abstract":"In the design of analytical procedures and machine-learning solutions, a critical and time-consuming task is that of feature engineering, for which various recipes and tooling approaches have been developed. We embark on the establishment of database foundations for feature engineering. Specifically, we propose a formal framework for classification in the context of a relational database. The goal of this framework is to open the way to research and techniques to assist developers with the task of feature engineering by utilizing the database's modeling and understanding of data and queries, and by deploying the well studied principles of database management. We demonstrate the usefulness of the framework by formally defining key algorithmic challenges and presenting preliminary complexity results.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"15 1","pages":"6-13"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75463911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2018-09-10 DOI: 10.1145/3277006.3277017
Daniel Deutch, Nave Frost, Amir Gilad
{"title":"Natural Language Explanations for Query Results","authors":"Daniel Deutch, Nave Frost, Amir Gilad","doi":"10.1145/3277006.3277017","DOIUrl":"https://doi.org/10.1145/3277006.3277017","url":null,"abstract":"Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are importantly based on the provenance of tuples in the query result, detailing not only the results but also their explanations. We develop a novel method for transforming provenance information to NL, by leveraging the original NL query structure. Furthermore, since provenance information is typically large and complex, we present two solutions for its effective presentation as NL text: one that is based on provenance factorization, with novel desiderata relevant to the NL case, and one that is based on summarization.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"45 1","pages":"42-49"},"PeriodicalIF":0.0,"publicationDate":"2018-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81548818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}