SIGMOD Rec. Pub Date: 2019-11-05 DOI: 10.1145/3371316.3371322
B. Kimelfeld, W. Martens
{"title":"Technical Perspective: Entity Matching with Quality and Error Guarantees","authors":"B. Kimelfeld, W. Martens","doi":"10.1145/3371316.3371322","DOIUrl":"https://doi.org/10.1145/3371316.3371322","url":null,"abstract":"The challenge of entity matching is that of identifying when different data items (often referred to as records or mentions) refer to the same real-life entity. Popular instantiations of this problem include deduplication, where the items are database records that include duplicate representations of the same entity (e.g., duplicate profiles in a social network) [2], record linkage, where the items come from different data sources that mention overlapping sets of entities (e.g., the profiles of two social networks) [5], and schema matching, where the items are attributes of different database schemas that intersect on their domain of interest (e.g., the database schemas of different social networks) [6].","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"24 1","pages":"23"},"PeriodicalIF":0.0,"publicationDate":"2019-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84513815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-11-05 DOI: 10.1145/3371316.3371330
Dan Suciu
{"title":"Technical Perspective for: MATLANG: Matrix Operations and Their Expressive Power","authors":"Dan Suciu","doi":"10.1145/3371316.3371330","DOIUrl":"https://doi.org/10.1145/3371316.3371330","url":null,"abstract":"The main processing paradigm in data management is bulk processing. As introduced by Codd in the early 70's, under this paradigm relations are processed in bulk, one operator at a time. When applied to relations, this paradigm leads to relational algebra, and its variants, relational calculus, and SQL. Over the years, data management was faced with the challenge of extending bulk processing operators to new kinds of data, and/or new kinds of queries: nested relations, semistructured data, recursive queries. Each such extension requires significant systems development, which should be accompanied, in fact preceded, by a careful study of the expressive power of the new language. Is it as expressive, more expressive, or less expressive than relational algebra? The answer to this question has profound implications on the ability of data processing engines to optimize, compute, distribute, reuse queries in that language. For example, extending relational algebra with nested relations does not increase its expressive power, while extending it with fixpoint does, explaining why modern query engines have an easier time supporting JSON than recursion.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"42 1","pages":"59"},"PeriodicalIF":0.0,"publicationDate":"2019-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89588165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-11-05 DOI: 10.1145/3371316.3371323
Yufei Tao
{"title":"Entity Matching with Quality and Error Guarantees","authors":"Yufei Tao","doi":"10.1145/3371316.3371323","DOIUrl":"https://doi.org/10.1145/3371316.3371323","url":null,"abstract":"Given two sets of entities X and Y, entity matching aims to decide whether x and y represent the same entity for each pair (x, y) ∈ X × Y. In many scenarios, the only way to ensure perfect accuracy is to launch a costly inspection procedure on every (x, y), whereas performing the procedure |X| · |Y| times is prohibitively expensive. It is, therefore, important to design an algorithm that carries out the procedure on only some pairs, and renders the verdicts on the other pairs automatically with as few mistakes as possible. This article describes an algorithm that achieves the purpose using the methodology of active monotone classification. The algorithm ensures an asymptotically optimal tradeoff between the number of pairs inspected and the number of mistakes made.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"40 2 1","pages":"24-31"},"PeriodicalIF":0.0,"publicationDate":"2019-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90106657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-11-05 DOI: 10.1145/3371316.3371332
K. Yi
{"title":"Technical Perspective: Online Model Management via Temporally Biased Sampling","authors":"K. Yi","doi":"10.1145/3371316.3371332","DOIUrl":"https://doi.org/10.1145/3371316.3371332","url":null,"abstract":"Random sampling from data streams is a problem with a long history of study, starting from the famous reservoir sampling algorithm that is at least 50 years old [2]. The reservoir sampling algorithm maintains a random sample over all data items that have ever been received from the stream. This is not suitable for many of today's applications on evolving data streams, where recent data is more important than older data.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"6 1","pages":"68"},"PeriodicalIF":0.0,"publicationDate":"2019-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84823269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-11-05 DOI: 10.1145/3371316.3371333
Brian Hentschel, P. Haas, Yuanyuan Tian
{"title":"Online Model Management via Temporally Biased Sampling","authors":"Brian Hentschel, P. Haas, Yuanyuan Tian","doi":"10.1145/3371316.3371333","DOIUrl":"https://doi.org/10.1145/3371316.3371333","url":null,"abstract":"To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporally biased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying exponentially over time. We then periodically retrain the models on the current sample. We provide and analyze both a simple sampling scheme (T-TBS) that probabilistically maintains a target sample size and a novel reservoir-based scheme (R-TBS) that is the first to provide both control over the decay rate and a guaranteed upper bound on the sample size. The R-TBS and T-TBS schemes are of independent interest, extending the known set of unequal-probability sampling schemes. We discuss distributed implementation strategies; experiments in Spark show that our approach can increase machine learning accuracy and robustness in the face of evolving data.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"43 1","pages":"69-76"},"PeriodicalIF":0.0,"publicationDate":"2019-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77403400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-11-05 DOI: 10.1145/3371316.3371320
Graham Cormode
{"title":"Technical Perspective: ∈KTELO","authors":"Graham Cormode","doi":"10.1145/3371316.3371320","DOIUrl":"https://doi.org/10.1145/3371316.3371320","url":null,"abstract":"When was the last time that you wrote code to implement a join algorithm? Chances are, it was during an undergraduate database class - if at all. The wide availability of database management systems in all their manifestations (admitting a wide definition, to encompass performing look-ups in a spreadsheet) means that we do not have to (re)implement common operations over and over again. This brings many advantages. We benefit from time savings, both in development time, and also in execution time: we can expect that optimized professional code will outperform our ad-hoc efforts. Moreover, we expect such code to be robust, and less prone to crashing on unexpected inputs. It should produce results that can be relied on to be correct, and handle errors gracefully.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"48 1","pages":"14"},"PeriodicalIF":0.0,"publicationDate":"2019-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81694700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-03-01 DOI: 10.1145/3371316.3371325
Muhammad Idris, M. Ugarte, Stijn Vansummeren, H. Voigt, Wolfgang Lehner
{"title":"Efficient Query Processing for Dynamically Changing Datasets","authors":"Muhammad Idris, M. Ugarte, Stijn Vansummeren, H. Voigt, Wolfgang Lehner","doi":"10.1145/3371316.3371325","DOIUrl":"https://doi.org/10.1145/3371316.3371325","url":null,"abstract":"The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. Traditional approaches to this problem were developed around the notion of Incremental View Maintenance (IVM), and are based either on the materialization of subresults (to avoid their recomputation) or on the recomputation of subresults (to avoid the space overhead of materialization). Both techniques are suboptimal: instead of materializing results and subresults, one may also maintain a data structure that supports efficient maintenance under updates and from which the full query result can quickly be enumerated. In two previous articles, we have presented algorithms for dynamically evaluating queries that are easy to implement, efficient, and can be naturally extended to evaluate queries from a wide range of application domains. In this paper, we discuss our algorithm and its complexity, explaining the main components behind its efficiency. Finally, we show experiments that compare our algorithm to a state-of-the-art (Higher-order) IVM engine, as well as to a prominent complex event recognition engine. Our approach outperforms the competitor systems by up to two orders of magnitude in processing time, and one order in memory consumption.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"73 1","pages":"33-40"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90622497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-02-27 DOI: 10.1145/3316416.3316418
P. Buneman, W. Tan
{"title":"Data Provenance: What next?","authors":"P. Buneman, W. Tan","doi":"10.1145/3316416.3316418","DOIUrl":"https://doi.org/10.1145/3316416.3316418","url":null,"abstract":"Research into data provenance has been active for almost twenty years. What has it delivered and where will it go next? What practical impact has it had and what might it have? We provide speculative answers to these questions which may be somewhat biased by our initial motivation for studying the topic: the need for provenance information in curated databases. Such databases involve extensive human interaction with data; and we argue that the need continues in other forms of human interaction such as those that take place in social media.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"4 1","pages":"5-16"},"PeriodicalIF":0.0,"publicationDate":"2019-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90754115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-02-27 DOI: 10.1145/3316416.3316420
Richard Shay, U. Blumenthal, V. Gadepally, Ariel Hamlin, John Darby Mitchell, R. Cunningham
{"title":"Don't Even Ask: Database Access Control through Query Control","authors":"Richard Shay, U. Blumenthal, V. Gadepally, Ariel Hamlin, John Darby Mitchell, R. Cunningham","doi":"10.1145/3316416.3316420","DOIUrl":"https://doi.org/10.1145/3316416.3316420","url":null,"abstract":"This paper presents a vision and description for query control, which is a paradigm for database access control. In this model, individual queries are examined before being executed and are either allowed or denied by a pre-defined policy. Traditional view-based database access control requires the enforcer to view the query, the records, or both. That may present difficulty when the enforcer is not allowed to view database contents or the query itself. This discussion of query control arises from our experience with privacy-preserving encrypted databases, in which no single entity learns both the query and the database contents. Query control is also a good fit for enforcing rules and regulations that are not well-addressed by view-based access control. With the rise of federated database management systems, we believe that new approaches to access control will be increasingly important.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"54 1","pages":"17-22"},"PeriodicalIF":0.0,"publicationDate":"2019-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76612267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGMOD Rec. Pub Date: 2019-02-27 DOI: 10.1145/3316416.3316422
M. Winslett, V. Braganholo
{"title":"Timos Sellis Speaks Out on Research in Australia and Greece","authors":"M. Winslett, V. Braganholo","doi":"10.1145/3316416.3316422","DOIUrl":"https://doi.org/10.1145/3316416.3316422","url":null,"abstract":"Welcome to ACM SIGMOD Record's series of interviews with distinguished members of the database community. I'm Marianne Winslett, and today we are in Snowbird, Utah, USA, site of the 2014 SIGMOD and PODS conference. I have here with me Timos Sellis, who is a professor at the Royal Melbourne Institute of Technology. Before that, he was at the National Technical University of Athens and the University of Maryland. He is an ACM Fellow and an IEEE Fellow, and he has a VLDB 10-Year Paper Award. His PhD is from Berkeley.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"22 6S 1","pages":"23-28"},"PeriodicalIF":0.0,"publicationDate":"2019-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76518125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}