{"title":"Deciding What Not to Do","authors":"David Maier","doi":"10.1145/3552490.3552501","DOIUrl":"https://doi.org/10.1145/3552490.3552501","url":null,"abstract":"I recall an early conversation with my advisor, a couple years after I completed my PhD. I was worried about not having been invited onto any program committees when others in my cohort were getting such opportunities. He assured me that it would come in time, though I was still anxious. He was right; after another year or so, the invitations started coming. At this point, I need to decline most of them, or I'd spend all my time reviewing.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114657006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science Through the Looking Glass","authors":"Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Jordan Henkel, Matteo Interlandi, Subru Krishnan, Brian Kroth, Venkatesh Emani, Wentao Wu, Ce Zhang, Markus Weimer, A. Floratou, C. Curino, Konstantinos Karanasos","doi":"10.1145/3552490.3552496","DOIUrl":"https://doi.org/10.1145/3552490.3552496","url":null,"abstract":"The recent success of machine learning (ML) has led to an explosive growth of systems and applications built by an ever-growing community of system builders and data science (DS) practitioners. This quickly shifting panorama, however, is challenging for system builders and practitioners alike to follow. In this paper, we set out to capture this panorama through a wide-angle lens, performing the largest analysis of DS projects to date, focusing on questions that can advance our understanding of the field and determine investments. Specifically, we download and analyze (a) over 8M notebooks publicly available on GitHub and (b) over 2M enterprise ML pipelines developed within Microsoft. Our analysis includes coarse-grained statistical characterizations, fine-grained analysis of libraries and pipelines, and comparative studies across datasets and time. We report a large number of measurements for our readers to interpret and draw actionable conclusions on (a) what system builders should focus on to better serve practitioners and (b) what technologies practitioners should rely on.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128749630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reminiscences on Influential Papers","authors":"M. Tamer Özsu","doi":"10.1145/3552490.3552499","DOIUrl":"https://doi.org/10.1145/3552490.3552499","url":null,"abstract":"This column was established by Richard Snodgrass in 1998 and was continued by Ken Ross from 1999 to 2005. It celebrated one of the key aspects that makes us grow as a research community: the papers that influence us. In each issue, different members of the data management community wrote anecdotes about a paper that had a unique impact on their careers. The anecdotes highlighted that impact can come in many forms. A paper's value is not only in its citation count, but also in the way it influences individuals who in turn influence other individuals that make up our community. Such impact is not countable. When the SIGMOD Record's editor-in-chief Rada Chirkova approached me to revive this column last year, I was immediately excited. I would like to thank Rada Chirkova, Richard Snodgrass, and Ken Ross for this opportunity. I am delighted to present the three invited contributions for this issue. I hope you enjoy reading them as much as I did. While I will keep inviting members of the data management community, and neighboring communities, to contribute to this column, I also welcome unsolicited contributions. Please contact me if you are interested.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129915780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Passive, Migration-Free, Standardized, Long-Term Database Archival","authors":"Raja Appuswamy","doi":"10.1145/3552490.3552506","DOIUrl":"https://doi.org/10.1145/3552490.3552506","url":null,"abstract":"\"How would you archive databases for the next 60 years such that they incur no migration cost, and they remain usable in 2080?\" This was an open challenge raised by digital preservation experts from the Landesarchiv of Baden-Württemberg [12], who, similar to other memory institutions (archives, museums, libraries, etc.), have faced several challenges in archiving culturally significant, historic data stored in digital databases since the early 1960s.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124944360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Technical Perspective","authors":"David P. Woodruff","doi":"10.1145/3542700.3542720","DOIUrl":"https://doi.org/10.1145/3542700.3542720","url":null,"abstract":"Model counting is the problem of approximately counting the number |Sol(Φ)| of satisfying assignments to a given model Φ, which could, for example, be a formula in conjunctive normal form (CNF) or a formula in disjunctive normal form (DNF).","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131927292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Technical Perspective","authors":"M. Yannakakis","doi":"10.1145/3542700.3542718","DOIUrl":"https://doi.org/10.1145/3542700.3542718","url":null,"abstract":"The paper Structure and Complexity of Bag Consistency by Albert Atserias and Phokion Kolaitis [1] studies fundamental structural and algorithmic questions on the global consistency of databases in the context of bag semantics. A collection D of relations is called globally consistent if there is a (so-called \"universal\") relation over all the attributes that appear in all the relations of D whose projections yield the relations in D. The basic algorithmic problem for consistency is: given a database D, determine whether D is globally consistent. An obvious necessary condition for global consistency is local (or pairwise) consistency: every pair of relations in D must be consistent. This condition is not sufficient, however: it is possible that every pair is consistent, but there is no single global relation over all the attributes whose projections yield the relations in D. A natural structural question is: which database schemas have the property that every locally consistent database over the schema is also globally consistent?","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128906731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Making Learned Query Optimization Practical","authors":"V. Markl","doi":"10.1145/3542700.3542702","DOIUrl":"https://doi.org/10.1145/3542700.3542702","url":null,"abstract":"Query optimization has been a challenging problem ever since the relational data model was proposed. The role of the query optimizer in a database system is to compute an execution plan for a (relational) query expression composed of physical operators whose implementations correspond to the operations of the (relational) algebra. There are many degrees of freedom for selecting a physical plan, in particular due to the laws of associativity, commutativity, and distributivity among the operators in the (relational) algebra, which require us to take the order of operations into consideration. In addition, there are many alternative access paths to a dataset and a multitude of physical implementations for operations such as relational joins (e.g., merge join, nested-loop join, hash join). Thus, when seeking to determine the best (or even a sufficiently good) execution plan, the optimizer faces a huge search space.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127901195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bao","authors":"Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, Tim Kraska","doi":"10.1145/3542700.3542703","DOIUrl":"https://doi.org/10.1145/3542700.3542703","url":null,"abstract":"Recent efforts applying machine learning techniques to query optimization have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties, we introduce Bao (the Bandit optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints. Bao combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm. As a result, Bao automatically learns from its mistakes and adapts to changes in query workloads, data, and schema. Experimentally, we demonstrate that Bao can quickly learn strategies that improve end-to-end query execution performance, including tail latency, for several workloads containing long-running queries. In cloud environments, we show that Bao can offer both reduced costs and better performance compared with a commercial system.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116935702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relative Error Streaming Quantiles","authors":"Graham Cormode, Zohar S. Karnin, Edo Liberty, J. Thaler, P. Veselý","doi":"10.1145/3542700.3542717","DOIUrl":"https://doi.org/10.1145/3542700.3542717","url":null,"abstract":"Estimating ranks, quantiles, and distributions over streaming data is a central task in data analysis and monitoring. Given a stream of n items from a data universe equipped with a total order, the task is to compute a sketch (data structure) of size polylogarithmic in n. Given the sketch and a query item y, one should be able to approximate its rank in the stream, i.e., the number of stream elements smaller than or equal to y.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127496770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Technical Perspective - No PANE, No Gain","authors":"A. Hogan","doi":"10.1145/3542700.3542710","DOIUrl":"https://doi.org/10.1145/3542700.3542710","url":null,"abstract":"The machine learning community has traditionally been proactive in developing techniques for diverse types of data, such as text, audio, images, videos, time series, and, of course, matrices, tensors, etc. \"But what about graphs?\" some of us graph enthusiasts may have asked ourselves, dejectedly, before transforming our beautiful graph into a brutalistic table of numbers that bore little resemblance to its parent or to the phenomena it represented, but could at least be shovelled into the machine learning frameworks of the time. Thankfully those days are coming to an end.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128565436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}