Sigmod Record. Pub Date: 2023-10-30. DOI: 10.1145/3631504.3631514
Semih Salihoglu
{"title":"Kùzu: A Database Management System For \"Beyond Relational\" Workloads","authors":"Semih Salihoglu","doi":"10.1145/3631504.3631514","DOIUrl":"https://doi.org/10.1145/3631504.3631514","url":null,"abstract":"I would like to share my opinions on the following question: how should a modern graph DBMS (GDBMS) be architected? This is the motivating research question we are addressing in the Kùzu project at University of Waterloo [4, 5]. I will argue that a modern GDBMS should optimize for a set of what I will call, for lack of a better term, \"beyond relational\" workloads. As a background, let me start with a brief overview of GDBMSs.","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"128 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136107068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sigmod Record. Pub Date: 2023-10-30. DOI: 10.1145/3631504.3631506
Wei Dong, Ke Yi
{"title":"Query Evaluation under Differential Privacy","authors":"Wei Dong, Ke Yi","doi":"10.1145/3631504.3631506","DOIUrl":"https://doi.org/10.1145/3631504.3631506","url":null,"abstract":"Differential privacy has garnered significant attention in recent years due to its potential in offering robust privacy protection for individual data during analysis. With the increasing volume of sensitive information being collected by organizations and analyzed through SQL queries, the development of a general-purpose query engine that is capable of supporting a broad range of queries while maintaining differential privacy has become the holy grail in privacy-preserving query release. Towards this goal, this article surveys recent advances in query evaluation under differential privacy.","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136107069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sigmod Record. Pub Date: 2023-10-30. DOI: 10.1145/3631504.3631516
Olga Poppe, Pablo Castro, Willis Lang, Jyoti Leeka
{"title":"Proactive Resource Allocation Policy for Microsoft Azure Cognitive Search","authors":"Olga Poppe, Pablo Castro, Willis Lang, Jyoti Leeka","doi":"10.1145/3631504.3631516","DOIUrl":"https://doi.org/10.1145/3631504.3631516","url":null,"abstract":"Modern cloud services aim to find the middle ground between quality of service and operational cost efficiency by allocating resources if and only if these resources are needed by the customers. Unfortunately, most industrial demand-driven resource allocation approaches are reactive. Given that scaling mechanisms are not instantaneous, the reactive policy may introduce delays to latency-sensitive customer workloads and waste operational costs for cloud service providers. To solve this catch-22, we define the proactive resource allocation policy for Microsoft Azure Cognitive Search. In addition to the current resource demand, the proactive policy takes the typical resource usage patterns into account. We gained the following valuable insights from these patterns over several months of production workloads. One, 87% of the workload is stable due to continuous resource demand. Two, 90% of varying demand is predictable based on a few weeks of historical traces. Three, resources can be reclaimed 52% of the time due to extensive idle intervals of varying workload. Given the size and scope of our analysis, we believe that our approach applies to any latency-sensitive cloud service.","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136106512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sigmod Record. Pub Date: 2023-10-30. DOI: 10.1145/3631504.3631508
Danilo B. Seufitelli, Michele A. Brandão, Ayane C. A. Fernandes, Kayque M. Siqueira, Mirella M. Moro
{"title":"Where do Databases and Digital Forensics meet? A Comprehensive Survey and Taxonomy","authors":"Danilo B. Seufitelli, Michele A. Brandão, Ayane C. A. Fernandes, Kayque M. Siqueira, Mirella M. Moro","doi":"10.1145/3631504.3631508","DOIUrl":"https://doi.org/10.1145/3631504.3631508","url":null,"abstract":"We present a systematic literature review and propose a taxonomy for research at the intersection of Digital Forensics and Databases. The merge between these two areas has become more prolific due to the growing volume of data and mobile apps on the Web, and the consequent rise in cyber attacks. Our review has identified 91 relevant papers. The taxonomy categorizes such papers into: Cyber-Attacks (subclasses SQLi, Attack Detection, Data Recovery) and Criminal Intelligence (subclasses Forensic Investigation, Research Products, Crime Resolution). Overall, we contribute to better understanding the intersection between digital forensics and databases, and open opportunities for future research and development with potential for significant social, economic, and technical-scientific contributions.","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"6 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136106500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sigmod Record. Pub Date: 2023-10-30. DOI: 10.1145/3631504.3631518
Sihem Amer-Yahia, Angela Bonifati, Lei Chen, Guoliang Li, Kyuseok Shim, Jianliang Xu, Xiaochun Yang
{"title":"From Large Language Models to Databases and Back: A Discussion on Research and Education","authors":"Sihem Amer-Yahia, Angela Bonifati, Lei Chen, Guoliang Li, Kyuseok Shim, Jianliang Xu, Xiaochun Yang","doi":"10.1145/3631504.3631518","DOIUrl":"https://doi.org/10.1145/3631504.3631518","url":null,"abstract":"In recent years, large language models (LLMs) have garnered increasing attention from both academia and industry due to their potential to facilitate natural language processing (NLP) and generate high-quality text. Despite their benefits, however, the use of LLMs is raising concerns about the reliability of knowledge extraction. The combination of DB research and data science has advanced the state of the art in solving real-world problems, such as merchandise recommendation and hazard prevention [30]. In this discussion, we explore the challenges and opportunities related to LLMs in DB and data science research and education.","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136106502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sigmod Record. Pub Date: 2023-10-30. DOI: 10.1145/3631504.3631512
Renata Borovica-Gajic
{"title":"Reminiscences on Influential Papers","authors":"Renata Borovica-Gajic","doi":"10.1145/3631504.3631512","DOIUrl":"https://doi.org/10.1145/3631504.3631512","url":null,"abstract":"This issue's contributors chose papers that address challenges at the heart of database systems: physical design tuning for index selection and transaction isolation levels. Both contributions emphasize the elegant, modular, and long-lasting design choices of the respective work. Enjoy reading!","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136106511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Apache Wayang: A Unified Data Analytics Framework","authors":"Kaustubh Beedkar, Bertty Contreras-Rojas, Haralampos Gavriilidis, Zoi Kaoudi, Volker Markl, Rodrigo Pardo-Meza, Jorge-Arnulfo Quiané-Ruiz","doi":"10.1145/3631504.3631510","DOIUrl":"https://doi.org/10.1145/3631504.3631510","url":null,"abstract":"The large variety of specialized data processing platforms and the increased complexity of data analytics has led to the need for unifying data analytics within a single framework. Such a framework should free users from the burden of (i) choosing the right platform(s) and (ii) gluing code between the different parts of their pipelines. Apache Wayang (Incubating) is the only open-source framework that provides a systematic solution to unified data analytics by integrating multiple heterogeneous data processing platforms. It achieves that by decoupling applications from the underlying platforms and providing an optimizer so that users do not have to specify the platforms on which their pipeline should run. Wayang provides a unified view and processing model, effectively integrating the hodgepodge of heterogeneous platforms into a single framework with increased usability without sacrificing performance and total cost of ownership. In this paper, we present the architecture of Wayang, describe its main components, and give an outlook on future directions.","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"128 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136107070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sigmod Record. Pub Date: 2020-09-04. DOI: 10.1145/3422648.3422653
Guna Prasaad, Badrish Chandramouli, Donald Kossmann
{"title":"Concurrent Prefix Recovery","authors":"Guna Prasaad, Badrish Chandramouli, Donald Kossmann","doi":"10.1145/3422648.3422653","DOIUrl":"https://doi.org/10.1145/3422648.3422653","url":null,"abstract":"This paper proposes a new recovery model based on group commit, called concurrent prefix recovery (CPR). CPR differs from traditional group commit implementations in two ways: (1) it provides a sem...","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"29 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2020-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89632252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implicit Parallelism through Deep Language Embedding","authors":"Alexander Alexandrov, Asterios Katsifodimos, Georgi Krastev, Volker Markl","doi":"10.1145/2949741.2949754","DOIUrl":"https://doi.org/10.1145/2949741.2949754","url":null,"abstract":"Parallel collection processing based on second-order functions such as map and reduce has been widely adopted for scalable data analysis. Initially popularized by Google, over the past decade this ...","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"1 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2016-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2949741.2949754","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"63971524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sigmod Record. Pub Date: 2016-06-02. DOI: 10.1145/2949741.2949756
Christopher De Sa, Alexander J. Ratner, Christopher Ré, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang
{"title":"DeepDive: Declarative Knowledge Base Construction","authors":"Christopher De Sa, Alexander J. Ratner, Christopher Ré, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang","doi":"10.1145/2949741.2949756","DOIUrl":"https://doi.org/10.1145/2949741.2949756","url":null,"abstract":"The dark data extraction or knowledge base construction (KBC) problem is to populate a SQL database with information from unstructured data sources including emails, webpages, and pdf reports. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is that statistical inference and machine learning are key tools to attack classical data problems in extraction, cleaning, and integration in a unified and more effective manner. DeepDive programs are declarative in that one cannot write probabilistic inference algorithms; instead, one interacts by defining features or rules about the domain. A key reason for this design choice is to enable domain experts to build their own KBC systems. We present the applications, abstractions, and techniques of DeepDive employed to accelerate construction of KBC systems.","PeriodicalId":49524,"journal":{"name":"Sigmod Record","volume":"45 1 1","pages":"60-67"},"PeriodicalIF":1.1,"publicationDate":"2016-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2949741.2949756","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"63971708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}