{"title":"Database Education at UC San Diego","authors":"Arun C. S. Kumar, Alin Deutsch, Amarnath Gupta, Y. Papakonstantinou, Babak Salimi, V. Vianu","doi":"10.1145/3572751.3572763","DOIUrl":"https://doi.org/10.1145/3572751.3572763","url":null,"abstract":"We are in the golden age of data-intensive computing. CS is now the largest major in most US universities. Data Science, ML/AI, and cloud computing have been growing rapidly. Many new data-centric job categories are taking shape in industry, e.g., data scientists, ML engineers, analytics engineers, and data associates. The DB/data management/data systems area is naturally a central part of all these transformations. Thus, the DB community must keep evolving and innovating to fulfill the need for DB education in all its facets, including its intersection with other areas such as ML, systems, HCI, various domain sciences, etc., as well as bridging the gap with practice and industry.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134329639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting Online Data Markets in 2022","authors":"Javen Kennedy, Pranav Subramaniam, Sainyam Galhotra, Raul Castro Fernandez","doi":"10.1145/3572751.3572757","DOIUrl":"https://doi.org/10.1145/3572751.3572757","url":null,"abstract":"Well-functioning data markets match sellers with buyers to allocate data effectively. Although most of today's data markets fall short of this ideal, there is a renewed interest in online data marketplaces that may fulfill the promise of data markets. In this paper, we survey participants of some of the most common data marketplaces to understand the platforms' upsides and downsides. We find that buyers and sellers spend the majority of their time and effort in price negotiations. Although the markets work as an effective storefront that lets buyers find useful data fast, the high transaction costs required to negotiate price and circumvent the information asymmetry that exists between buyers and sellers indicate that today's marketplaces are still far from offering an effective solution to data trading. We draw on the results of the interviews to present potential opportunities for improvement and future research.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134163839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing I/O in Machine Learning with MLPerf Storage","authors":"Oana Balmau","doi":"10.1145/3572751.3572765","DOIUrl":"https://doi.org/10.1145/3572751.3572765","url":null,"abstract":"Data is the driving force behind machine learning (ML) algorithms. The way we ingest, store, and serve data can impact the performance of end-to-end training and inference significantly [11]. However, efficient storage and pre-processing of training data has received far less focus in ML compared to efforts in building specialized software frameworks and hardware accelerators. The amount of data that we produce is growing exponentially, making it expensive and difficult to keep entire training datasets in main memory. Increasingly, ML algorithms will need to access data from persistent storage in an efficient way.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124004908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Management of Implicit Requirements Data in Large SRS Documents","authors":"Dev Dave, Angeli Celestino, A. Varde, Vaibhav Anu","doi":"10.1145/3552490.3552494","DOIUrl":"https://doi.org/10.1145/3552490.3552494","url":null,"abstract":"Implicit Requirements (IMR) identification is part of the Requirements Engineering (RE) phase in Software Engineering during which data is gathered to create SRS (Software Requirements Specifications) documents. As opposed to explicit requirements clearly stated, IMRs constitute subtle data and need to be inferred. Research has shown that IMRs are crucial to the success of software development. Many software systems can encounter failures due to lack of IMR data management. SRS documents are large, often hundreds of pages, due to which manually identifying IMRs by human software engineers is not feasible. Moreover, such data is ever-growing due to the expansion of software systems. It is thus important to address the crucial issue of IMR data management. This article presents a survey on IMRs in SRS documents with the definition and overview of IMR data, detailed taxonomy of IMRs with explanation and examples, practices in managing IMR data, and tools for IMR identification. In addition to reviewing classical and state-of-the-art approaches, we highlight trends and challenges and point out open issues for future research. This survey article is interesting based on data quality, hidden information retrieval, veracity and salience, and knowledge discovery from large textual documents with complex heterogeneous data.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129075920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Case for Enrichment in Data Management Systems","authors":"Dhrubajyoti Ghosh, Peeyush Gupta, S. Mehrotra, Shantanu Sharma","doi":"10.1145/3552490.3552497","DOIUrl":"https://doi.org/10.1145/3552490.3552497","url":null,"abstract":"We describe ENRICHDB, a new DBMS technology designed for emerging domains (e.g., sensor-driven smart spaces and social media analytics) that require incoming data to be enriched using expensive functions prior to its usage. To support online processing, today, such enrichment is performed outside of DBMSs, as a static data processing workflow prior to its ingestion into a DBMS. Such a strategy could result in a significant delay from the time when data arrives and when it is enriched and ingested into the DBMS, especially when the enrichment complexity is high. Also, enriching at ingestion could result in wastage of resources if applications do not use/require all data to be enriched. ENRICHDB's design represents a significant departure from the above, where we explore seamless integration of data enrichment all through the data processing pipeline - at ingestion, triggered based on events in the background, and progressively during query processing. The cornerstone of ENRICHDB is a powerful enrichment data and query model that encapsulates enrichment as an operator inside a DBMS enabling it to co-optimize enrichment with query processing. This paper describes this data model and provides a summary of the system implementation.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123699103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Datalog in Wonderland","authors":"Mahmoud Abo Khamis, RelationalAI, H. Ngo, R. Pichler, T. Wien, Dan Suciu","doi":"10.1145/3552490.3552492","DOIUrl":"https://doi.org/10.1145/3552490.3552492","url":null,"abstract":"Modern data analytics applications, such as knowledge graph reasoning and machine learning, typically involve recursion through aggregation. Such computations pose great challenges to both system builders and theoreticians: first, to derive simple yet powerful abstractions for these computations; second, to define and study the semantics for the abstractions; third, to devise optimization techniques for these computations. In recent work we presented a generalization of Datalog called Datalog°, which addresses these challenges. Datalog° is a simple abstraction, which allows aggregates to be interleaved with recursion, and retains much of the simplicity and elegance of Datalog. We define its formal semantics based on an algebraic structure called Partially Ordered Pre-Semirings, and illustrate through several examples how Datalog° can be used for a variety of applications. Finally, we describe a new optimization rule for Datalog°, called the FGH-rule, then illustrate the FGH-rule on several examples, including a simple magic-set rewriting, generalized semi-naïve evaluation, and a bill-of-material example, and briefly discuss the implementation of the FGH-rule and present some experimental validation of its effectiveness.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122118382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diversity and Inclusion Activities in Database Conferences","authors":"S. Amer-Yahia, Yael Amsterdamer, S. Bhowmick, A. Bonifati, Philippe Bonnet, Renata Borovica-Gajic, Barbara Catania, T. Cerquitelli, S. Chiusano, Panos K. Chrysanthis, C. Curino, J. Darmont, A. El Abbadi, A. Floratou, Juliana Freire, Alekh Jindal, V. Kalogeraki, G. Koutrika, Arun Kumar, Sujaya Maiyya, A. Meliou, Madhulika Mohanty, Felix Naumann, N. Noack, Fatma Özcan, L. Peterfreund, W. Rahayu, Wang-Chiew Tan, Yuan Tian, Pınar Tözün, Genoveva Vargas-Solar, N. Yadwadkar, Meihui Zhang","doi":"10.1145/3552490.3552510","DOIUrl":"https://doi.org/10.1145/3552490.3552510","url":null,"abstract":"Diversity and Inclusion (D&I) are core to fostering innovative thinking. Existing theories demonstrate that to facilitate inclusion, multiple types of exclusionary dynamics, such as self-segregation, communication apprehension, and stereotyping and stigmatizing, must be overcome [11]. A diverse group of people tends to surface different perspectives, which help to understand and address D&I. Fostering D&I in research communities must address issues related to inclusive interpersonal and small group dynamics, rules and codes of conduct, increasing diversity in under-represented groups and disciplines, organizing D&I events, and long-term efforts to champion change [15].","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133775972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teaching about Data and Databases","authors":"A. Fekete, Uwe Röhm","doi":"10.1145/3552490.3552504","DOIUrl":"https://doi.org/10.1145/3552490.3552504","url":null,"abstract":"The panel on data(base) education at VLDB2021 [13] drew attention to important challenges in choosing how database classes are constructed for students in a world where data is being used in novel and impactful settings. This paper aims to present one view of a process for making these pedagogy decisions. We don't aim to present a best-possible design of the subject, rather we want to illuminate the space of possibilities, to encourage reasoned choices rather than simply teaching the subject as it was previously offered, or spending time on the latest innovations without considering the \"opportunity cost\" of doing so. We hope to guide the perplexed instructor or departmental curriculum committee.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133614410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artifacts Availability & Reproducibility (VLDB 2021 Round Table)","authors":"Manos Athanassoulis, P. Triantafillou, Raja Appuswamy, R. Bordawekar, B. Chandramouli, Xuntao Cheng, I. Manolescu, Y. Papakonstantinou, Nesime Tatbul","doi":"10.1145/3552490.3552511","DOIUrl":"https://doi.org/10.1145/3552490.3552511","url":null,"abstract":"In the last few years, SIGMOD and VLDB have intensified efforts to encourage, facilitate, and establish reproducibility as a key process for accepted research papers, awarding them with the Reproducibility badge. In addition, complementary efforts have focused on increasing the sharing of accompanying artifacts of published work (code, scripts, data), independently of reproducibility, awarding them the Artifacts Available badge. In this short note, we summarize the discussion of a panel held during VLDB 2021 titled \"Artifacts, Availability & Reproducibility\". We first present a more detailed summary of the recent efforts. Then, we present the discussion and the contributed key points that were made, aiming to assess the reproducibility of data management research and to propose changes moving forward.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126336461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Social Technology and Research (STAR) Lab in the University of Hong Kong","authors":"Reynold Cheng, Chenhao Ma, Xiaodong Li, Yixiang Fang, Ye Liu, Victor Y.L. Wong, Esther Lee, T. Lam, S. Ho, M. Wang, Weijie Gong, Wentao Ning, B. Kao","doi":"10.1145/3552490.3552508","DOIUrl":"https://doi.org/10.1145/3552490.3552508","url":null,"abstract":"The main goal of the Social Technology and Research Laboratory (STAR Lab) in the University of Hong Kong (https://star.hku.hk) is to develop novel IT technologies for serving the society. Our team has more than three years of experience in project development, web, app, and game design, photography, and video production. We are interested in \"Data Science for Social Good\", researching data-driven approaches that can benefit the public, NGOs, and the government.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117040015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}