{"title":"DEI Perspectives in Information Technology Education","authors":"S. Sadiq","doi":"10.1145/3555041.3591053","DOIUrl":"https://doi.org/10.1145/3555041.3591053","url":null,"abstract":"Information Technology (IT) has become deeply intertwined with business and society across many, if not all, sectors of the economy. The accelerated pace of development has resulted in regulatory frameworks and societal expectations lagging behind the design and use of IT artefacts. Education plays a fundamental role in ensuring that advancements in the field create benefits for all parts of society. To achieve societal benefits and mitigate potential disadvantage, it is imperative that Diversity, Equity and Inclusion (DEI) perspectives are embedded in the design and delivery of IT education. However, there are growing indications that the needs of learner populations have been shifting, and educational systems are struggling to adapt. The scale and diversity of the learner population have increased multi-fold, and at the same time shifts in learner expectations have led to a rapid growth of new learning opportunities, e.g., through short online credentials and community-based discussion forums. These challenges have initiated calls for urgent action for the educational landscape to evolve [1]. In this talk I will discuss some of these challenges. 
I will also share experiences and strategies for embedding DEI perspectives in (1) education of IT/CS including implications for curriculums; (2) education for future work environments that are inseparable from IT; and (3) education with technology enhanced education platforms and tools, including how the data management community can play a prominent role in the burgeoning Educational Technology (EdTech) and Artificial Intelligence in Education (AIEd) [2] research and development.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122856247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dexer: Detecting and Explaining Biased Representation in Ranking","authors":"Y. Moskovitch, Jinyang Li, H. Jagadish","doi":"10.1145/3555041.3589725","DOIUrl":"https://doi.org/10.1145/3555041.3589725","url":null,"abstract":"With the growing use of ranking algorithms for real-life decision-making, fairness in ranking has been recognized as an important issue. Recent works have studied different fairness measures in ranking, and many of them consider the representation of different \"protected groups\" in the top-k ranked items, for any reasonable k. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. To this end, we present Dexer, a system for the detection of groups with biased representation in the top-k. Dexer utilizes the notion of Shapley values to provide the users with visual explanations for the cause of bias. We will demonstrate the usefulness of Dexer using real-life data.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131212539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum Machine Learning: Foundation, New Techniques, and Opportunities for Database Research","authors":"Tobias Winker, Sven Groppe, Valter Uotila, Zhengtong Yan, Jiaheng Lu, Maja Franz, W. Mauerer","doi":"10.1145/3555041.3589404","DOIUrl":"https://doi.org/10.1145/3555041.3589404","url":null,"abstract":"In the last few years, the field of quantum computing has experienced remarkable progress. The prototypes of quantum computers already exist and have been made available to users through cloud services (e.g., IBM Q experience, Google quantum AI, or Xanadu quantum cloud). While fault-tolerant and large-scale quantum computers are not available yet (and may not be for a long time, if ever), the potential of this new technology is undeniable. Quantum algorithms have the proven ability to either outperform classical approaches for several tasks, or are impossible to be efficiently simulated by classical means under reasonable complexity-theoretic assumptions. Even imperfect current-day technology is speculated to exhibit computational advantages over classical systems. Recent research is using quantum computers to solve machine learning tasks. Meanwhile, the database community has already successfully applied various machine learning algorithms for data management tasks, so combining the fields seems to be a promising endeavour. However, quantum machine learning is a new research field for most database researchers. In this tutorial, we provide a fundamental introduction to quantum computing and quantum machine learning and show the potential benefits and applications for database research. 
In addition, we demonstrate how to apply quantum machine learning to the join order optimization problem in databases.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122026007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Overview of Reachability Indexes on Graphs","authors":"Chao Zhang, A. Bonifati, M. Tamer Özsu","doi":"10.1145/3555041.3589408","DOIUrl":"https://doi.org/10.1145/3555041.3589408","url":null,"abstract":"Graphs have been the natural choice for modeling entities and the relationships among them. One of the most fundamental graph processing operators is a reachability query, which checks whether a path exists from the source to the target vertex in a plain graph, and additionally whether the path can satisfy a given path constraint based on the edge labels in an edge-labeled graph. Processing reachability queries requires potentially visiting a large portion of the graph due to the inherent transitivity of these queries. This makes it costly to evaluate them on large graphs. Thus, significant effort has been spent to design indexing techniques for reachability queries in the last three decades, building advanced data structures to efficiently compress the transitive closure of the graph so as to accelerate online query processing, aka reachability indexes. In this tutorial, we provide an in-depth technical review of the existing reachability indexes, ranging from those designed for plain graphs to ones for edge-labeled graphs. We conclude the tutorial by summarizing the open challenges for integrating these techniques into GDBMSs.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128567567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proactively Screening Machine Learning Pipelines with ARGUSEYES","authors":"Sebastian Schelter, Stefan Grafberger, Shubha Guha, Bojan Karlas, Ce Zhang","doi":"10.1145/3555041.3589682","DOIUrl":"https://doi.org/10.1145/3555041.3589682","url":null,"abstract":"Software systems that learn from data with machine learning (ML) are ubiquitous. ML pipelines in these applications often suffer from a variety of data-related issues, such as data leakage, label errors or fairness violations, which require reasoning about complex dependencies between their inputs and outputs. These issues are usually only detected in hindsight after deployment, after they caused harm in production. We demonstrate ArgusEyes, a system which enables data scientists to proactively screen their ML pipelines for data-related issues as part of continuous integration. ArgusEyes instruments, executes and screens ML pipelines for declaratively specified pipeline issues, and analyzes data artifacts and their provenance to catch potential problems early before deployment to production. We demonstrate our system for three scenarios: detecting mislabeled images in a computer vision pipeline, spotting data leakage in a price prediction pipeline, and addressing fairness violations in a credit scoring pipeline.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128437547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"First Workshop on Verifiable Database Systems","authors":"Tien Tuan Anh Dinh, B. Ooi, Xinying Yang","doi":"10.1145/3555041.3590824","DOIUrl":"https://doi.org/10.1145/3555041.3590824","url":null,"abstract":"Verifiable database systems ensure a strong integrity guarantee, that is, the database operations are executed correctly over untampered data. While general-purpose verifiable computation techniques do exist, they suffer from poor performance. Therefore, a practical verifiable database system must make trade-offs between security, performance, and functionalities. This workshop brings together researchers and engineers from academia and industry to discuss ideas and techniques for building such practical systems. The main goals include identifying new abstractions, applications, challenges and solutions related to verifiable database systems.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127440307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Building Autonomous Data Services on Azure","authors":"Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, A. Agarwal, R. Alotaibi, Jesús Camacho-Rodríguez, Bibin Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, M. Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, A. Mueller, Kartheek Muthyala, Harsha Nagulapalli, Yoonjae Park, Hiren Patel, Anna Pavlenko, Olga Poppe, Santhosh Ravindran, Karla Saur, Rathijit Sen, Steve Suh, Arijit Tarafdar, Kunal Waghray, Demin Wang, C. Curino, R. Ramakrishnan","doi":"10.1145/3555041.3589674","DOIUrl":"https://doi.org/10.1145/3555041.3589674","url":null,"abstract":"Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services, while meeting customer SLAs and minimizing operational cost is becoming more challenging. Cloud technology enables the collection of significant amounts of workload traces and system telemetry. With the progress in data science (DS) and machine learning (ML), it is feasible and desirable to utilize a data-driven, ML-based approach to automate various aspects of data services, resulting in the creation of autonomous data services. This paper presents our perspectives and insights on creating autonomous data services on Azure. 
It also covers the future endeavors we plan to undertake and unresolved issues that still need attention.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127994385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workshop on Simplicity in Management of Data (SiMoD)","authors":"Danica Porobic, Tianzheng Wang","doi":"10.1145/3555041.3590817","DOIUrl":"https://doi.org/10.1145/3555041.3590817","url":null,"abstract":"At first glance, database systems today are complex, with various components, tuning knobs and delicate design decisions, leading to hundreds of thousands of lines of code (if not more). However, at their core - as researchers and practitioners have been observing at least anecdotally - are simple ideas that work well in practice. Meanwhile, it also often takes a tremendous amount of experience to propose such ideas that are simple but not trivial. SiMoD is a new workshop dedicated to promoting and documenting such ideas as they are often \"buried\" in details as part of a full paper or product, or only shared anecdotally among experienced practitioners, creating barriers for newcomers to the field. The workshop will also be a great venue for junior researchers and PhD students to learn about the importance of simple but effective ideas, and get feedback about their on-going work.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126616768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sixth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM)","authors":"R. Bordawekar, O. Shmueli, Yael Amsterdamer, D. Firmani, Andreas Kipf","doi":"10.1145/3555041.3590818","DOIUrl":"https://doi.org/10.1145/3555041.3590818","url":null,"abstract":"Recent advances in AI techniques, as well as enabling hardware and infrastructure, have led to the integration of AI in wide-ranging domains and tasks. In particular, AI has been used to handle various types of data (including numerical, textual and image data) and has been adopted in large-scale distributed systems. From a data management perspective, this calls for the harnessing of state-of-the-art AI solutions for data management tasks and systems. aiDM is a full-day workshop that offers a stage for innovative interdisciplinary research that studies the interaction between AI and data management and develops new AI technologies for data-related tasks. This year, aiDM'23 particularly focuses on the transparent exploitation of AI techniques in existing enterprise-level data management workloads.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116224273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Third International Workshop on Big Data in Emergent Distributed Environments (BiDEDE)","authors":"Sven Groppe, L. Gruenwald, Chinghao Hsu","doi":"10.1145/3555041.3590821","DOIUrl":"https://doi.org/10.1145/3555041.3590821","url":null,"abstract":"The Third International Workshop on Big Data in Emergent Distributed Environments (BiDEDE) is centered around addressing scalable data management issues in emerging computing environments such as (post) cloud and fog/edge/dew computing. These environments aim to incorporate efficient data management and processing into distributed systems to minimize communication and computational costs, while simultaneously enhancing application throughput, reducing latencies, and prolonging battery life for nodes. Despite over a decade of research in this area, there are still numerous challenges that remain unsolved due to technological advancements such as lightweight virtualization, greater node capabilities, and increased parallelization. This year we also welcome contributions on data management for or solved by quantum computing. The workshop provides a platform for active discussions in these and related topics.","PeriodicalId":161812,"journal":{"name":"Companion of the 2023 International Conference on Management of Data","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115860984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}