P S Pooja, R Greeshma, Emin K Joy, Shreyank P Swamy
{"title":"Auditory Brainstem Response in a Child with Mitochondrial Disorder-Leigh Syndrome.","authors":"P S Pooja, R Greeshma, Emin K Joy, Shreyank P Swamy","doi":"10.1007/s12070-023-03971-3","DOIUrl":"10.1007/s12070-023-03971-3","url":null,"abstract":"The mitochondrial disorder Leigh syndrome is a neurodegenerative disorder that often manifests with brainstem abnormalities. The case report highlights the auditory brainstem response in a child with medical findings suggestive of Leigh syndrome. The case report also emphasizes the importance of ruling out any underlying neural pathology before making a clinical impression in children with developmental delays.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"13 1","pages":"1014-1017"},"PeriodicalIF":0.6,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10909017/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86001700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Breathing New Life into an Old Tree: Resolving Logging Dilemma of B+-tree on Modern Computational Storage Drives","authors":"Kecheng Huang, Zhaoyan Shen, Zili Shao, Tong Zhang, Feng Chen","doi":"10.14778/3626292.3626297","DOIUrl":"https://doi.org/10.14778/3626292.3626297","url":null,"abstract":"Having dominated databases and various data management systems for decades, B+-tree is infamously subject to a logging dilemma: one could improve B+-tree speed performance by equipping it with a larger log, which, however, degrades its crash recovery speed. Such a logging dilemma is particularly prominent in the presence of modern workloads that involve intensive small writes. In this paper, we propose a novel solution, called per-page logging based B+-tree, which leverages the emerging computational storage drive (CSD) with built-in transparent compression to fundamentally resolve the logging dilemma. Our key idea is to divide the large single log into many small (e.g., 4KB), highly compressible per-page logs, each statically bound to a B+-tree page. All per-page logs together form a very large over-provisioned log space for B+-tree to improve its operational speed performance. Meanwhile, during crash recovery, B+-tree does not need to scan any per-page logs, leading to a recovery latency independent of the total log size. We have developed and open-sourced a fully functional prototype. Our evaluation results show that, under small-write intensive workloads, our design solution can improve B+-tree operational throughput by up to 625.6% and maintain a crash recovery time as low as 19.2 ms, while incurring a minimal storage overhead of only 0.5-1.6%.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"24 1","pages":""},"PeriodicalIF":2.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139330821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
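The per-page logging idea in the abstract above can be illustrated with a minimal in-memory sketch. It assumes a hypothetical `Page` class whose log holds at most `LOG_CAPACITY` delta records, standing in for the small (e.g., 4KB) per-page log. It does not model the B+-tree structure, durability, or the CSD's transparent compression, and it is not the authors' prototype.

```python
from dataclasses import dataclass, field

LOG_CAPACITY = 4  # hypothetical: max delta records per page log (stand-in for a 4KB log slot)


@dataclass
class Page:
    """A leaf page paired with its own statically bound per-page log (illustrative only)."""
    records: dict = field(default_factory=dict)  # materialized page content
    log: list = field(default_factory=list)      # pending (key, value) deltas for this page only

    def write(self, key, value):
        # Small writes are appended to this page's log instead of rewriting the whole page.
        self.log.append((key, value))
        if len(self.log) >= LOG_CAPACITY:
            self.consolidate()

    def consolidate(self):
        # Fold the logged deltas into the page image in order, then reset the log.
        for key, value in self.log:
            self.records[key] = value
        self.log.clear()

    def read(self, key):
        # The newest logged delta wins; otherwise fall back to the page image.
        for k, v in reversed(self.log):
            if k == key:
                return v
        return self.records.get(key)


def recover(image: dict, log: list) -> Page:
    # Recovery touches only one page's image and its own log, so no global log scan is needed.
    return Page(dict(image), list(log))
```

Because each log is tied to one page, the total log space can grow with the number of pages while the work needed to bring any single page up to date stays bounded by `LOG_CAPACITY`.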
{"title":"On the Cusp: Computing Thrills and Perils and Professional Awakening","authors":"Natasa Milic-Frayling","doi":"10.14778/3611540.3611640","DOIUrl":"https://doi.org/10.14778/3611540.3611640","url":null,"abstract":"Over the past eight decades, computer science has advanced as a field, and the computing profession has matured by establishing professional codes of conduct, fostering best practices, and establishing industry standards to support the proliferation of technologies and services. Research and applications of digital computation continue to change all aspects of human endeavor through new waves of innovation. While it is clear that different research advances fuel innovation, the ways they come together to make an impact vary. In contrast to highly regulated sectors such as pharma, medicine and law, the process of transforming research into widely deployed technologies is not regulated. We reflect on collective practices, from discovery by scientists and engineers to market delivery by entrepreneurs, industry leaders, and practitioners. We consider ecosystem changes that are required to sustain the transformational effects of new technologies and enable new practices to take root. Every such transformation ruptures the existing socio-technical fabric and requires a concerted effort to remedy this through effective policies and regulations. Computing experts are involved in all phases and must match the transformational power of their innovation with the highest standard of professional conduct. We highlight the principles of responsible innovation and discuss three waves of digital innovation. We use wide and uncontrolled generative AI deployments to illustrate risks from the implosion of digital media due to contamination of digital records, removal of human agency, and risk to an individual's personhood.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134996880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lingua Manga : A Generic Large Language Model Centric System for Data Curation","authors":"Zui Chen, Lei Cao, Sam Madden","doi":"10.14778/3611540.3611624","DOIUrl":"https://doi.org/10.14778/3611540.3611624","url":null,"abstract":"Data curation is a wide-ranging area which contains many critical but time-consuming data processing tasks. However, the diversity of such tasks makes it challenging to develop a general-purpose data curation system. To address this issue, we present Lingua Manga, a user-friendly and versatile system that utilizes pre-trained large language models. Lingua Manga offers automatic optimization for achieving high performance and label efficiency while facilitating flexible and rapid development. Through three example applications with distinct objectives and users of varying levels of technical proficiency, we demonstrate that Lingua Manga can effectively assist both skilled programmers and low-code or even no-code users in addressing data curation challenges.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134997925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Story of GraphLab - From Scaling Machine Learning to Shaping Graph Systems Research (VLDB 2023 Test-of-Time Award Talk)","authors":"Joseph E. Gonzalez, Yucheng Low","doi":"10.14778/3611540.3611637","DOIUrl":"https://doi.org/10.14778/3611540.3611637","url":null,"abstract":"The GraphLab project spanned almost a decade and had profound academic and industrial impact on large-scale machine learning and graph processing systems. There were numerous papers written describing the innovations in GraphLab including the original vertex-centric [8] and edge-centric [3] programming abstractions, high-performance asynchronous execution engines [9], out-of-core graph computation [6], tabular graph-systems [4], and even new statistical inference algorithms [2] enabled by the GraphLab project. This work became the basis of multiple PhD theses [1, 5, 7]. The GraphLab open-source project had broad academic and industrial adoption and ultimately led to the launch of Turi. In this talk, we tell the story of GraphLab, how it began and the key ideas behind it. We will focus on the approach to achieving scalable asynchronous systems in machine learning. During our talk, we will explore the impact that GraphLab has had on the development of graph processing systems, graph databases, and AI/ML. Additionally, we will share our insights and opinions on where we see the future of these fields heading. In the process, we highlight some of the lessons we learned and provide guidance for future students.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CM-Explorer: Dissecting Data Ingestion Problems","authors":"Niels Bylois, Frank Neven, Stijn Vansummeren","doi":"10.14778/3611540.3611595","DOIUrl":"https://doi.org/10.14778/3611540.3611595","url":null,"abstract":"Data ingestion validation, the task of certifying the quality of continuously collected data, is crucial to ensure trustworthiness of analytics insights. A widely used approach for validating data quality is to specify, either manually or automatically, so-called data unit tests that check whether data quality metrics lie within expected bounds. We employ conditional unit tests based on conditional metrics (CMs) that compute data quality signals over specific parts of the ingestion data and therefore allow for a fine-grained detection of errors. A violated conditional unit test specifies a set of erroneous tuples in a natural way: the subrelation that its CM refers to. Unfortunately, the downside of their fine-grained nature is that violated unit tests are often correlated: a single error in an ingestion batch may cause multiple tests (each referring to different parts of the batch) to fail. The key challenge is therefore to untangle this correlation and filter out the most relevant violated conditional unit tests, i.e., tests that identify a core set of erroneous tuples and act as an explanation for the errors. We present CM-Explorer, a system that supports data stewards in quickly finding the most relevant violated conditional unit tests. The system consists of three components: (1) a graph explorer for visualizing the correlation structure of the violated unit tests; (2) a relation explorer for browsing the tuples selected by conditional unit tests; and (3) a history explorer to gain insight into why conditional unit tests are violated. In this paper, we discuss these components and present the different scenarios that we make available for the demonstration.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
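The conditional unit tests described in the CM-Explorer abstract can be sketched as a metric evaluated only on the subrelation selected by a condition, with a violated test reporting that subrelation's tuples. All names here (`null_fraction`, `run_conditional_tests`, `correlated_pairs`) are hypothetical and are not CM-Explorer's API. The pairwise-overlap check is only a crude stand-in for the correlation structure that the system's graph explorer visualizes.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]
Metric = Callable[[List[Row]], float]


def null_fraction(column: str) -> Metric:
    """Hypothetical data quality metric: fraction of rows whose `column` is missing."""
    def metric(subset: List[Row]) -> float:
        if not subset:
            return 0.0
        return sum(1 for r in subset if r.get(column) is None) / len(subset)
    return metric


def run_conditional_tests(rows: List[Row], tests) -> Dict[str, Set[int]]:
    """`tests` maps a name to (condition, metric, low, high).

    Each metric is evaluated only on the subrelation selected by its condition;
    a violated test reports the ids of the tuples in that subrelation.
    """
    violated = {}
    for name, (condition, metric, low, high) in tests.items():
        subset = [r for r in rows if condition(r)]
        if not low <= metric(subset) <= high:
            violated[name] = {r["id"] for r in subset}
    return violated


def correlated_pairs(violated: Dict[str, Set[int]]) -> List[Tuple[str, str]]:
    """Pairs of violated tests whose subrelations share tuples (a crude correlation signal)."""
    names = sorted(violated)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if violated[a] & violated[b]]
```

A single missing `price` in one row can violate both a per-country test and a batch-wide test. That overlap is the kind of correlation the abstract says must be untangled.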
Alberto Lerner, Carsten Binnig, Philippe Cudré-Mauroux, Rana Hussein, Matthias Jasny, Theo Jepsen, Dan R. K. Ports, Lasse Thostrup, Tobias Ziegler
{"title":"Databases on Modern Networks: A Decade of Research That Now Comes into Practice","authors":"Alberto Lerner, Carsten Binnig, Philippe Cudré-Mauroux, Rana Hussein, Matthias Jasny, Theo Jepsen, Dan R. K. Ports, Lasse Thostrup, Tobias Ziegler","doi":"10.14778/3611540.3611579","DOIUrl":"https://doi.org/10.14778/3611540.3611579","url":null,"abstract":"Modern cloud networks are a fundamental pillar of data-intensive applications. They provide high-speed transaction (packet) rates and low overhead, enabling, for instance, truly scalable database designs. These networks, however, are fundamentally different from conventional ones. Arguably, the two key discerning technologies are RDMA and programmable network devices. Today, these technologies are not niche technologies anymore and are widely deployed across all major cloud vendors. The question is thus not if but how a new breed of data-intensive applications can benefit from modern networks, given the perceived difficulty in using and programming them. This tutorial addresses these challenges by exposing how the underlying principles changed as the network evolved and by presenting the new system design opportunities they opened. In the process, we also discuss several hard-earned lessons accumulated by making the transition first-hand.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Srinivasan, Andrew Gooding, Sunil Sayyaparaju, Thomas Lopatic, Kevin Porter, Ashish Shinde, B. Narendran
{"title":"Techniques and Efficiencies from Building a Real-Time DBMS","authors":"V. Srinivasan, Andrew Gooding, Sunil Sayyaparaju, Thomas Lopatic, Kevin Porter, Ashish Shinde, B. Narendran","doi":"10.14778/3611540.3611556","DOIUrl":"https://doi.org/10.14778/3611540.3611556","url":null,"abstract":"This paper describes a variety of techniques from over a decade of developing Aerospike (formerly Citrusleaf), a real-time DBMS that is being used in some of the world's largest mission-critical systems that require the highest levels of performance and availability. Such mission-critical systems have many requirements, including the ability to make decisions within a strict real-time SLA (milliseconds) with no downtime, predictable performance so that the first and the billionth customer get the same experience, the ability to scale up 10X (or even 100X) with no downtime, support for strong consistency for applications that need it, synchronous and asynchronous replication with global transactional capabilities, and the ability to deploy in any public or private cloud environment. We describe how using efficient algorithms to optimize every area of the DBMS helps the system achieve these stringent requirements. Specifically, we describe effective ways to shard, place and locate data across a set of nodes, efficient identification of cluster membership and cluster changes, efficiencies generated by using a 'smart' client, how to use two-copy replication effectively instead of three-copy replication, how to reduce the cost of the real-time data footprint by combining the use of memory with flash storage, self-managing clusters for ease of operation including elastic scaling, and networking and CPU optimizations including NUMA pinning with multi-threading. The techniques and efficiencies described here have enabled hundreds of deployments to grow by many orders of magnitude with near complete uptime.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135002981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
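The Aerospike abstract mentions sharding, placing, and locating data across nodes, a 'smart' client, and two-copy replication. One generic way to combine these ideas is shown below: hash each key to a fixed partition, then rank nodes per partition with rendezvous hashing so that any client holding the node list can compute the master and replica locally. This is a minimal sketch under stated assumptions. The partition count, the SHA-256 hash, and the rendezvous scheme are illustrative choices and are not taken from the paper or from Aerospike's actual partition map.

```python
import hashlib
from typing import List

N_PARTITIONS = 4096     # assumption: a fixed partition count chosen for illustration
REPLICATION_FACTOR = 2  # two copies, matching the two-copy replication the abstract mentions


def partition_of(key: str) -> int:
    """Map a record key to a partition id via a stable hash (SHA-256 here, an arbitrary choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS


def replicas(partition: int, nodes: List[str]) -> List[str]:
    """Rank nodes per partition with rendezvous hashing: first is master, the rest are replicas.

    Any client that knows the node list can compute this locally, with no lookup service.
    """
    def score(node: str) -> bytes:
        return hashlib.sha256(f"{node}:{partition}".encode("utf-8")).digest()
    return sorted(nodes, key=score)[:REPLICATION_FACTOR]
```

A useful property of this choice is that removing a node that holds no copy of a partition leaves that partition's placement unchanged, which limits data movement when the cluster changes.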
Ju Hyoung Mun, Konstantinos Karatsenidis, Tarikul Islam Papon, Shahin Roozkhosh, Denis Hoornaert, Ulrich Drepper, Ahmed Sanaullah, Renato Mancuso, Manos Athanassoulis
{"title":"On-the-Fly Data Transformation in Action","authors":"Ju Hyoung Mun, Konstantinos Karatsenidis, Tarikul Islam Papon, Shahin Roozkhosh, Denis Hoornaert, Ulrich Drepper, Ahmed Sanaullah, Renato Mancuso, Manos Athanassoulis","doi":"10.14778/3611540.3611593","DOIUrl":"https://doi.org/10.14778/3611540.3611593","url":null,"abstract":"Transactional and analytical database management systems (DBMS) typically employ different data layouts: row-stores for the former and column-stores for the latter. In order to bridge the requirements of the two without maintaining two systems and two (or more) copies of the data, our proposed system Relational Memory employs specialized hardware that transforms the base row table into arbitrary column groups at query execution time. This approach maximizes cache locality and is easy to use via a simple abstraction that allows transparent on-the-fly data transformation. Here, we demonstrate how to deploy and use Relational Memory via four representative scenarios. The demonstration uses the full-stack implementation of Relational Memory on the Xilinx Zynq UltraScale+ MPSoC platform. Conference participants will interact with Relational Memory deployed in the actual platform.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135003649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
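The code below is a software stand-in for the row-to-column-group transformation that the abstract says Relational Memory performs in hardware. Given a packed row-major table, it produces a packed buffer that contains only a requested column group. The row layout, `ROW_FORMAT`, `FIELD_FORMATS`, and `column_group` are hypothetical. The actual system does this transformation in programmable logic on the Zynq platform at query execution time, not by copying rows in software.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical fixed-width row layout (little-endian, no padding): id int32, price float64, qty int32
ROW_FORMAT = "<idi"
FIELD_FORMATS = ["i", "d", "i"]


def pack_rows(rows: Sequence[Tuple[int, float, int]]) -> bytes:
    """Store rows contiguously, as a row-store base table would."""
    return b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)


def column_group(buf: bytes, columns: List[int]) -> bytes:
    """Return a packed buffer holding only the requested columns for every row."""
    row_size = struct.calcsize(ROW_FORMAT)
    group_format = "<" + "".join(FIELD_FORMATS[c] for c in columns)
    out = bytearray()
    for offset in range(0, len(buf), row_size):
        row = struct.unpack_from(ROW_FORMAT, buf, offset)
        out += struct.pack(group_format, *(row[c] for c in columns))
    return bytes(out)
```

A scan that needs only `id` and `qty` then reads a dense 8-byte-per-row buffer instead of 16-byte rows. This is the cache-locality benefit the abstract refers to.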
Junichi Tatemura, Tao Zou, Jagan Sankaranarayanan, Yanlai Huang, Jim Chen, Yupu Zhang, Kevin Lai, Hao Zhang, Gokul Nath Babu Manoharan, Goetz Graefe, Divyakant Agrawal, Brad Adelberg, Shilpa Kolhar, Indrajit Roy
{"title":"Progressive Partitioning for Parallelized Query Execution in Google's Napa","authors":"Junichi Tatemura, Tao Zou, Jagan Sankaranarayanan, Yanlai Huang, Jim Chen, Yupu Zhang, Kevin Lai, Hao Zhang, Gokul Nath Babu Manoharan, Goetz Graefe, Divyakant Agrawal, Brad Adelberg, Shilpa Kolhar, Indrajit Roy","doi":"10.14778/3611540.3611541","DOIUrl":"https://doi.org/10.14778/3611540.3611541","url":null,"abstract":"Napa holds Google's critical data warehouses in log-structured merge trees for real-time data ingestion and sub-second response for billions of queries per day. These queries are often multi-key look-ups in highly skewed tables and indexes. In our production experience, only progressive query-specific partitioning can achieve Napa's strict query latency SLOs. Here we advocate good-enough partitioning that keeps the per-query partitioning time low without risking uneven work distribution. Our design combines pragmatic system choices and algorithmic innovations. For instance, B-trees are augmented with statistics of key distributions, thus serving the dual purpose of aiding lookups and partitioning. Furthermore, progressive partitioning is designed to be \"good enough\", thereby balancing partitioning time with performance. The resulting system is robust and successfully serves billions of queries day in, day out with very high quality of service, forming a core infrastructure at Google.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135003653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
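The Napa abstract says B-trees are augmented with key-distribution statistics that aid partitioning, and that partitioning only needs to be "good enough." A minimal sketch of that general idea follows. It assumes the statistics take the form of hypothetical (upper-bound key, estimated row count) buckets and places partition boundaries with an equi-depth split, which costs one pass over coarse statistics rather than over the data. This is not Napa's algorithm, and the bucket format and function names are illustrative only.

```python
from bisect import bisect_left
from typing import List, Tuple


def split_points(buckets: List[Tuple[str, int]], n_parts: int) -> List[str]:
    """Pick up to n_parts - 1 boundary keys so each part gets roughly equal estimated rows.

    `buckets` holds (upper_bound_key, estimated_row_count) in key order, e.g. statistics
    kept alongside index nodes. Boundaries can only fall on bucket edges, so the split is
    approximate ("good enough") but needs no scan of the underlying data.
    """
    total = sum(count for _, count in buckets)
    target = total / n_parts
    bounds: List[str] = []
    acc = 0
    for key, count in buckets:
        acc += count
        if acc >= target * (len(bounds) + 1) and len(bounds) < n_parts - 1:
            bounds.append(key)
    return bounds


def assign(keys: List[str], bounds: List[str]) -> List[List[str]]:
    """Route each lookup key to the part whose upper bound covers it."""
    parts: List[List[str]] = [[] for _ in range(len(bounds) + 1)]
    for k in keys:
        parts[bisect_left(bounds, k)].append(k)
    return parts
```

For example, buckets `[("c", 10), ("f", 10), ("m", 40), ("z", 40)]` split two ways give the single boundary `"m"`, so keys up to and including `"m"` go to the first worker.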