Companion of the 2023 International Conference on Management of Data: Latest Articles, Page 5

Acheron: Persisting Tombstones in LSM Engines

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589719

Zichen Zhu, Subhadeep Sarkar, Manos Athanassoulis

Abstract: Modern NoSQL storage engines frequently employ log-structured merge (LSM) trees as their core data structures because they offer high ingestion rates and low latency for query processing. Client writes are captured in memory first and are gradually merged on disk in a level-wise manner. While this out-of-place paradigm sustains fast ingestion rates, it implements delete operations by inserting tombstones, which logically invalidate older entries. Thus, obsolete data cannot be removed instantly and may be retained for an arbitrarily long time. Therefore, out-of-place deletion in LSM trees may, on the one hand, violate data privacy regulations (e.g., the right to be forgotten in the EU's GDPR, and the right to delete in California's CCPA and CPRA) and, on the other hand, hurt performance. In this paper, we develop Acheron, which demonstrates the performance implications of out-of-place deletes and how our method achieves timely persistent deletes. We integrate both prior state-of-the-art compaction policies and our recently presented method, FADE, into Acheron and visualize the life cycle of tombstones in LSM trees. Using the Acheron visualization, users can observe that the state of the art does not provide guarantees on when obsolete entries can be physically removed, and that FADE can achieve timely persistent deletes without full tree compaction. Users can further customize the workload, LSM tuning knobs, and disk parameters to investigate their impact on tombstones and performance. This demonstration provides LSM researchers and practitioners with key insights into the impact of tombstones.

Citations: 1

SCAD: Scalability Advisor for Interactive Microservices on Hybrid Clouds

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589718

Ka-Ho Chow, Umesh Deshpande, Veera Deenadhayalan, S. Seshadri, Ling Liu

Abstract: The microservice architecture allows scaling application components independently based on their resource demands to serve user traffic. The notion of user traffic is critical because it is a mixture of requests to user-facing API endpoints representing valuable semantics (e.g., a customer transaction). Application owners can incorporate business insights to derive the expected user traffic, e.g., for holiday seasons, and rightsize each component to ensure availability and responsiveness. However, existing resource estimation techniques do not take user traffic from application owners into consideration and rely only on historical information, which leads to inaccurate predictions. Furthermore, on-premises infrastructure lacks elasticity, and the overall demands to serve the traffic can exceed its capacity, leaving no room for components to grow. Hybrid clouds provide an attractive solution by offloading some components to the cloud. However, a poor offloading choice can degrade the application in multiple respects. To address these problems, we introduce SCAD, a scalability advisor for resource management. It estimates resource demands for any user traffic provided by the application owner and recommends how to scale microservices by spanning them across hybrid clouds, optimizing API performance, API availability, and cloud hosting cost.

Citations: 0

Capturing Data-inherent Dependencies in JSON Schema Extraction

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589396

Stefan Klessinger

Citations: 0

ATENA-PRO: Generating Personalized Exploration Notebooks with Constrained Reinforcement Learning

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589727

Tavor Lipman, T. Milo, Amit Somech

Abstract: One of the most common, helpful practices of data scientists, when starting the exploration of a given dataset, is to examine existing data exploration notebooks prepared by other data analysts or scientists. These notebooks contain curated sessions of contextually related query operations that together demonstrate interesting hypotheses and conjectures on the data. Unfortunately, such relevant notebooks, prepared on the same dataset and in light of the same analysis task, are often nonexistent or unavailable. In this work, we describe ATENA-PRO, a framework for auto-generating such relevant, personalized exploratory sessions. Using a novel specification language, users first describe their desired output notebook. Our language contains dedicated constructs for contextually connecting future output queries. These specifications are then used as input for a Deep Reinforcement Learning (DRL) engine, which auto-generates the personalized notebook. Our DRL engine relies on an existing, general-purpose DRL framework for data exploration. However, augmenting the generic framework with user specifications requires overcoming a difficult sparsity challenge, as only a small portion of the possible sessions may be compliant with the specifications. Inspired by solutions for constrained reinforcement learning, we devise a compound, flexible reward scheme as well as a specification-aware neural network architecture. Our experimental evaluation shows that the combination of these components allows ATENA-PRO to consistently generate interesting, personalized exploration sessions for various analysis tasks and datasets.

Citations: 0

Second Data Economy Workshop (DEC)

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3590825

G. Koutrika, Nikolaos Laoutaris, Martino Trevisan

Abstract: Welcome to the second ACM DATA ECONOMY WORKSHOP (DEC), co-located with ACM SIGMOD 2023. Data-driven decision making through machine learning (ML) algorithms is transforming the way society and the economy work and is having a profound positive impact on our daily lives. With the exception of very large companies that have both the data and the capabilities to develop powerful ML-driven services, the vast majority of demonstrably possible ML services, from e-health to transportation to predictive maintenance, to name a few, still remain at the level of ideas or prototypes for the simple reason that data, the capabilities to manipulate it, and the business models to bring it to market rarely exist under one roof. Data must somehow meet the ML and business skills that can unleash its full power for society and the economy. This has given rise to an extremely dynamic sector around the Data Economy, involving Data Providers/Controllers and Data Intermediaries, oftentimes in the form of Data Marketplaces or Personal Information Management Systems for end users to control and even monetize their personal data. Despite its enormous potential and observed initial growth, the Data Economy is still in its early stages and therefore faces an uncertain future and a number of existential challenges. These challenges include a wide range of technical issues that affect multiple disciplines of computer science, including networks and distributed systems, security and privacy, machine learning, and human-computer interaction. The mission of the ACM DEC workshop will be to bring together all CS capabilities needed to support the Data Economy. We would like to thank the entire technical program committee for reviewing and selecting papers for the workshop. We hope you will find the papers interesting and stimulating.

Citations: 0

GRADES-NDA'23: 6th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3590820

O. Hartig, Yuichi Yoshida

Citations: 0

A Demonstration of GeoTorchAI: A Spatiotemporal Deep Learning Framework

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589734

Kanchan Chowdhury, Mohamed Sarwat

Abstract: This paper demonstrates GeoTorchAI, a spatiotemporal deep learning framework. In recent years, many neural network models have been proposed focusing on the applications of raster imagery and spatiotemporal non-imagery datasets. Implementing these models using existing deep learning frameworks, such as PyTorch and TensorFlow, requires nontrivial coding effort from developers because these models differ extensively from the state-of-the-art models supported by existing deep learning frameworks. Moreover, existing deep learning frameworks lack support for scalable data preprocessing, a mandatory step for converting spatiotemporal datasets into trainable tensors. GeoTorchAI enables machine learning practitioners to implement spatiotemporal deep learning models with minimal coding effort on top of PyTorch. It provides state-of-the-art neural network models, ready-to-use benchmark datasets, and transformation operations for raster imagery and spatiotemporal non-imagery datasets. Besides deep learning, GeoTorchAI contains a data preprocessing module that allows preparing trainable spatiotemporal vector datasets and transforming raster images in a cluster computing setting.

Citations: 0

TeeBench: Seamless Benchmarking in Trusted Execution Environments

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589726

K. Maliszewski, Tilman Dietzel, Jorge-Arnulfo Quiané-Ruiz, V. Markl

Abstract: Trusted Execution Environments (TEEs) have enabled building secure systems that operate on untrusted machines. However, TEEs' architecture calls previous performance findings into question. The existing relational algorithms have been designed for traditional CPUs. Prior work has shown that these algorithms underperform in TEEs and, in most cases, cannot be easily reused. Moreover, prior studies frequently used benchmarks pertinent to CPUs and ignored TEE-specific metrics essential to understanding the performance differences. Therefore, there is a need for a fair benchmarking approach for TEE algorithms. In this demonstration, we showcase TeeBench, a unified benchmarking framework for relational operators across TEEs. TeeBench focuses on TEE-specific hardware metrics. It enables a comprehensive performance analysis that helps researchers evaluate their advances. It comes with an interactive web browser tool that allows users to upload their implementation of a relational algorithm and seamlessly benchmark it across different TEEs. In addition, it introduces a novel TEE-Analyzer that gives users hints about performance bottlenecks and suggests possible code improvements. Through an interactive, human-friendly web interface, users receive instant feedback on whether changes to their algorithm improve performance. We expect TeeBench to encourage the usage of TEEs and to advance the study of privacy-preserving systems.

Citations: 1

Demonstration of ThalamusDB: Answering Complex SQL Queries with Natural Language Predicates on Multi-Modal Data

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589730

Saehan Jo, Immanuel Trummer

Abstract: ThalamusDB supports SQL queries with natural language predicates on multi-modal data. Our data model extends the relational model and integrates multi-modal data, including visual, audio, and text data, as columns. Users can write SQL queries including predicates on multi-modal data, described in natural language. In this demonstration, we show how ThalamusDB enables users to query multi-modal data. Visitors can write their own SQL queries on two real-world data sets gathered from Craigslist and YouTube. ThalamusDB has a specialized optimizer that selects execution plans that minimize the overall cost of answering such queries. Query execution involves pre-trained neural models as well as a relational database as processing engines. ThalamusDB collects a limited number of labels for selected data items to translate similarity scores into binary predicate evaluation. Our demonstration enables visitors to compare optimized plans against naive plans in terms of processing latency. ThalamusDB allows users to trade query result precision for reduced processing overheads. Our demonstration interface enables visitors to change the performance objectives and observe their effects on final result precision as well as computation time and number of labeling requests. Similar to online aggregation, our interactive interface allows users to track shrinking error bounds during query execution.

Citations: 2

Table Discovery in Data Lakes: State-of-the-art and Future Directions

Companion of the 2023 International Conference on Management of Data Pub Date : 2023-06-04 DOI: 10.1145/3555041.3589409

Grace Fan, Jin Wang, Yuliang Li, Renée J. Miller

Citations: 1