{"title":"Efficient Approximate Algorithms for Empirical Entropy and Mutual Information","authors":"Xingguang Chen, Sibo Wang","doi":"10.1145/3448016.3457255","DOIUrl":"https://doi.org/10.1145/3448016.3457255","url":null,"abstract":"Empirical entropy is a classic concept in data mining and the foundation of many other important concepts like mutual information. However, computing the exact empirical entropy/mutual information on large datasets can be expensive. Some recent research work explores sampling techniques on the empirical entropy/mutual information to speed up the top-k and filtering queries. However, their solution still aims to return the exact answers to the queries, resulting in high computational costs. Motivated by this, in this work, we present approximate algorithms for the top-k queries and filtering queries on empirical entropy and empirical mutual information. The approximate algorithm allows user-specified tunable parameters to control the trade-off between the query efficiency and accuracy. We design effective stopping rules to return the approximate answers with improved query time. We further present theoretical analysis and show that our proposed solutions achieve improved time complexity over previous solutions. We experimentally evaluate our proposed algorithms on real datasets with up to 31M records and 179 attributes. Our experimental results show that the proposed algorithm consistently outperforms the state of the art in terms of computational efficiency, by an order of magnitude in most cases, while providing the same accurate result.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"13 4-5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116859519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QuiCK: A Queuing System in CloudKit","authors":"Kfir Lev-Ari, Yizuo Tian, A. Shraer, C. Douglas, Hao Fu, Andrey Andreev, Kevin Beranek, Scott Dugas, Alec Grieser, Jeremy Hemmo","doi":"10.1145/3448016.3457567","DOIUrl":"https://doi.org/10.1145/3448016.3457567","url":null,"abstract":"We present QuiCK, a queuing system built for managing asynchronous tasks in CloudKit, Apple's storage backend service. QuiCK stores queued messages along with user data in CloudKit, and supports CloudKit's tenancy model including isolation, fair resource allocation, observability, and tenant migration. QuiCK is built on the FoundationDB Record Layer, an open source transactional DBMS. It employs massive two-level sharding, with tens of billions of queues on the first level (separately storing the queued items for each user of every CloudKit app), and hundreds of queues on a second level (one per FoundationDB cluster used by CloudKit). Our evaluation demonstrates that QuiCK scales linearly with additional consumer resources, effectively avoids contention, provides fairness across CloudKit tenants, and executes deferred tasks with low latency.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128482081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-Parallel Model Selection for Deep Learning Systems","authors":"Kabir Nagrecha","doi":"10.1145/3448016.3450571","DOIUrl":"https://doi.org/10.1145/3448016.3450571","url":null,"abstract":"As deep learning becomes more expensive, both in terms of time and compute, inefficiencies in machine learning training prevent practical usage of state-of-the-art models for most users. The newest model architectures are simply too large to be fit onto a single processor. To address the issue, many ML practitioners have turned to model parallelism as a method of distributing the computational requirements across several devices. Unfortunately, the sequential nature of neural networks causes very low efficiency and device utilization in model parallel training jobs. We propose a new form of \"shard parallelism\" combining task parallelism and model parallelism, and package it into a framework we name Hydra. Hydra recasts the problem of model parallelism in the multi-model context to produce a fine-grained parallel workload of independent model shards, rather than independent models. This new parallel design promises dramatic speedups relative to the traditional model parallelism paradigm.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128383522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Not your Grandpa's SSD: The Era of Co-Designed Storage Devices","authors":"Alberto Lerner, Philippe Bonnet","doi":"10.1145/3448016.3457540","DOIUrl":"https://doi.org/10.1145/3448016.3457540","url":null,"abstract":"Gone is the time when a Solid-State Drive (SSD) was just a fast drop-in replacement for a Hard-Disk Drive (HDD). Thanks to the NVMe ecosystem, nowadays, SSDs are accessed through specific interfaces and modern I/O frameworks. SSDs have also grown versatile with time and can now support various use cases ranging from cold, high-density storage to hot, low-latency ones. The body of knowledge about building such different devices is mostly available, but it is less than accessible to non-experts. Finding which device variation can better support a given workload also requires deep domain knowledge. This tutorial's first goal is to make these tasks--understanding the design of SSDs and pairing them with the data-intensive workloads they support well--more inviting. The tutorial goes further, however, in that it suggests that a new kind of SSD plays an essential role in post-Moore computer systems. These devices can be co-designed to align their capabilities to an application's requirements. A salient feature of these devices is that they can run application logic besides just storing data. They can thus gracefully scale processing capabilities with the volume of data stored. The tutorial's second goal is thus to establish the design space for co-designed SSDs and show the tools available to hardware, systems, and databases researchers that wish to explore this space.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124563207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Versatile Equivalences: Speeding up Subgraph Query Processing and Subgraph Matching","authors":"Hyunjoon Kim, Yunyoung Choi, Kunsoo Park, Xuemin Lin, Seok-Hee Hong, Wook-Shin Han","doi":"10.1145/3448016.3457265","DOIUrl":"https://doi.org/10.1145/3448016.3457265","url":null,"abstract":"Subgraph query processing (also known as subgraph search) and subgraph matching are fundamental graph problems in many application domains. A lot of efforts have been made to develop practical solutions for these problems. Despite the efforts, existing algorithms showed limited running time and scalability in dealing with large and/or many graphs. In this paper, we propose a new subgraph search algorithm using equivalences of vertices in order to reduce search space: (1) static equivalence of vertices in a query graph that leads to an efficient matching order of the vertices, and (2) dynamic equivalence of candidate vertices in a data graph, which enables us to capture and remove redundancies in search space. These techniques for subgraph search also lead to an improved algorithm for subgraph matching. Experiments show that our approach outperforms state-of-the-art subgraph search and subgraph matching algorithms by up to several orders of magnitude with respect to query processing time.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123309853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PyExplore: Query Recommendations for Data Exploration without Query Logs","authors":"Apostolos Glenis, G. Koutrika","doi":"10.1145/3448016.3452762","DOIUrl":"https://doi.org/10.1145/3448016.3452762","url":null,"abstract":"Helping users explore data becomes increasingly more important as databases get larger and more complex. In this demo, we present PyExplore, a data exploration tool aimed at helping end users formulate queries over new datasets. PyExplore takes as input an initial query from the user along with some parameters and provides interesting queries by leveraging data correlations and diversity.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121390902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grouped Learning: Group-By Model Selection Workloads","authors":"Side Li","doi":"10.1145/3448016.3450576","DOIUrl":"https://doi.org/10.1145/3448016.3450576","url":null,"abstract":"Machine Learning (ML) is gaining popularity in many applications. Increasingly, companies prefer more targeted models for different subgroups of the population like locations, which helps improve accuracy. This practice is comparable to Group-By aggregation in SQL; we call it learning over groups. A smaller group means the data distribution is more straightforward than the whole population. So, a group-level model may offer more accuracy in many cases. Non-technical business needs, such as privacy and regulatory compliance, may also necessitate group-level models. For instance, online advertising platforms would need to build disaggregated partner-specific ML models, where all partner groups' training data are aggregated together in one data pipeline.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124093627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TardisDB","authors":"Maximilian E. Schüle, Josef Schmeißer, T. Blum, Alfons Kemper, Thomas Neumann","doi":"10.1145/3448016.3452767","DOIUrl":"https://doi.org/10.1145/3448016.3452767","url":null,"abstract":"Online encyclopaedias such as Wikipedia implement their own version control above database systems to manage multiple revisions of the same page. In contrast to temporal databases that restrict each tuple's validity to a time range, a version affects multiple tuples. To overcome the need for a separate version layer, we have created TardisDB, the first database system with incorporated data versioning across multiple relations. This paper presents the interface for TardisDB with an extended SQL to manage and query data from different branches. We first give an overview of TardisDB's architecture that includes an extended table scan operator: a branch bitmap indicates a tuple's affiliation to a branch and a chain of tuples tracks the different versions. This is the first database system that combines chains for multiversion concurrency control with a bitmap for each branch to enable versioning. Afterwards, we describe our proposed SQL extension to create, query and modify tables across different, named branches. In our demonstration setup, we allow users to interactively create and edit branches and display the lineage of each branch.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115740928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases","authors":"Xinyi Zhang, Hong Wu, Zhuonan Chang, Shuowei Jin, Jian Tan, Feifei Li, Tieying Zhang, Bin Cui","doi":"10.1145/3448016.3457291","DOIUrl":"https://doi.org/10.1145/3448016.3457291","url":null,"abstract":"Modern database management systems (DBMS) contain tens to hundreds of critical performance tuning knobs that determine the system runtime behaviors. To reduce the total cost of ownership, cloud database providers put in drastic effort to automatically optimize the resource utilization by tuning these knobs. There are two challenges. First, the tuning system should always abide by the service level agreement (SLA) while optimizing the resource utilization, which imposes strict constrains on the tuning process. Second, the tuning time should be reasonably acceptable since time-consuming tuning is not practical for production and online troubleshooting. In this paper, we design ResTune to automatically optimize the resource utilization without violating SLA constraints on the throughput and latency requirements. ResTune leverages the tuning experience from the history tasks and transfers the accumulated knowledge to accelerate the tuning process of the new tasks. The prior knowledge is represented from historical tuning tasks through an ensemble model. The model learns the similarity between the historical workloads and the target, which significantly reduces the tuning time by a meta-learning based approach. ResTune can efficiently handle different workloads and various hardware environments. We perform evaluations using benchmarks and real world workloads on different types of resources. The results show that, compared with the manually tuned configurations, ResTune reduces 65%, 87%, 39% of CPU utilization, I/O and memory on average, respectively. Compared with the state-of-the-art methods, ResTune finds better configurations with up to ~18x speedups.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130029940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra","authors":"Shangyu Luo, Dimitrije Jankov, Binhang Yuan, C. Jermaine","doi":"10.1145/3448016.3457317","DOIUrl":"https://doi.org/10.1145/3448016.3457317","url":null,"abstract":"Machine learning (ML) computations are often expressed using vectors, matrices, or higher-dimensional tensors. Such data structures can have many different implementations, especially in a distributed environment: a matrix could be stored as row or column vectors, tiles of different sizes, or relationally, as a set of (rowIndex, colIndex, value) triples. Many other storage formats are possible. The choice of format can have a profound impact on the performance of a ML computation. In this paper, we propose a framework for automatic optimization of the physical implementation of a complex ML or linear algebra (LA) computation in a distributed environment, develop algorithms for solving this problem, and show, through a prototype on top of a distributed relational database system, that our ideas can radically speed up common ML and LA computations.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134449711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}