A. Ungethüm, T. Kissinger, Willi-Wolfram Mentzel, Eric Mier, Dirk Habich, Wolfgang Lehner
{"title":"Energy Elasticity on Heterogeneous Hardware using Adaptive Resource Reconfiguration LIVE","authors":"A. Ungethüm, T. Kissinger, Willi-Wolfram Mentzel, Eric Mier, Dirk Habich, Wolfgang Lehner","doi":"10.1145/2882903.2899390","DOIUrl":"https://doi.org/10.1145/2882903.2899390","url":null,"abstract":"Energy awareness of database systems has emerged as a critical research topic, since energy consumption is becoming a major limiter for their scalability. Recent energy-related hardware developments trend towards offering more and more configuration opportunities for the software to control its own energy consumption. Existing research so far mainly focused on leveraging this configuration spectrum to find the most energy-efficient configuration for specific operators or entire queries. In this demo, we introduce the concept of energy elasticity and propose the energy-control loop as an implementation of this concept. Energy elasticity refers to the ability of software to behave energy-proportional and energy-efficient at the same time while maintaining a certain quality of service. Thus, our system does not draw the least energy possible but the least energy necessary to still perform reasonably. We demonstrate our overall approach using a rich interactive GUI to give attendees the opportunity to learn more about our concept.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75699071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Agrawal, M. Ba, Laure Berti-Équille, S. Chawla, A. Elmagarmid, Hossam M. Hammady, Yasser Idris, Zoi Kaoudi, Zuhair Khayyat, Sebastian Kruse, M. Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, N. Tang, Mohammed J. Zaki
{"title":"Rheem: Enabling Multi-Platform Task Execution","authors":"D. Agrawal, M. Ba, Laure Berti-Équille, S. Chawla, A. Elmagarmid, Hossam M. Hammady, Yasser Idris, Zoi Kaoudi, Zuhair Khayyat, Sebastian Kruse, M. Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, N. Tang, Mohammed J. Zaki","doi":"10.1145/2882903.2899414","DOIUrl":"https://doi.org/10.1145/2882903.2899414","url":null,"abstract":"Many emerging applications, from domains such as healthcare and oil & gas, require several data processing systems for complex analytics. This demo paper showcases system, a framework that provides multi-platform task execution for such applications. It features a three-layer data processing abstraction and a new query optimization approach for multi-platform settings. We will demonstrate the strengths of system by using real-world scenarios from three different applications, namely, machine learning, data cleaning, and data fusion.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72958395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Visualization of Data Transforms","authors":"Larry Xu","doi":"10.1145/2882903.2914837","DOIUrl":"https://doi.org/10.1145/2882903.2914837","url":null,"abstract":"In the context of data exploration, users often interact with relational database systems in an interactive query session to form useful insights. Each query a user executes can potentially transform a resultset in complex ways. We explore some of the challenges in understanding these transformations, and how these challenges can be solved through more informative visual representations of data transforms. We present the concept of \"tweening\" of resultsets as a method of incrementally visualizing data transformations, and explore approaches towards generating these resultset tweens. Through a series of user studies, we evaluate tweening as an effective method of understanding the changes that result from data transformations.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81945157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kangnyeon Kim, Tianzheng Wang, Ryan Johnson, I. Pandis
{"title":"ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads","authors":"Kangnyeon Kim, Tianzheng Wang, Ryan Johnson, I. Pandis","doi":"10.1145/2882903.2882905","DOIUrl":"https://doi.org/10.1145/2882903.2882905","url":null,"abstract":"Large main memories and massively parallel processors have triggered not only a resurgence of high-performance transaction processing systems optimized for large main-memory and massively parallel processors, but also an increasing demand for processing heterogeneous workloads that include read-mostly transactions. Many modern transaction processing systems adopt a lightweight optimistic concurrency control (OCC) scheme to leverage its low overhead in low contention workloads. However, we observe that the lightweight OCC is not suitable for heterogeneous workloads, causing significant starvation of read-mostly transactions and overall performance degradation. In this paper, we present ERMIA, a memory-optimized database system built from scratch to cater the need of handling heterogeneous workloads. ERMIA adopts snapshot isolation concurrency control to coordinate heterogeneous transactions and provides serializability when desired. Its physical layer supports the concurrency control schemes in a scalable way. Experimental results show that ERMIA delivers comparable or superior performance and near-linear scalability in a variety of workloads, compared to a recent lightweight OCC-based system. At the same time, ERMIA maintains high throughput on read-mostly transactions when the performance of the OCC-based system drops by orders of magnitude.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80818181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mina H. Farid, Alexandra Roatis, I. Ilyas, H. Hoffmann, Xu Chu
{"title":"CLAMS: Bringing Quality to Data Lakes","authors":"Mina H. Farid, Alexandra Roatis, I. Ilyas, H. Hoffmann, Xu Chu","doi":"10.1145/2882903.2899391","DOIUrl":"https://doi.org/10.1145/2882903.2899391","url":null,"abstract":"With the increasing incentive of enterprises to ingest as much data as they can in what is commonly referred to as \"data lakes\", and with the recent development of multiple technologies to support this \"load-first\" paradigm, the new environment presents serious data management challenges. Among them, the assessment of data quality and cleaning large volumes of heterogeneous data sources become essential tasks in unveiling the value of big data. The coveted use of unstructured and semi-structured data in large volumes makes current data cleaning tools (primarily designed for relational data) not directly adoptable. We present CLAMS, a system to discover and enforce expressive integrity constraints from large amounts of lake data with very limited schema information (e.g., represented as RDF triples). This demonstration shows how CLAMS is able to discover the constraints and the schemas they are defined on simultaneously. CLAMS also introduces a scale-out solution to efficiently detect errors in the raw data. CLAMS interacts with human experts to both validate the discovered constraints and to suggest data repairs. CLAMS has been deployed in a real large-scale enterprise data lake and was experimented with a real data set of 1.2 billion triples. It has been able to spot multiple obscure data inconsistencies and errors early in the data processing stack, providing huge value to the enterprise.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80406317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional Dependencies for Graphs","authors":"W. Fan, Yinghui Wu, Jingbo Xu","doi":"10.1145/2882903.2915232","DOIUrl":"https://doi.org/10.1145/2882903.2915232","url":null,"abstract":"We propose a class of functional dependencies for graphs, referred to as GFDs. GFDs capture both attribute-value dependencies and topological structures of entities, and subsume conditional functional dependencies (CFDs) as a special case. We show that the satisfiability and implication problems for GFDs are coNP-complete and NP-complete, respectively, no worse than their CFD counterparts. We also show that the validation problem for GFDs is coNP-complete. Despite the intractability, we develop parallel scalable algorithms for catching violations of GFDs in large-scale graphs. Using real-life and synthetic data, we experimentally verify that GFDs provide an effective approach to detecting inconsistencies in knowledge and social graphs.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83388592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gabriel Lyons, Vinh Q. Tran, Carsten Binnig, U. Çetintemel, Tim Kraska
{"title":"Making the Case for Query-by-Voice with EchoQuery","authors":"Gabriel Lyons, Vinh Q. Tran, Carsten Binnig, U. Çetintemel, Tim Kraska","doi":"10.1145/2882903.2899394","DOIUrl":"https://doi.org/10.1145/2882903.2899394","url":null,"abstract":"Recent advances in automatic speech recognition and natural language processing have led to a new generation of robust voice-based interfaces. Yet, there is very little work on using voice-based interfaces to query database systems. In fact, one might even wonder who in her right mind would want to query a database system using voice commands! With this demonstration, we make the case for querying database systems using a voice-based interface, a new querying and interaction paradigm we call Query-by-Voice (QbV). We will demonstrate the practicality and utility of QbV for relational DBMSs using a using a proof-of-concept system called EchoQuery. To achieve a smooth and intuitive interaction, the query interface of EchoQuery is inspired by casual human-to-human conversations. Our demo will show that voice-based interfaces present an intuitive means of querying and consuming data in a database. It will also highlight the unique advantages of QbV over the more traditional approaches, text-based or visual interfaces, for applications where context switching is too expensive, too risky or even not possible at all.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81471810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Kolev, Carlyna Bondiombouy, P. Valduriez, R. Jiménez-Peris, Raquel Pau, José Pereira
{"title":"The CloudMdsQL Multistore System","authors":"B. Kolev, Carlyna Bondiombouy, P. Valduriez, R. Jiménez-Peris, Raquel Pau, José Pereira","doi":"10.1145/2882903.2899400","DOIUrl":"https://doi.org/10.1145/2882903.2899400","url":null,"abstract":"The blooming of different cloud data management infrastructures has turned multistore systems to a major topic in the nowadays cloud landscape. In this demonstration, we present a Cloud Multidatastore Query Language (CloudMdsQL), and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations to each data store's native query interface. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g. a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized. Within our demonstration, we focus on two use cases each involving four diverse data stores (graph, document, relational, and key-value) with its corresponding CloudMdsQL queries. The query execution flows are visualized by an embedded real-time monitoring subsystem. The users can also try out different ad-hoc queries, not necessarily in the context of the use cases.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78747902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Krishnan, M. Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu
{"title":"ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning","authors":"S. Krishnan, M. Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu","doi":"10.1145/2882903.2899409","DOIUrl":"https://doi.org/10.1145/2882903.2899409","url":null,"abstract":"Databases can be corrupted with various errors such as missing, incorrect, or inconsistent values. Increasingly, modern data analysis pipelines involve Machine Learning, and the effects of dirty data can be difficult to debug.Dirty data is often sparse, and naive sampling solutions are not suited for high-dimensional models. We propose ActiveClean, a progressive framework for training Machine Learning models with data cleaning. Our framework updates a model iteratively as the analyst cleans small batches of data, and includes numerous optimizations such as importance weighting and dirty data detection. We designed a visual interface to wrap around this framework and demonstrate ActiveClean for a video classification problem and a topic modeling problem.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83889121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shan-Hung Wu, Tsai-Yu Feng, Meng-Kai Liao, Shao-Kan Pi, Yu-Shan Lin
{"title":"T-Part: Partitioning of Transactions for Forward-Pushing in Deterministic Database Systems","authors":"Shan-Hung Wu, Tsai-Yu Feng, Meng-Kai Liao, Shao-Kan Pi, Yu-Shan Lin","doi":"10.1145/2882903.2915227","DOIUrl":"https://doi.org/10.1145/2882903.2915227","url":null,"abstract":"Deterministic database systems have been shown to yield high throughput on a cluster of commodity machines while ensuring the strong consistency between replicas, provided that the data can be well-partitioned on these machines. However, data partitioning can be suboptimal for many reasons in real-world applications. In this paper, we present T-Part, a transaction execution engine that partitions transactions in a deterministic database system to deal with the unforeseeable workloads or workloads whose data are hard to partition. By modeling the dependency between transactions as a T-graph and continuously partitioning that graph, T-Part allows each transaction to know which later transactions on other machines will read its writes so that it can push forward the writes to those later transactions immediately after committing. This forward-pushing reduces the chance that the later transactions stall due to the unavailability of remote data. We implement a prototype for T-Part. Extensive experiments are conducted and the results demonstrate the effectiveness of T-Part.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86126791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}