Proceedings of the 2018 International Conference on Management of Data: Latest Articles

Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193537
Fotis Psallidas, Eugene Wu
Abstract: Data lineage is a fundamental type of information that describes the relationships between input and output data items in a workflow. As such, an immense amount of data-intensive applications with logic over the input-output relationships can be expressed declaratively in lineage terms. Unfortunately, many applications resort to hand-tuned implementations because either lineage systems are not fast enough to meet their requirements or due to no knowledge of the lineage capabilities. Recently, we introduced a set of implementation design principles and associated techniques to optimize lineage-enabled database engines and realized them in our prototype database engine, namely, Smoke. In this demonstration, we showcase lineage as the building block across a variety of data-intensive applications, including tooltips and details on demand; crossfilter; and data profiling. In addition, we show how Smoke outperforms alternative lineage systems to meet or improve on existing hand-tuned implementations of these applications.
Citations: 4
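To make the idea of lineage as input-output relationships concrete, here is a minimal Python sketch of a group-by aggregation that records, for each output group, the indices of the input rows that produced it. This is only an illustration of backward lineage for a "details on demand" lookup; it is not Smoke's implementation, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def group_sum_with_lineage(rows, key, value):
    """Group rows by `key`, sum `value`, and record which input rows fed each group."""
    totals = defaultdict(float)
    lineage = defaultdict(list)  # output group -> indices of contributing input rows
    for idx, row in enumerate(rows):
        totals[row[key]] += row[value]
        lineage[row[key]].append(idx)
    return dict(totals), dict(lineage)


def details_on_demand(rows, lineage, group):
    """Backward lineage query: return the input rows behind one output group."""
    return [rows[i] for i in lineage.get(group, [])]


# Hypothetical sample data, for illustration only.
rows = [
    {"city": "NYC", "sales": 10.0},
    {"city": "SF", "sales": 5.0},
    {"city": "NYC", "sales": 2.5},
]
totals, lineage = group_sum_with_lineage(rows, "city", "sales")
print(totals)
print(details_on_demand(rows, lineage, "NYC"))
```

Capturing the mapping while the aggregate is computed means a tooltip or drill-down can be answered by an index lookup instead of rescanning the input.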
The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3199671
Stratos Idreos, Konstantinos Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo
Abstract: Data structures are critical in any data-driven scenario, but they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of fundamental data access primitives (e.g., random access). With these models, we synthesize the performance cost of complex operations on arbitrary data structure designs without having to: 1) implement the data structure, 2) run the workload, or even 3) access the target hardware. We demonstrate that the Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, i.e., computing how the performance (response time) of a given data structure design is impacted by variations in the: 1) design, 2) hardware, 3) data, and 4) query workloads. This makes it effortless to test numerous designs and ideas before embarking on lengthy implementation, deployment, and hardware acquisition steps. We also demonstrate that the Data Calculator can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.
Citations: 79
Splaying Log-Structured Merge-Trees
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183723
Thomas Lively, Luca Schroeder, Carlos Mendizábal
Abstract: Modern persistent key-value stores typically use a log-structured merge-tree (LSM-tree) design, which allows for high write throughput. Our observation is that the LSM-tree, however, has suboptimal performance during read-intensive workload windows with non-uniform key access distributions. To address this shortcoming, we propose and analyze a simple decision scheme that can be added to any LSM-based key-value store and dramatically reduce the number of disk I/Os for these classes of workloads. The key insight is that copying a frequently accessed key to the top of an LSM-tree ("splaying") allows cheaper reads on that key in the near future.
Citations: 3
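The splaying idea in the abstract above can be sketched with a toy in-memory model. The Python below treats each LSM level as a dictionary, counts how many levels a read has to probe (a stand-in for disk I/Os), and copies a key to the top level once it has been read often enough. The class name `ToyLSM` and the fixed hit-count threshold are assumptions made for illustration; the abstract does not describe the paper's actual decision scheme, and a real store would also handle compaction and persistence.

```python
class ToyLSM:
    """Toy LSM-tree: a list of dict levels, where level 0 is the top (cheapest to read)."""

    def __init__(self, num_levels=3, splay_threshold=2):
        self.levels = [dict() for _ in range(num_levels)]
        self.hits = {}  # per-key read counter (hypothetical bookkeeping)
        self.splay_threshold = splay_threshold  # assumed policy, not from the paper

    def put(self, key, value, level=0):
        """Write a key; `level` lets the example place data deep in the tree."""
        self.levels[level][key] = value

    def get(self, key):
        """Return (value, levels_probed); each probed level models one lookup cost."""
        for depth, level in enumerate(self.levels):
            if key in level:
                value = level[key]
                self.hits[key] = self.hits.get(key, 0) + 1
                if depth > 0 and self.hits[key] >= self.splay_threshold:
                    # "Splay": copy the hot key to the top so later reads stop early.
                    self.levels[0][key] = value
                return value, depth + 1
        return None, len(self.levels)


store = ToyLSM()
store.put("hot", "v", level=2)
print(store.get("hot"))  # found at the deepest level
print(store.get("hot"))  # still deep, but this read triggers the copy to the top
print(store.get("hot"))  # now served from level 0
```

The copy leaves the older entry in the deeper level; because reads search from the top, the splayed copy shadows it.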
Session details: Industry 3: DB Systems in the Cloud and Open Source
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3258015
Mohammad Sadoghi
Citations: 0
Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196903
Min Xie, R. C. Wong, J. Li, Cheng Long, Ashwin Lall
Abstract: Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries were proposed in the literature: top-k queries and skyline queries. A top-k query requires users to specify their utility functions beforehand and then returns k tuples to the users. A skyline query does not require any utility function from users but it puts no control on the number of tuples returned to users. Recently, a k-regret query was proposed and received attention from the community because it does not require any utility function from users and the output size is controllable, and thus it avoids those deficiencies of top-k queries and skyline queries. Specifically, it returns k tuples that minimize a criterion called the maximum regret ratio. In this paper, we present the lower bound of the maximum regret ratio for the k-regret query. Besides, we propose a novel algorithm, called SPHERE, whose upper bound on the maximum regret ratio is asymptotically optimal and restriction-free for any dimensionality, the best-known result in the literature. We conducted extensive experiments to show that SPHERE performs better than the state-of-the-art methods for the k-regret query.
Citations: 39
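As a worked illustration of the maximum regret ratio named in the abstract above: for a utility function f, the regret ratio of a subset S of a database D is (max over D of f minus max over S of f) divided by max over D of f, and the maximum regret ratio is the worst case over the utility functions considered. The Python sketch below evaluates it over a finite list of linear utility vectors, which is a simplifying assumption (the k-regret literature usually considers all non-negative linear utilities). It only evaluates a given subset and is not the SPHERE algorithm; the sample points are hypothetical.

```python
def utility(weights, point):
    """Linear utility: weighted sum of a tuple's attribute values."""
    return sum(w * x for w, x in zip(weights, point))


def max_regret_ratio(database, subset, utility_vectors):
    """Worst-case regret ratio of `subset` over a finite set of utility vectors."""
    worst = 0.0
    for w in utility_vectors:
        best_all = max(utility(w, p) for p in database)
        best_sub = max(utility(w, p) for p in subset)
        if best_all > 0:  # skip degenerate utilities with no positive optimum
            worst = max(worst, (best_all - best_sub) / best_all)
    return worst


# Hypothetical two-attribute tuples and utility vectors, for illustration only.
database = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
utility_vectors = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]

print(max_regret_ratio(database, [(0.6, 0.6)], utility_vectors))
print(max_regret_ratio(database, database, utility_vectors))
```

Returning the balanced tuple alone costs a user who cares about a single attribute 40% of their best achievable utility, whereas returning the whole database has zero regret.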
DITA: Distributed In-Memory Trajectory Analytics
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183743
Zeyuan Shang, Guoliang Li, Z. Bao
Abstract: Trajectory analytics can benefit many real-world applications, e.g., frequent trajectory based navigation systems, road planning, car pooling, and transportation optimizations. Existing algorithms focus on optimizing this problem in a single machine. However, the amount of trajectories exceeds the storage and processing capability of a single machine, and it calls for large-scale trajectory analytics in distributed environments. The distributed trajectory analytics faces challenges of data locality aware partitioning, load balance, easy-to-use interface, and versatility to support various trajectory similarity functions. To address these challenges, we propose a distributed in-memory trajectory analytics system DITA. We propose an effective partitioning method, global index and local index, to address the data locality problem. We devise cost-based techniques to balance the workload. We develop a filter-verification framework to improve the performance. Moreover, DITA can support most of existing similarity functions to quantify the similarity between trajectories. We integrate our framework seamlessly into Spark SQL, and make it support SQL and DataFrame API interfaces. We have conducted extensive experiments on real world datasets, and experimental results show that DITA outperforms existing distributed trajectory similarity search and join approaches significantly.
Citations: 87
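The filter-verification pattern mentioned in the abstract above can be illustrated on a single machine. The Python sketch below runs a threshold similarity search under dynamic time warping (DTW): a cheap lower bound prunes candidates first, and only the survivors pay for the exact quadratic-time DTW computation. The choice of DTW, the endpoint-based lower bound, and all names and sample trajectories are assumptions for illustration; this is not DITA's index or its distributed implementation.

```python
import math


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw(a, b):
    """Exact DTW distance between two non-empty trajectories (sum along best warping path)."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = dist(a[i - 1], b[j - 1]) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def endpoint_lower_bound(a, b):
    """Lower bound on DTW: every warping path aligns the first points and the last points."""
    if len(a) == 1 and len(b) == 1:
        return dist(a[0], b[0])  # first and last alignment are the same cell
    return dist(a[0], b[0]) + dist(a[-1], b[-1])


def similarity_search(query, trajectories, threshold):
    """Return (matching names, number of candidates that needed exact verification)."""
    results, verified = [], 0
    for name, traj in trajectories.items():
        if endpoint_lower_bound(query, traj) > threshold:
            continue  # filter step: cannot be within the threshold
        verified += 1
        if dtw(query, traj) <= threshold:  # verification step: exact distance
            results.append(name)
    return results, verified


# Hypothetical trajectories, for illustration only.
query = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
trajectories = {
    "near": [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],
    "far": [(10.0, 10.0), (11.0, 10.0), (12.0, 10.0)],
    "detour": [(0.0, 0.4), (1.0, 5.0), (2.0, 0.4)],
}
print(similarity_search(query, trajectories, threshold=1.0))
```

Here "far" is discarded by the filter without running DTW, while "detour" passes the filter but is rejected by exact verification, so only two of the three candidates incur the expensive computation.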
Managing Non-Volatile Memory in Database Systems
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196897
Alexander van Renen, Viktor Leis, A. Kemper, Thomas Neumann, T. Hashida, Kazuichi Oe, Y. Doi, L. Harada, Mitsuru Sato
Abstract: Non-volatile memory (NVM) is a new storage technology that combines the performance and byte addressability of DRAM with the persistence of traditional storage devices like flash (SSD). While these properties make NVM highly promising, it is not yet clear how to best integrate NVM into the storage layer of modern database systems. Two system designs have been proposed. The first is to use NVM exclusively, i.e., to store all data and index structures on it. However, because NVM has a higher latency than DRAM, this design can be less efficient than main-memory database systems. For this reason, the second approach uses a page-based DRAM cache in front of NVM. This approach, however, does not utilize the byte addressability of NVM and, as a result, accessing an uncached tuple on NVM requires retrieving an entire page. In this work, we evaluate these two approaches and compare them with in-memory databases as well as more traditional buffer managers that use main memory as a cache in front of SSDs. This allows us to determine how much performance gain can be expected from NVM. We also propose a lightweight storage manager that simultaneously supports DRAM, NVM, and flash. Our design utilizes the byte addressability of NVM and uses it as an additional caching layer that improves performance without losing the benefits from the even faster DRAM and the large capacities of SSDs.
Citations: 98
IMPROVE-QA: An Interactive Mechanism for RDF Question/Answering Systems
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193555
Xinbo Zhang, Lei Zou
Abstract: RDF Question/Answering (Q/A) systems can interpret user's question N as SPARQL query Q and return answer set $Q(D)$ over RDF repository D to the user. However, due to the complexity of linking natural phrases with specific RDF items (e.g., entities and predicates), it remains difficult to understand users' questions precisely, hence $Q(D)$ may not meet users' expectation, offering wrong answers and dismissing some correct answers. In this demo, we design an Interactive Mechanism aiming for PROmotion Via feedback to Q/A systems (IMPROVE-QA), a whole platform to make existing Q/A systems return more precise answers (denoted as $\mathcal{Q}'(D)$) to users. Based on user's feedback over $Q(D)$, IMPROVE-QA automatically refines the original query Q into a new query graph $\mathcal{Q}'$ with minimum modifications, where $\mathcal{Q}'(D)$ provides more precise answers. We will also demonstrate how IMPROVE-QA can apply the "lesson" learned from the user in each query to improve the precision of Q/A systems on subsequent natural language questions.
Citations: 7
Session details: Research 11: Data Mining
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3258018
L. Lakshmanan
Citations: 0
Session details: Research 15: Databases for Emerging Hardware
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3258023
P. Pietzuch
Citations: 0