Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data: latest publications

H2O: a hands-free adaptive store
Ioannis Alagiannis, Stratos Idreos, A. Ailamaki
DOI: 10.1145/2588555.2610502 (https://doi.org/10.1145/2588555.2610502)
Abstract: Modern state-of-the-art database systems are designed around a single data storage layout. This is a fixed decision that drives the whole architectural design of a database system, i.e., row-stores, column-stores. However, none of those choices is a universally good solution; different workloads require different storage layouts and data access methods in order to achieve good performance. In this paper, we present the H2O system, which introduces two novel concepts. First, it is flexible enough to support multiple storage layouts and data access patterns in a single engine. Second, and most importantly, it decides on-the-fly, i.e., during query processing, which design is best for classes of queries and the respective data parts. At any given point in time, parts of the data might be materialized in various patterns depending purely on the query workload; as the workload changes and with every single query, the storage and access patterns continuously adapt. In this way, H2O makes no a priori and fixed decisions on how data should be stored, allowing each single query to enjoy a storage and access pattern tailored to its specific properties. We present a detailed analysis of H2O using both synthetic benchmarks and realistic scientific workloads. We demonstrate that while existing systems cannot achieve maximum performance across all workloads, H2O can always match the best-case performance without requiring any tuning or workload knowledge.
Citations: 148
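To make the adaptation idea above concrete, here is a minimal sketch of choosing between a row-wise and a column-wise scan based on a query's shape. It is not H2O's actual mechanism; the `Table` class, the `choose_layout` heuristic, and its 0.5/0.1 thresholds are all invented for illustration.

```python
# Toy illustration of workload-adaptive layout selection (not H2O's real code).
class Table:
    """Keeps the same data in both a row-wise and a column-wise representation."""
    def __init__(self, rows, columns):
        self.row_store = rows                                        # list of dicts, one per row
        self.col_store = {c: [r[c] for r in rows] for c in columns}  # one array per column
        self.columns = columns

    def scan_columnar(self, needed_cols):
        # Touch only the arrays for the columns the query needs.
        return list(zip(*(self.col_store[c] for c in needed_cols)))

    def scan_rowwise(self, needed_cols):
        # Touch full rows, then project.
        return [tuple(r[c] for c in needed_cols) for r in self.row_store]


def choose_layout(needed_cols, total_cols, row_fraction):
    """Crude heuristic: narrow scans over many rows favor columns; wide lookups favor rows."""
    if len(needed_cols) / total_cols < 0.5 and row_fraction > 0.1:
        return "columnar"
    return "row"


rows = [{"id": i, "a": i * 2, "b": i * 3, "c": i % 7} for i in range(1000)]
t = Table(rows, ["id", "a", "b", "c"])
layout = choose_layout(["a"], total_cols=4, row_fraction=1.0)
result = t.scan_columnar(["a"]) if layout == "columnar" else t.scan_rowwise(["a"])
print(layout, len(result))
```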
DoomDB: kill the query
Carsten Binnig, Abdallah Salama, Erfan Zamanian
DOI: 10.1145/2588555.2594525 (https://doi.org/10.1145/2588555.2594525)
Abstract: Typically, fault tolerance in parallel database systems is handled by restarting a query completely when a node failure happens. However, when deploying a parallel database on a cluster of commodity machines or on IaaS offerings such as Amazon's Spot Instances, node failures are a common case. This requires a more fine-granular fault-tolerance scheme. Therefore, most recent parallel data management platforms such as Hadoop or Shark use a fine-grained fault-tolerance scheme, which materializes all intermediate results in order to be able to recover from mid-query faults. While such a fine-grained fault-tolerance scheme is able to efficiently handle node failures for complex and long-running queries, it is not optimal for short-running, latency-sensitive queries, since the additional costs for materialization often outweigh the costs for actually executing the query. In this demo, we showcase our novel cost-based fault-tolerance scheme in XDB. It selects which intermediate results to materialize such that the overall query runtime is minimized in the presence of node failures. For the demonstration, we present a computer game called DoomDB. DoomDB is designed as an ego-shooter game with the goal of killing nodes in an XDB database cluster and thus preventing a given query from producing its final result in a given time frame. One interesting use case of DoomDB is to use it for crowdsourcing the testing activities of XDB.
Citations: 1
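The core trade-off described above (materialize intermediate results only when the expected restart savings outweigh the materialization cost) can be sketched with a toy cost model. This is my own simplified model, not XDB's: the stage runtimes, materialization costs, and the per-stage failure probability are invented numbers, and the expected-cost formula is deliberately naive.

```python
from itertools import combinations

stages = [10.0, 40.0, 5.0, 25.0]   # runtime of each pipeline stage (made up)
mat_cost = [2.0, 8.0, 1.0, 5.0]    # cost to persist the output of each stage (made up)
p_fail = 0.05                       # chance a node failure hits a given stage (made up)

def expected_runtime(materialize):
    """Toy expectation: a failure forces re-running everything since the last checkpoint."""
    total, since_checkpoint = 0.0, 0.0
    for i, rt in enumerate(stages):
        since_checkpoint += rt
        total += rt + p_fail * since_checkpoint   # pay the redo work with probability p_fail
        if i in materialize:
            total += mat_cost[i]                  # pay to persist, but reset the redo window
            since_checkpoint = 0.0
    return total

best = min(
    (frozenset(c) for k in range(len(stages) + 1) for c in combinations(range(len(stages)), k)),
    key=expected_runtime,
)
print("materialize after stages:", sorted(best), "expected cost:", round(expected_runtime(best), 2))
```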
Cloud-based RDF data management
Zoi Kaoudi, I. Manolescu
DOI: 10.1145/2588555.2588891 (https://doi.org/10.1145/2588555.2588891)
Abstract: The W3C's Resource Description Framework (RDF, in short) is set to deliver many of the original semi-structured data promises: flexible structure, optional schema, and rich, flexible URIs as a basis for information sharing. Moreover, RDF is uniquely positioned to benefit from the efforts of scientific communities studying databases, knowledge representation, and Web technologies. As a consequence, numerous collections of RDF data are published, ranging from scientific data to general-purpose ontologies to open government data, in particular published as part of the Linked Data movement. Managing such large volumes of RDF data is challenging due to the sheer size, the heterogeneity, and the further complexity brought by RDF reasoning. To tackle the size challenge, distributed storage architectures are required. Cloud computing is an emerging distributed paradigm massively adopted in many applications for the scalability, fault-tolerance, and elasticity features it provides. This tutorial presents the challenges faced in efficiently handling massive amounts of RDF data in a cloud environment. We provide the necessary background, analyze and classify existing solutions, and discuss open problems and perspectives.
Citations: 11
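One of the simplest distribution schemes surveyed in this line of work is hash-partitioning triples by subject and evaluating triple patterns against every partition. The sketch below illustrates only that generic idea, not any specific system from the tutorial; the data, partition count, and `match` helper are invented.

```python
# Hash-partition RDF triples by subject across "nodes", then evaluate a triple pattern.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
]

NUM_NODES = 3
partitions = [[] for _ in range(NUM_NODES)]
for s, p, o in triples:
    partitions[hash(s) % NUM_NODES].append((s, p, o))   # subject-based placement

def match(pattern, triple):
    """A None in the pattern acts as a variable; constants must match exactly."""
    return all(q is None or q == v for q, v in zip(pattern, triple))

# Triple pattern: everything with subject "alice", any predicate, any object.
query = ("alice", None, None)
answers = [t for part in partitions for t in part if match(query, t)]
print(answers)
```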
Session details: Research session 4: streams and complex event processing
B. Chandramouli
DOI: 10.1145/3255751 (https://doi.org/10.1145/3255751)
Citations: 0
Efficient cohesive subgraphs detection in parallel
Yingxia Shao, Lei Chen, B. Cui
DOI: 10.1145/2588555.2593665 (https://doi.org/10.1145/2588555.2593665)
Abstract: A cohesive subgraph is a primary vehicle for massive graph analysis, and a newly introduced cohesive subgraph, the k-truss, which is motivated by a natural observation of social cohesion, has attracted more and more attention. However, the existing parallel solutions for identifying the k-truss are inefficient for very large graphs, as they still suffer from huge communication cost and a large number of iterations during the computation. In this paper, we propose a novel parallel and efficient truss detection algorithm, called PeTa. PeTa produces a triangle-complete subgraph (TC-subgraph) for every computing node. Based on the TC-subgraphs, PeTa can detect the local k-truss in parallel within a few iterations. We theoretically prove that, within this new paradigm, the communication cost of PeTa is bounded by three times the number of triangles, the total computation complexity of PeTa is of the same order as the best known serial algorithm, and the number of iterations for a given partition scheme is minimized as well. Furthermore, we present a subgraph-oriented model to efficiently express PeTa in parallel graph computing systems. The results of comprehensive experiments demonstrate that, compared with the existing solutions, PeTa saves 2X to 19X in communication cost, reduces the number of iterations by 80% to 95%, and improves the overall performance by 80% across various real-world graphs.
Citations: 51
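For readers unfamiliar with the k-truss, here is a minimal serial peeling sketch of the definition the abstract builds on (the classic baseline, not the parallel PeTa algorithm): repeatedly drop edges that participate in fewer than k-2 triangles. The example graph is invented.

```python
from collections import defaultdict

def k_truss(edges, k):
    """Return the edges of the k-truss: every remaining edge lies in >= k-2 triangles."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    edge_set = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        for e in list(edge_set):
            u, v = tuple(e)
            support = len(adj[u] & adj[v])   # common neighbours = triangles containing (u, v)
            if support < k - 2:
                edge_set.discard(e)
                adj[u].discard(v)
                adj[v].discard(u)
                changed = True
    return edge_set

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("a", "d"), ("b", "d"), ("d", "e")]
# The 4-truss keeps the K4 on {a, b, c, d} and drops the pendant edge (d, e).
print(sorted(tuple(sorted(e)) for e in k_truss(edges, k=4)))
```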
Interactive redescription mining
E. Galbrun, Pauli Miettinen
DOI: 10.1145/2588555.2594520 (https://doi.org/10.1145/2588555.2594520)
Abstract: Exploratory data analysis consists of multiple iterated steps: a data mining method is run on the data, the results are interpreted, new insights are formed, and the resulting knowledge is utilized when executing the method in the next round, and so on until satisfactory results are obtained. We focus on redescription mining, a powerful data analysis method that aims at finding alternative descriptions of the same entities, for example, ways to characterize geographical regions in terms of both the fauna that inhabits them and their bioclimatic conditions, so-called bioclimatic niches. We present Siren, a tool for interactive redescription mining. It is designed to facilitate the exploratory analysis of data by providing a seamless environment for mining, visualizing, and editing redescriptions in an interactive fashion, supporting the analysis process in all its stages. We demonstrate its use for exploratory data mining. Simultaneously, Siren exemplifies the power of the various visualizations and means of interaction integrated into it, techniques that reach beyond the redescription mining task considered here to other analysis methods.
Citations: 9
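The redescription notion itself is easy to show in miniature: two queries over different sides of the data (here a fauna attribute and a climate attribute) should select roughly the same entities, commonly scored by Jaccard similarity. This toy example is mine, not Siren's code; the regions, attributes, and thresholds are invented.

```python
# Score one candidate redescription: "has polar bears" vs "mean temperature below 0".
regions = {
    "r1": {"polar_bear": 1, "mean_temp": -8.0},
    "r2": {"polar_bear": 1, "mean_temp": -5.0},
    "r3": {"polar_bear": 0, "mean_temp": 14.0},
    "r4": {"polar_bear": 0, "mean_temp": -3.0},
}

lhs = {r for r, v in regions.items() if v["polar_bear"] == 1}   # fauna-side description
rhs = {r for r, v in regions.items() if v["mean_temp"] < 0.0}   # climate-side description

jaccard = len(lhs & rhs) / len(lhs | rhs)                        # accuracy of the redescription
print(f"support={sorted(lhs & rhs)}, accuracy (Jaccard)={jaccard:.2f}")
```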
MISO: souping up big data query processing with a multistore system
J. LeFevre, Jagan Sankaranarayanan, Hakan Hacıgümüş, J. Tatemura, N. Polyzotis, M. Carey
DOI: 10.1145/2588555.2588568 (https://doi.org/10.1145/2588555.2588568)
Abstract: Multistore systems utilize multiple distinct data stores, such as Hadoop's HDFS and an RDBMS, for query processing by allowing a query to access data and computation in both stores. Current approaches to multistore query processing fail to achieve the full potential benefits of utilizing both systems due to the high cost of data movement and loading between the stores. Tuning the physical design of a multistore, i.e., deciding what data resides in which store, can reduce the amount of data movement during query processing, which is crucial for good multistore performance. In this work, we provide what we believe to be the first method to tune the physical design of a multistore system by focusing on which store to place data in. Our method, called MISO for MultIStore Online tuning, is adaptive, lightweight, and works in an online fashion, utilizing only the by-products of query processing, which we term opportunistic views. We show that MISO significantly improves the performance of ad-hoc big data query processing by leveraging the specific characteristics of the individual stores while incurring little additional overhead on the stores.
Citations: 124
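The placement decision at the heart of such tuning can be caricatured as a budgeted selection problem: move the most useful opportunistic views into the faster store until space runs out. The sketch below is a plain greedy heuristic under that caricature; the view names, sizes, benefit numbers, and budget are all invented and do not reflect MISO's actual cost model.

```python
# Toy greedy placement of opportunistic views into a space-limited store.
views = [
    {"name": "v_clicks_by_day", "size_gb": 40, "benefit": 120.0},  # est. runtime saved (made up)
    {"name": "v_user_join",     "size_gb": 25, "benefit": 90.0},
    {"name": "v_raw_filter",    "size_gb": 60, "benefit": 70.0},
]
budget_gb = 70

placed, used = [], 0
for v in sorted(views, key=lambda v: v["benefit"] / v["size_gb"], reverse=True):
    if used + v["size_gb"] <= budget_gb:     # take views by benefit density while they fit
        placed.append(v["name"])
        used += v["size_gb"]

leave = [v["name"] for v in views if v["name"] not in placed]
print("place in RDBMS:", placed, "| leave in HDFS:", leave)
```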
Session details: Research session 17: graph analytics
Wook-Shin Han
DOI: 10.1145/3255767 (https://doi.org/10.1145/3255767)
Citations: 0
Sinew: a SQL system for multi-structured data
Daniel Tahara, Thaddeus Diamond, D. Abadi
DOI: 10.1145/2588555.2612183 (https://doi.org/10.1145/2588555.2612183)
Abstract: As applications are becoming increasingly dynamic, the notion that a schema can be created in advance for an application and remain relatively stable is becoming increasingly unrealistic. This has pushed application developers away from traditional relational database systems and away from the SQL interface, despite their many well-established benefits. Instead, developers often prefer self-describing data models such as JSON, and NoSQL systems designed specifically for their relaxed semantics. In this paper, we discuss the design of a system that enables developers to continue to represent their data using self-describing formats without moving away from SQL and traditional relational database systems. Our system stores arbitrary documents of key-value pairs inside physical and virtual columns of a traditional relational database system, and adds a layer above the database system that automatically provides a dynamic relational view to the user, against which fully standard SQL queries can be issued. We demonstrate that our design can achieve an order-of-magnitude improvement in performance over alternative solutions, including existing relational database JSON extensions, MongoDB, and shredding systems that store flattened key-value data inside a relational database.
Citations: 70
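The physical/virtual column split described above can be sketched as follows: frequently occurring keys get their own column, the rest stay in a serialized catch-all column, and a thin view layer reads from whichever side holds the key. The promotion threshold, column names, and `get` helper are my own illustrative choices, not Sinew's implementation.

```python
import json

docs = [
    {"id": 1, "name": "ada", "city": "London"},
    {"id": 2, "name": "bob", "nickname": "bobby"},
    {"id": 3, "name": "eve", "city": "Paris"},
]

# Promote keys appearing in at least two thirds of the documents to "physical" columns.
threshold = 2 / 3
counts = {}
for d in docs:
    for k in d:
        counts[k] = counts.get(k, 0) + 1
physical = {k for k, c in counts.items() if c / len(docs) >= threshold}

rows = []
for d in docs:
    row = {k: d.get(k) for k in physical}                                   # physical columns
    row["_extra"] = json.dumps({k: v for k, v in d.items() if k not in physical})  # catch-all
    rows.append(row)

def get(row, key):
    """The 'dynamic relational view': physical column first, then the catch-all column."""
    if key in physical:
        return row[key]
    return json.loads(row["_extra"]).get(key)

print(sorted(physical), get(rows[1], "nickname"), get(rows[0], "city"))
```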
Exploiting ordered dictionaries to efficiently construct histograms with q-error guarantees in SAP HANA
G. Moerkotte, David DeHaan, Norman May, A. Nica, Alexander Böhm
DOI: 10.1145/2588555.2595629 (https://doi.org/10.1145/2588555.2595629)
Abstract: Histograms that guarantee a maximum multiplicative error (q-error) for estimates may significantly improve the plan quality of query optimizers. However, the construction time for histograms with maximum q-error has been too high for practical use cases. In this paper we extend this concept with a threshold, i.e., an estimated or true cardinality θ, below which we do not care about the q-error because we still expect optimal plans. This allows us to develop far more efficient construction algorithms for histograms with bounded error. The test for (θ, q)-acceptability we developed also exploits the order-preserving dictionary encoding of SAP HANA. We have integrated this family of histograms into SAP HANA, and we report on the construction time, histogram size, and estimation errors on real-world data sets. In virtually all cases the histograms can be constructed in far less than one second, requiring less than 5% of the space of the original compressed data.
Citations: 23
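A small sketch of the two notions the abstract relies on, under my own reading of it: the q-error is the multiplicative gap between an estimate and the true cardinality, and an estimate is (θ, q)-acceptable when it is within factor q of the truth or when both values fall below θ, where plan choice is assumed not to matter. The sample numbers and default q and θ are invented.

```python
def q_error(estimate, truth):
    """Multiplicative error: max(est/true, true/est), guarding against zero values."""
    estimate, truth = max(estimate, 1), max(truth, 1)
    return max(estimate / truth, truth / estimate)

def theta_q_acceptable(estimate, truth, q=2.0, theta=1000):
    if estimate <= theta and truth <= theta:
        return True                       # both cardinalities tiny: any plan is assumed fine
    return q_error(estimate, truth) <= q  # otherwise require the estimate within factor q

print(theta_q_acceptable(estimate=120, truth=900))     # True: both below theta
print(theta_q_acceptable(estimate=5000, truth=2600))   # True: q-error ~1.92 <= 2
print(theta_q_acceptable(estimate=300, truth=9000))    # False: far off and above theta
```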