2014 IEEE 30th International Conference on Data Engineering: Latest Papers, Page 4

Automatic entity-grouping for OLTP workloads

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816694

Bin Liu, J. Tatemura, Oliver Po, Wang-Pin Hsiung, Hakan Hacıgümüş

Abstract: Supporting an online transaction processing (OLTP) workload in a scalable and elastic fashion is a challenging task. Recently, a new breed of scalable systems have shown significant throughput gains by limiting consistency to small units of data called “entity-groups” (e.g., a user's account information stored together with all her emails in an online email service.) Transactions that access the data from only one entity-group are guaranteed of full ACID, but those that access multiple entity-groups are not. Defining entity-groups has direct impact on workload consistency and performance, and doing so for data with a complex schema is very challenging. It is prone to go to extremes - groups that are too fine-grained cause excessive number of expensive distributed transactions while those that are too coarse lead to excessive serialization and performance degradation. It is also difficult to balance conflicting requirements from different transactions. In commercially available entity-group systems, creating entity-groups is usually a manual process, which severely limits the usability of those systems. This paper is the first systematic effort on automating the entity-group design process. Our goal is to build a user-friendly design tool for automatically creating entity-groups based on a given workload and to help users trade consistency for performance in a principled manner. For advanced users, we allow them to provide feedback to the entity-group design and iteratively improve the final output. We demonstrate the effectiveness of our approach with widely used benchmarks. We also present the user experience of a prototype we built.

Citations: 5

Continuous fragmented skylines over distributed streams

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816645

Odysseas Papapetrou, M. Garofalakis

Abstract: Distributed skyline computation is important for a wide range of application domains, from distributed and web-based systems to ISP-network monitoring and distributed databases. The problem is particularly challenging in dynamic distributed settings, where the goal is to efficiently monitor a continuous skyline query over a collection of distributed streams. All existing work relies on the assumption of a single point of reference for object attributes/dimensions, i.e., objects may be vertically or horizontally partitioned, but the accurate value of each dimension for each object is always maintained by a single site. This assumption is unrealistic for several distributed monitoring applications, where object information is fragmented over a set of distributed streams (each monitored by a different site) and needs to be aggregated (e.g., averaged) across several sites. Furthermore, it is frequently useful to define skyline dimensions through complex functions over the aggregated objects, which raises further challenges for dealing with object fragmentation. In this paper, we present the first known distributed approach for continuous fragmented skylines, namely distributed monitoring of skylines over complex functions of fragmented multi-dimensional objects. We also propose several optimizations, including a new technique based on random-walk models for adaptively determining the most efficient monitoring strategy for each object. A thorough experimental study with synthetic and real-life data sets verifies the effectiveness of our approach, demonstrating order-of-magnitude improvements in communication costs compared to the only available centralized solution.

Citations: 25

Parallel SECONDO: A practical system for large-scale processing of moving objects

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816738

Jiamin Lu, R. H. Güting

Citations: 18

CaSSanDra: An SSD boosted key-value store

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816732

Prashanth Menon, T. Rabl, Mohammad Sadoghi, H. Jacobsen

Citations: 24

A demonstration of MNTG - A web-based road network traffic generator

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816752

M. Mokbel, Louai Alarabi, Jie Bao, A. Eldawy, A. Magdy, Mohamed Sarwat, Ethan Waytas, Steven Yackel

Citations: 14

Continuous data cleaning

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816655

M. Volkovs, Fei Chiang, Jaroslaw Szlichta, Renée J. Miller

Abstract: In declarative data cleaning, data semantics are encoded as constraints and errors arise when the data violates the constraints. Various forms of statistical and logical inference can be used to reason about and repair inconsistencies (errors) in data. Recently, unified approaches that repair both errors in data and errors in semantics (the constraints) have been proposed. However, both data-only approaches and unified approaches are by and large static in that they apply cleaning to a single snapshot of the data and constraints. We introduce a continuous data cleaning framework that can be applied to dynamic data and constraint environments. Our approach permits both the data and its semantics to evolve and suggests repairs based on the accumulated evidence to date. Importantly, our approach uses not only the data and constraints as evidence, but also considers the past repairs chosen and applied by a user (user repair preferences). We introduce a repair classifier that predicts the type of repair needed to resolve an inconsistency, and that learns from past user repair preferences to recommend more accurate repairs in the future. Our evaluation shows that our techniques achieve high prediction accuracy and generate high quality repairs. Of independent interest, our work makes use of a set of data statistics that are shown to be sensitive to predicting particular repair types.

Citations: 101

Text and structured data fusion in data tamer at scale

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816755

M. Gubanov, M. Stonebraker, D. Bruckner

Citations: 18

Memory-efficient centroid decomposition for long time series

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816643

Mourad Khayati, Michael H. Böhlen, J. Gamper

Citations: 20

KnowLife: A knowledge graph for health and life sciences

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816754

P. Ernst, Cynthia Meng, A. Siu, G. Weikum

Citations: 59

Scalable serializable snapshot isolation for multicore systems

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816693

Hyuck Han, Seongjae Park, Hyungsoo Jung, A. Fekete, Uwe Röhm, H. Yeom

Citations: 13