2014 IEEE 30th International Conference on Data Engineering: latest papers

Automatic entity-grouping for OLTP workloads
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816694
Bin Liu, J. Tatemura, Oliver Po, Wang-Pin Hsiung, Hakan Hacıgümüş
Abstract: Supporting an online transaction processing (OLTP) workload in a scalable and elastic fashion is a challenging task. Recently, a new breed of scalable systems have shown significant throughput gains by limiting consistency to small units of data called “entity-groups” (e.g., a user's account information stored together with all her emails in an online email service.) Transactions that access the data from only one entity-group are guaranteed of full ACID, but those that access multiple entity-groups are not. Defining entity-groups has direct impact on workload consistency and performance, and doing so for data with a complex schema is very challenging. It is prone to go to extremes - groups that are too fine-grained cause excessive number of expensive distributed transactions while those that are too coarse lead to excessive serialization and performance degradation. It is also difficult to balance conflicting requirements from different transactions. In commercially available entity-group systems, creating entity-groups is usually a manual process, which severely limits the usability of those systems. This paper is the first systematic effort on automating the entity-group design process. Our goal is to build a user-friendly design tool for automatically creating entity-groups based on a given workload and to help users trade consistency for performance in a principled manner. For advanced users, we allow them to provide feedback to the entity-group design and iteratively improve the final output. We demonstrate the effectiveness of our approach with widely used benchmarks. We also present the user experience of a prototype we built.
Citations: 5
Continuous fragmented skylines over distributed streams
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816645
Odysseas Papapetrou, M. Garofalakis
Abstract: Distributed skyline computation is important for a wide range of application domains, from distributed and web-based systems to ISP-network monitoring and distributed databases. The problem is particularly challenging in dynamic distributed settings, where the goal is to efficiently monitor a continuous skyline query over a collection of distributed streams. All existing work relies on the assumption of a single point of reference for object attributes/dimensions, i.e., objects may be vertically or horizontally partitioned, but the accurate value of each dimension for each object is always maintained by a single site. This assumption is unrealistic for several distributed monitoring applications, where object information is fragmented over a set of distributed streams (each monitored by a different site) and needs to be aggregated (e.g., averaged) across several sites. Furthermore, it is frequently useful to define skyline dimensions through complex functions over the aggregated objects, which raises further challenges for dealing with object fragmentation. In this paper, we present the first known distributed approach for continuous fragmented skylines, namely distributed monitoring of skylines over complex functions of fragmented multi-dimensional objects. We also propose several optimizations, including a new technique based on random-walk models for adaptively determining the most efficient monitoring strategy for each object. A thorough experimental study with synthetic and real-life data sets verifies the effectiveness of our approach, demonstrating order-of-magnitude improvements in communication costs compared to the only available centralized solution.
Citations: 25
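Illustration (not from the paper): the query semantics described in the abstract above, fragments averaged across sites and then a skyline taken over the aggregated objects, can be sketched as a small centralized baseline. This is only a sketch under stated assumptions: the function names, the `(object_id, site_id, values)` tuple layout, and the "smaller is better" convention are all hypothetical, and the code does not implement the paper's distributed monitoring protocol or its optimizations.

```python
from collections import defaultdict


def aggregate_fragments(fragments):
    """Average the per-site fragments of each object into one vector.

    fragments: iterable of (object_id, site_id, values) tuples
    (a hypothetical layout chosen for this sketch).
    """
    sums = {}
    counts = defaultdict(int)
    for obj, _site, values in fragments:
        if obj not in sums:
            sums[obj] = list(values)
        else:
            sums[obj] = [a + b for a, b in zip(sums[obj], values)]
        counts[obj] += 1
    return {obj: tuple(s / counts[obj] for s in total)
            for obj, total in sums.items()}


def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Return the ids of all objects not dominated by any other object."""
    return {o for o, p in points.items()
            if not any(dominates(q, p)
                       for r, q in points.items() if r != o)}


if __name__ == "__main__":
    frags = [
        ("a", "s1", (1, 4)), ("a", "s2", (3, 2)),
        ("b", "s1", (1, 1)), ("b", "s2", (1, 1)),
        ("c", "s1", (5, 5)), ("c", "s2", (5, 5)),
        ("d", "s1", (0, 6)), ("d", "s2", (0, 6)),
    ]
    print(sorted(skyline(aggregate_fragments(frags))))  # prints ['b', 'd']
```

In this toy input no single site holds the true value of object `a`; only the average `(2.0, 3.0)` can be compared against the other objects, which is the fragmentation issue the abstract points to.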
Parallel SECONDO: A practical system for large-scale processing of moving objects
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816738
Jiamin Lu, R. H. Güting
Abstract: Parallel Secondo scales up the capability of processing extensible data models in Secondo. It combines Hadoop with a set of Secondo databases, providing almost all existing SECONDO data types and operators. Therefore it is possible for the user to convert large-scale sequential queries to parallel queries without learning the Map/Reduce programming details. This paper demonstrates such a procedure. It imports the data from the project OpenStreetMap into Secondo databases to build up the urban traffic network and then processes network-based queries like map-matching and symbolic trajectory pattern matching. All involved queries were stated as sequential expressions and time-consuming in single-computer Secondo. However, they can achieve an impressive performance in Parallel Secondo after being converted to the corresponding parallel queries, even on a small cluster consisting of six low-end computers.
Citations: 18
CaSSanDra: An SSD boosted key-value store
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816732
Prashanth Menon, T. Rabl, Mohammad Sadoghi, H. Jacobsen
Abstract: With the ever growing size and complexity of enterprise systems there is a pressing need for more detailed application performance management. Due to the high data rates, traditional database technology cannot sustain the required performance. Alternatives are the more lightweight and, thus, more performant key-value stores. However, these systems tend to sacrifice read performance in order to obtain the desired write throughput by avoiding random disk access in favor of fast sequential accesses. With the advent of SSDs, built upon the philosophy of no moving parts, the boundary between sequential vs. random access is now becoming blurred. This provides a unique opportunity to extend the storage memory hierarchy using SSDs in key-value stores. In this paper, we extensively evaluate the benefits of using SSDs in commercialized key-value stores. In particular, we investigate the performance of hybrid SSD-HDD systems and demonstrate the benefits of our SSD caching and our novel dynamic schema model.
Citations: 24
A demonstration of MNTG - A web-based road network traffic generator
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816752
M. Mokbel, Louai Alarabi, Jie Bao, A. Eldawy, A. Magdy, Mohamed Sarwat, Ethan Waytas, Steven Yackel
Abstract: This demo presents Minnesota Traffic Generator (MNTG); an extensible web-based road network traffic generator. MNTG enables its users to generate traffic data at any arbitrary road networks with different traffic generators. Unlike existing traffic generators that require a lot of time/effort to install, configure, and run, MNTG is a web service with a user-friendly interface where users can specify an arbitrary spatial region, select a traffic generator, and submit their traffic generation request. Once the traffic data is generated by MNTG, users can then download and/or visualize the generated data. MNTG can be extended to support: (1) various traffic generators. It is already shipped with the two most common traffic generators, Brinkhoff and BerlinMOD, but other generators can be easily added. (2) various road network sources. It is shipped with U.S. Tiger files and OpenStreetMap, but other sources can be also added. A beta version of MNTG is launched at: http://mntg.cs.umn.edu.
Citations: 14
Continuous data cleaning
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816655
M. Volkovs, Fei Chiang, Jaroslaw Szlichta, Renée J. Miller
Abstract: In declarative data cleaning, data semantics are encoded as constraints and errors arise when the data violates the constraints. Various forms of statistical and logical inference can be used to reason about and repair inconsistencies (errors) in data. Recently, unified approaches that repair both errors in data and errors in semantics (the constraints) have been proposed. However, both data-only approaches and unified approaches are by and large static in that they apply cleaning to a single snapshot of the data and constraints. We introduce a continuous data cleaning framework that can be applied to dynamic data and constraint environments. Our approach permits both the data and its semantics to evolve and suggests repairs based on the accumulated evidence to date. Importantly, our approach uses not only the data and constraints as evidence, but also considers the past repairs chosen and applied by a user (user repair preferences). We introduce a repair classifier that predicts the type of repair needed to resolve an inconsistency, and that learns from past user repair preferences to recommend more accurate repairs in the future. Our evaluation shows that our techniques achieve high prediction accuracy and generate high quality repairs. Of independent interest, our work makes use of a set of data statistics that are shown to be sensitive to predicting particular repair types.
Citations: 101
Text and structured data fusion in data tamer at scale
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816755
M. Gubanov, M. Stonebraker, D. Bruckner
Abstract: Large-scale text data research has recently started to regain momentum [1]-[10], because of the wealth of up to date information communicated in unstructured format. For example, new information in online media (e.g. Web blogs, Twitter, Facebook, news feeds, etc) becomes instantly available and is refreshed regularly, has very broad coverage and other valuable properties unusual for other data sources and formats. Therefore, many enterprises and individuals are interested in integrating and using unstructured text in addition to their structured data.
Citations: 18
Memory-efficient centroid decomposition for long time series
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816643
Mourad Khayati, Michael H. Böhlen, J. Gamper
Abstract: Real world applications that deal with time series data often rely on matrix decomposition techniques, such as the Singular Value Decomposition (SVD). The Centroid Decomposition (CD) approximates the Singular Value Decomposition, but does not scale to long time series because of the quadratic space complexity of the sign vector computation. In this paper, we propose a greedy algorithm, termed Scalable Sign Vector (SSV), to efficiently determine sign vectors for CD applications with long time series, i.e., where the number of rows (observations) is much larger than the number of columns (time series). The SSV algorithm starts with a sign vector consisting of only 1s and iteratively changes the sign of the element that maximizes the benefit. The space complexity of the SSV algorithm is linear in the length of the time series. We provide proofs for the scalability, the termination and the correctness of the SSV algorithm. Experiments with real world hydrological time series and data sets from the UCR repository validate the analytical results and show the scalability of SSV.
Citations: 20
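Illustration (not from the paper): the greedy idea stated in the abstract above, start from an all-ones sign vector and repeatedly flip the single most beneficial element, can be sketched as follows. This is a simple greedy flip heuristic written for illustration, not necessarily the paper's SSV algorithm. It assumes the objective is to maximize the squared norm of X^T z over sign vectors z, and it takes "benefit" to mean the increase in that objective from one flip; the function name and the tolerance `eps` are hypothetical.

```python
def greedy_sign_vector(X, eps=1e-12):
    """Greedy sign-vector search for a matrix X given as n rows of m floats.

    Returns z in {+1, -1}^n found by repeatedly flipping the entry whose
    flip most increases ||X^T z||^2. Extra memory is O(n + m): the sign
    vector, the row norms, and the running vector v = X^T z.
    """
    n = len(X)
    m = len(X[0]) if n else 0
    z = [1] * n
    # v = X^T z for the initial all-ones sign vector.
    v = [sum(X[i][j] for i in range(n)) for j in range(m)]
    sq_norms = [sum(x * x for x in row) for row in X]

    while True:
        best_i, best_gain = -1, eps
        for i in range(n):
            dot = sum(a * b for a, b in zip(X[i], v))
            # Flipping z[i] changes ||v||^2 by 4 * (||x_i||^2 - z_i * x_i.v).
            gain = sq_norms[i] - z[i] * dot
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i < 0:
            return z  # no flip improves the objective
        # Apply the flip: v <- v - 2 * z_i * x_i, then negate z_i.
        for j in range(m):
            v[j] -= 2 * z[best_i] * X[best_i][j]
        z[best_i] = -z[best_i]


if __name__ == "__main__":
    X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
    print(greedy_sign_vector(X))  # prints [-1, 1, 1]
```

Because every accepted flip strictly increases the objective and there are finitely many sign vectors, the loop terminates; the result is a local optimum with respect to single flips, and this sketch makes no claim about the correctness guarantees proved in the paper.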
KnowLife: A knowledge graph for health and life sciences
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816754
P. Ernst, Cynthia Meng, A. Siu, G. Weikum
Abstract: Knowledge bases (KB's) contribute to advances in semantic search, Web analytics, and smart recommendations. Their coverage of domain-specific knowledge is limited, though. This demo presents the KnowLife portal, a large KB for health and life sciences, automatically constructed from Web sources. Prior work on biomedical ontologies has focused on molecular biology: genes, proteins, and pathways. In contrast, KnowLife is a one-stop portal for a much wider range of relations about diseases, symptoms, causes, risk factors, drugs, side effects, and more. Moreover, while most prior work relies on manually curated sources as input, the KnowLife system taps into scientific literature as well as online communities. KnowLife uses advanced information extraction methods to populate the relations in the KB. This way, it learns patterns for relations, which are in turn used to semantically annotate newly seen documents, thus aiding users in “speed-reading”. We demonstrate the value of the KnowLife KB by various use-cases, supporting both layman and professional users.
Citations: 59
Scalable serializable snapshot isolation for multicore systems
2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816693
Hyuck Han, Seongjae Park, Hyungsoo Jung, A. Fekete, Uwe Röhm, H. Yeom
Abstract: Since 1990's, Snapshot Isolation (SI) has been widely studied, and it was successfully deployed in commercial and open-source database engines. Berenson et al. showed that data consistency can be violated under SI. Recently, a new class of Serializable SI algorithms (SSI) has been proposed to achieve serializable execution while still allowing concurrency between reads and updates.
Citations: 13