2011 IEEE 27th International Conference on Data Engineering最新文献_第4页

Transactional In-Page Logging for multiversion read consistency and recovery 用于多版本读取一致性和恢复的事务性页内日志记录

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767889

Sang-Won Lee, Bongki Moon

引用次数: 13

NORMS: An automatic tool to perform schema label normalization 规范:执行模式标签规范化的自动工具

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767952

S. Sorrentino, S. Bergamaschi, M. Gawinecki

引用次数: 11

AMC - A framework for modelling and comparing matching systems as matching processes AMC -作为匹配过程建模和比较匹配系统的框架

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767940

E. Peukert, Julian Eberius, E. Rahm

引用次数: 66

RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems RCFile:在基于mapreduce的仓库系统中快速且节省空间的数据放置结构

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767933

Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, Zhiwei Xu

{"title":"RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems","authors":"Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, Zhiwei Xu","doi":"10.1109/ICDE.2011.5767933","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767933","url":null,"abstract":"MapReduce-based data warehouse systems are playing important roles of supporting big data analytics to understand quickly the dynamics of user behavior trends and their needs in typical Web service providers and social network sites (e.g., Facebook). In such a system, the data placement structure is a critical factor that can affect the warehouse performance in a fundamental way. Based on our observations and analysis of Facebook production systems, we have characterized four requirements for the data placement structure: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to highly dynamic workload patterns. We have examined three commonly accepted data placement structures in conventional databases, namely row-stores, column-stores, and hybrid-stores in the context of large data analysis using MapReduce. We show that they are not very suitable for big data processing in distributed systems. In this paper, we present a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system. With intensive experiments, we show the effectiveness of RCFile in satisfying the four requirements. RCFile has been chosen in Facebook data warehouse system as the default option. It has also been adopted by Hive and Pig, the two most widely used data analysis systems developed in Facebook and Yahoo!","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131274920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 292

Decomposing DAGs into spanning trees: A new way to compress transitive closures 将dag分解为生成树:压缩传递闭包的一种新方法

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767832

Yangjun Chen, Yibin Chen

引用次数: 38

The extensibility framework in Microsoft StreamInsight Microsoft StreamInsight中的可扩展性框架

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767878

Mohamed H. Ali, B. Chandramouli, J. Goldstein, R. Schindlauer

{"title":"The extensibility framework in Microsoft StreamInsight","authors":"Mohamed H. Ali, B. Chandramouli, J. Goldstein, R. Schindlauer","doi":"10.1109/ICDE.2011.5767878","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767878","url":null,"abstract":"Microsoft StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications, which need to run continuous queries over high-data-rate streams of input events. StreamInsight leverages a well-defined temporal stream model and operator algebra, as the underlying basis for processing long-running continuous queries over event streams. This allows StreamInsight to handle imperfections in event delivery and to provide correctness guarantees on the generated output. StreamInsight natively supports a diverse range of off-the-shelf streaming operators. In order to cater to a much broader range of customer scenarios and applications, StreamInsight has recently introduced a new extensibility infrastructure. With this infrastructure, StreamInsight enables developers to integrate their domain expertise within the query pipeline in the form of user defined modules (functions, operators, and aggregates). This paper describes the extensibility framework in StreamInsight; an ongoing effort at Microsoft SQL Server to support the integration of user-defined modules in a stream processing system. More specifically, the paper addresses the extensibility problem from three perspectives: the query writer's perspective, the user defined module writer's perspective, and the system's internal perspective. The paper introduces and addresses a range of new and subtle challenges that arise when we try to add extensibility to a streaming system, in a manner that is easy to use, powerful, and practical. We summarize our experience and provide future directions for supporting stream-oriented workloads in different business domains.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117046612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 59

SQPR: Stream query planning with reuse SQPR:具有重用性的流查询规划

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767851

Evangelia Kalyvianaki, W. Wiesemann, Q. Vu, D. Kuhn, P. Pietzuch

{"title":"SQPR: Stream query planning with reuse","authors":"Evangelia Kalyvianaki, W. Wiesemann, Q. Vu, D. Kuhn, P. Pietzuch","doi":"10.1109/ICDE.2011.5767851","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767851","url":null,"abstract":"When users submit new queries to a distributed stream processing system (DSPS), a query planner must allocate physical resources, such as CPU cores, memory and network bandwidth, from a set of hosts to queries. Allocation decisions must provide the correct mix of resources required by queries, while achieving an efficient overall allocation to scale in the number of admitted queries. By exploiting overlap between queries and reusing partial results, a query planner can conserve resources but has to carry out more complex planning decisions. In this paper, we describe SQPR, a query planner that targets DSPSs in data centre environments with heterogeneous resources. SQPR models query admission, allocation and reuse as a single constrained optimisation problem and solves an approximate version to achieve scalability. It prevents individual resources from becoming bottlenecks by re-planning past allocation decisions and supports different allocation objectives. As our experimental evaluation in comparison with a state-of-the-art planner shows SQPR makes efficient resource allocation decisions, even with a high utilisation of resources, with acceptable overheads.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116368209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 54

High performance database logging using storage class memory 使用存储类内存的高性能数据库日志记录

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767918

Ru Fang, Hui-I Hsiao, Bin He, C. Mohan, Yun Wang

{"title":"High performance database logging using storage class memory","authors":"Ru Fang, Hui-I Hsiao, Bin He, C. Mohan, Yun Wang","doi":"10.1109/ICDE.2011.5767918","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767918","url":null,"abstract":"Storage class memory (SCM), a new generation of memory technology, offers non-volatility, high-speed, and byte-addressability, which combines the best properties of current hard disk drives (HDD) and main memory. With these extraordinary features, current systems and software stacks need to be redesigned to get significantly improved performance by eliminating disk input/output (I/O) barriers; and simpler system designs by avoiding complicated data format transformations. In current DBMSs, logging and recovery are the most important components to enforce the atomicity and durability of a database. Traditionally, database systems rely on disks for logging transaction actions and log records are forced to disks when a transaction commits. Because of the slow disk I/O speed, logging becomes one of the major bottlenecks for a DBMS. Exploiting SCM as a persistent memory for transaction logging can significantly reduce logging overhead. In this paper, we present the detailed design of an SCM-based approach for DBMSs logging, which achieves high performance by simplified system design and better concurrency support. We also discuss solutions to tackle several major issues arising during system recovery, including hole detection, partial write detection, and any-point failure recovery. This new logging approach is used to replace the traditional disk based logging approach in DBMSs. To analyze the performance characteristics of our SCM-based logging approach, we implement the prototype on IBM SolidDB. In common circumstances, our experimental results show that the new SCM-based logging approach provides as much as 7 times throughput improvement over disk-based logging in the Telecommunication Application Transaction Processing (TATP) benchmark.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"218 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116430584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 110

Efficient continuously moving top-k spatial keyword query processing 高效的连续移动top-k空间关键字查询处理

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767861

Dingming Wu, Man Lung Yiu, Christian S. Jensen, G. Cong

引用次数: 159

Adapting microsoft SQL server for cloud computing 为云计算调整microsoft SQL server

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767935

P. Bernstein, Istvan Cseri, Nishant Dani, Nigel Ellis, Ajay Kalhan, Gopal Kakivaya, D. Lomet, Ramesh Manne, Lev Novik, Tomas Talius

引用次数: 128