Large Scale and Big Data: Latest Publications, Page 2

PEGASUS: A System for Large-Scale Graph Processing

Large Scale and Big Data Pub Date : 1900-01-01 DOI: 10.1201/b17112-9

Charalampos E. Tsourakakis

Citations: 16

An Overview of the NoSQL World

Large Scale and Big Data Pub Date : 1900-01-01 DOI: 10.1201/b17112-10

Liang Zhao, S. Sakr, Anna Liu

Citations: 0

Virtualizing Resources for the Cloud

Large Scale and Big Data Pub Date : 1900-01-01 DOI: 10.1201/b17112-17

Mohammad Hammoud, M. Sakr

Abstract: Virtualization is at the core of cloud computing. It lies on top of the cloud infrastructure, whereby virtual resources (e.g., virtual CPUs, memories, disks and networks) are constructed from the underlying physical resources and act as proxies to them. As is the case with the idea of cloud computing, which was first introduced in the 1960s [1], virtualization can be traced back to the 1970s [55]. Forty years ago, mainframe computer systems were extremely large and expensive. To address expanding user needs and costly machine ownership, the IBM 370 architecture, announced in 1970, offered complete virtual machines (virtual hardware images) to different programs running on the same computer hardware. Over time, computer hardware became less expensive and users started migrating to low-priced desktop machines, which caused the adoption of virtualization technology to fade for a while. Today, virtualization is enjoying a resurgence in popularity, with a number of research projects and commercial systems providing virtualization solutions for commodity PCs, servers, and the cloud. In this chapter, we present various ingredients of virtualization technology and the crucial role it plays in enabling the cloud computing paradigm. First, we identify the major reasons why virtualization is becoming important, especially for the cloud. Second, we indicate how multiple software images can run side by side on physical resources while attaining security, resource, and failure isolation. Prior to delving into more details about virtualization, we present a brief background requisite for understanding how physical resources can be virtualized. In particular,

Citations: 0

Performance Analysis for Large IaaS Clouds

Large Scale and Big Data Pub Date : 1900-01-01 DOI: 10.1201/b17112-19

R. Ghosh, F. Longo, Kishor S. Trivedi

Citations: 0

MapReduce Family of Large-Scale Data-Processing Systems

Large Scale and Big Data Pub Date : 1900-01-01 DOI: 10.1201/b17112-3

S. Sakr, Anna Liu, A. Fayoumi

Citations: 1

Consistency Management in Cloud Storage Systems

Large Scale and Big Data Pub Date : 1900-01-01 DOI: 10.1201/b17112-11

Houssem-Eddine Chihoub, Shadi Ibrahim, Gabriel Antoniu, María S. Pérez

Abstract: With the emergence of cloud computing, many organizations have moved their data to the cloud in order to provide scalable, reliable, and highly available services. As these services mainly rely on geographically distributed data replication to guarantee good performance and high availability, consistency comes into question. The CAP theorem discusses tradeoffs between consistency, availability, and partition tolerance, and concludes that only two of these three properties can be guaranteed simultaneously in replicated storage systems. With data growing in size and systems growing in scale, new tradeoffs have been introduced and new models are emerging for maintaining data consistency. In this chapter, we discuss the consistency issue and describe the CAP theorem as well as its limitations and impacts on big data management in large-scale systems. We then briefly introduce several models of consistency in cloud storage systems. Next, we study some state-of-the-art cloud storage systems from both enterprise and academia, and discuss their contribution to maintaining data consistency. To complete the chapter, we introduce the current trend toward adaptive consistency in big data systems and present our dynamic adaptive consistency solution (Harmony). We conclude by discussing the open issues and challenges raised regarding consistency in the cloud.

Citations: 5

Advanced Algorithms for Efficient Approximate Duplicate Detection in Data Streams Using Bloom Filters

Large Scale and Big Data Pub Date : 1900-01-01 DOI: 10.1201/b17112-14

Sourav Dutta, A. Narang

Citations: 0

Incremental MapReduce Computations

Large Scale and Big Data Pub Date : 1900-01-01 DOI: 10.1201/b17112-5

Pramod Bhatotia, Alexander Wieder, Umut A. Acar, R. Rodrigues

Abstract: Distributed processing of large data sets is an area that has received much attention from researchers and practitioners over the last few years. In this context, several proposals exist that leverage the observation that data sets evolve over time, and as such there is often a substantial overlap between the input to consecutive runs of a data processing job. This allows the programmers of these systems to devise an efficient logic to update the output upon an input change. However, most of these systems lack compatibility with existing models and require the programmer to implement an application-specific dynamic algorithm, which increases algorithm and code complexity. In this chapter, we describe our previous work on building a platform called Incoop, which allows for running MapReduce computations incrementally and transparently. Incoop detects changes between two files that are used as inputs to consecutive MapReduce jobs, and efficiently propagates those changes until the new output is produced. The design of Incoop is based on memoizing the results of previously run tasks, and reusing these results whenever possible. Doing this efficiently introduces several technical challenges that are overcome with novel concepts, such as a large-scale storage system that efficiently computes deltas between two inputs, a Contraction phase to break up the work of the Reduce phase, and an affinity-based scheduling algorithm. This chapter presents the motivation and design of Incoop, as well as a complete evaluation using several application benchmarks. Our results show significant performance improvements without changing a single line of application code.
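
The memoization idea described in the abstract above can be illustrated with a toy, single-process sketch. This is not Incoop's implementation or API: the class `MemoizedWordCount`, its method names, and the pre-split list of text chunks are all hypothetical, and the storage system, Contraction phase, and scheduler mentioned in the abstract are not modeled. The sketch only shows how caching map-task results by a hash of each input chunk lets a second run skip the chunks that did not change.

```python
import hashlib
from collections import Counter


class MemoizedWordCount:
    """Toy incremental word count (hypothetical, not Incoop's API).

    Map results are cached per input chunk, keyed by the chunk's SHA-256
    digest, so an unchanged chunk is never mapped twice.
    """

    def __init__(self):
        self.cache = {}    # chunk digest -> Counter produced by the map task
        self.map_runs = 0  # number of map tasks actually executed

    def _map(self, chunk):
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self.cache:
            # Cache miss: the chunk is new or changed, so run the map task.
            self.map_runs += 1
            self.cache[key] = Counter(chunk.split())
        return self.cache[key]

    def run(self, chunks):
        # Reduce step: merge the per-chunk counts into one result.
        total = Counter()
        for chunk in chunks:
            total.update(self._map(chunk))
        return dict(total)


job = MemoizedWordCount()
first = job.run(["a b a", "c d", "e e"])     # all three chunks are mapped
runs_after_first = job.map_runs
second = job.run(["a b a", "c d x", "e e"])  # only the changed chunk is mapped
runs_after_second = job.map_runs
```

In this sketch the reduce step is still recomputed in full on every run; only the map work is reused.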

Citations: 2

Large-Scale RDF Processing with MapReduce

Large Scale and Big Data Pub Date : 1900-01-01 DOI: 10.1201/b17112-6

A. Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung, G. Lausen

Citations: 1