{"title":"PEGASUS: A System for Large-Scale Graph Processing","authors":"Charalampos E. Tsourakakis","doi":"10.1201/b17112-9","DOIUrl":"https://doi.org/10.1201/b17112-9","url":null,"abstract":"","PeriodicalId":448182,"journal":{"name":"Large Scale and Big Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127240563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Overview of the NoSQL World","authors":"Liang Zhao, S. Sakr, Anna Liu","doi":"10.1201/b17112-10","DOIUrl":"https://doi.org/10.1201/b17112-10","url":null,"abstract":"","PeriodicalId":448182,"journal":{"name":"Large Scale and Big Data","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130812478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Analysis for Large IaaS Clouds","authors":"R. Ghosh, F. Longo, Kishor S. Trivedi","doi":"10.1201/b17112-19","DOIUrl":"https://doi.org/10.1201/b17112-19","url":null,"abstract":"IaaS clouds are major enablers of data-intensive cloud applications because they provide the computing capacity needed to manage Big Data environments. In a typical IaaS cloud, virtual machine (VM) instances deployed on physical machines (PMs) are provided to users for their computing needs. IaaS cloud providers have recently realized that merely providing the basic functionality for Big Data processing is not sufficient to survive intense business competition. Rather, the performance of the cloud-provided service is an equally important factor when a","PeriodicalId":448182,"journal":{"name":"Large Scale and Big Data","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126147491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtualizing Resources for the Cloud","authors":"Mohammad Hammoud, M. Sakr","doi":"10.1201/b17112-17","DOIUrl":"https://doi.org/10.1201/b17112-17","url":null,"abstract":"Virtualization is at the core of cloud computing. It lies on top of the cloud infrastructure, whereby virtual resources (e.g., virtual CPUs, memories, disks, and networks) are constructed from the underlying physical resources and act as proxies to them. Like the idea of cloud computing, which was first introduced in the 1960s [1], virtualization is not new; it can be traced back to the 1970s [55]. Forty years ago, mainframe computer systems were extremely large and expensive. To address expanding user needs and the high cost of machine ownership, the IBM 370 architecture, announced in 1970, offered complete virtual machines (virtual hardware images) to different programs running on the same computer hardware. Over time, computer hardware became less expensive and users started migrating to low-priced desktop machines, which caused the adoption of virtualization technology to fade for a while. Today, virtualization is enjoying a resurgence in popularity, with a number of research projects and commercial systems providing virtualization solutions for commodity PCs, servers, and the cloud. In this chapter, we present various ingredients of virtualization technology and the crucial role it plays in enabling the cloud computing paradigm. First, we identify the major reasons why virtualization is becoming important, especially for the cloud. Second, we indicate how multiple software images can run side by side on physical resources while attaining security, resource, and failure isolation. Prior to delving into more details about virtualization, we present the brief background required for understanding how physical resources can be virtualized. In particular,","PeriodicalId":448182,"journal":{"name":"Large Scale and Big Data","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121603616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced Algorithms for Efficient Approximate Duplicate Detection in Data Streams Using Bloom Filters","authors":"Sourav Dutta, A. Narang","doi":"10.1201/b17112-14","DOIUrl":"https://doi.org/10.1201/b17112-14","url":null,"abstract":"","PeriodicalId":448182,"journal":{"name":"Large Scale and Big Data","volume":"106 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127457766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistency Management in Cloud Storage Systems","authors":"Houssem-Eddine Chihoub, Shadi Ibrahim, Gabriel Antoniu, María S. Pérez","doi":"10.1201/b17112-11","DOIUrl":"https://doi.org/10.1201/b17112-11","url":null,"abstract":"With the emergence of cloud computing, many organizations have moved their data to the cloud in order to provide scalable, reliable, and highly available services. As these services mainly rely on geographically distributed data replication to guarantee good performance and high availability, consistency comes into question. The CAP theorem discusses tradeoffs between consistency, availability, and partition tolerance, and concludes that only two of these three properties can be guaranteed simultaneously in replicated storage systems. With data growing in size and systems growing in scale, new tradeoffs have been introduced and new models are emerging for maintaining data consistency. In this chapter, we discuss the consistency issue and describe the CAP theorem as well as its limitations and impact on big data management in large-scale systems. We then briefly introduce several models of consistency in cloud storage systems. Next, we study some state-of-the-art cloud storage systems from both enterprise and academia, and discuss their contributions to maintaining data consistency. To complete the chapter, we present the current trend toward adaptive consistency in big data systems and introduce our dynamic adaptive consistency solution (Harmony). We conclude by discussing the open issues and challenges regarding consistency in the cloud.","PeriodicalId":448182,"journal":{"name":"Large Scale and Big Data","volume":"366 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125625042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MapReduce Family of Large-Scale Data-Processing Systems","authors":"S. Sakr, Anna Liu, A. Fayoumi","doi":"10.1201/b17112-3","DOIUrl":"https://doi.org/10.1201/b17112-3","url":null,"abstract":"","PeriodicalId":448182,"journal":{"name":"Large Scale and Big Data","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114325057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental MapReduce Computations","authors":"Pramod Bhatotia, Alexander Wieder, Umut A. Acar, R. Rodrigues","doi":"10.1201/b17112-5","DOIUrl":"https://doi.org/10.1201/b17112-5","url":null,"abstract":"Distributed processing of large data sets is an area that has received much attention from researchers and practitioners over the last few years. In this context, several proposals exist that leverage the observation that data sets evolve over time, and as such there is often a substantial overlap between the inputs to consecutive runs of a data processing job. This allows the programmers of these systems to devise efficient logic to update the output upon an input change. However, most of these systems lack compatibility with existing models and require the programmer to implement an application-specific dynamic algorithm, which increases algorithm and code complexity. In this chapter, we describe our previous work on building a platform called Incoop, which allows for running MapReduce computations incrementally and transparently. Incoop detects changes between two files that are used as inputs to consecutive MapReduce jobs, and efficiently propagates those changes until the new output is produced. The design of Incoop is based on memoizing the results of previously run tasks, and reusing these results whenever possible. Doing this efficiently introduces several technical challenges that are overcome with novel concepts, such as a large-scale storage system that efficiently computes deltas between two inputs, a Contraction phase to break up the work of the Reduce phase, and an affinity-based scheduling algorithm. This chapter presents the motivation and design of Incoop, as well as a complete evaluation using several application benchmarks. Our results show significant performance improvements without changing a single line of application code.","PeriodicalId":448182,"journal":{"name":"Large Scale and Big Data","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123112337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale RDF Processing with MapReduce","authors":"A. Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung, G. Lausen","doi":"10.1201/b17112-6","DOIUrl":"https://doi.org/10.1201/b17112-6","url":null,"abstract":"","PeriodicalId":448182,"journal":{"name":"Large Scale and Big Data","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129396523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}