{"title":"myTrustedCloud: Trusted Cloud Infrastructure for Security-critical Computation and Data Management","authors":"D. Wallom, M. Turilli, Andrew P. Martin, Anbang Ruan, G. Taylor, N. Hargreaves, Alan McMoran","doi":"10.1145/2361999.2362014","DOIUrl":"https://doi.org/10.1145/2361999.2362014","url":null,"abstract":"Cloud Computing provides an optimal infrastructure to utilise and share both computational and data resources whilst allowing a pay-per-use model, useful to cost-effectively manage hardware investment or to maximise its utilisation. Cloud Computing also offers transitory access to scalable amounts of computational resources, something that is particularly important due to the time and financial constraints of many user communities. The growing number of communities that are adopting large public cloud resources such as Amazon Web Services [1] or Microsoft Azure [2] demonstrates the success and usefulness of the Cloud Computing paradigm. Nonetheless, public clouds are typically used for non-business-critical applications, particularly where the security of applications and of data deposited within shared public services is a binding requirement. In this paper, a use case is presented illustrating how the integration of Trusted Computing technologies into an available cloud infrastructure -- Eucalyptus -- allows the security-critical energy industry to exploit the flexibility and potential economic benefits of the Cloud Computing paradigm for their business-critical applications.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131884725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Byzantine Fault-Tolerant MapReduce: Faults are Not Just Crashes","authors":"Pedro Costa, Marcelo Pasin, A. Bessani, M. Correia","doi":"10.1109/CloudCom.2011.15","DOIUrl":"https://doi.org/10.1109/CloudCom.2011.15","url":null,"abstract":"MapReduce is often used to run critical jobs such as scientific data analysis. However, evidence in the literature shows that arbitrary faults do occur and may corrupt the results of MapReduce jobs. MapReduce runtimes like Hadoop tolerate crash faults, but not arbitrary or Byzantine faults. We present a MapReduce algorithm and prototype that tolerate these faults. An experimental evaluation shows that executing a job with our algorithms uses twice the resources of the original Hadoop, instead of the three or four times that the direct application of common Byzantine fault-tolerance paradigms would require. We believe this cost is acceptable for critical applications that require that level of fault tolerance.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"10878 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115718441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
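The resource argument in the Byzantine fault-tolerant MapReduce abstract can be illustrated with a small replicate-and-compare sketch. It assumes a fault bound `f` (at most `f` faulty replicas) and a hypothetical `run_task(i)` callable that executes replica `i` and returns its serialized output; replicas are added one at a time until `f + 1` of them produce the same SHA-256 digest. This is an illustration of the general idea, not the authors' algorithm or any Hadoop API.

```python
import hashlib
from collections import Counter
from typing import Callable, Tuple


def run_with_agreement(run_task: Callable[[int], bytes], f: int,
                       max_replicas: int = 10) -> Tuple[bytes, int]:
    """Run replicas of one task until f+1 of them produce the same output.

    run_task is a hypothetical callable that executes replica number i and
    returns its serialized output. Outputs are compared by SHA-256 digest.
    Returns (agreed_output, number_of_replicas_executed).
    """
    votes: Counter = Counter()
    for replica in range(max_replicas):
        out = run_task(replica)
        d = hashlib.sha256(out).hexdigest()
        votes[d] += 1
        # With at most f faulty replicas, f+1 matching digests mean at least
        # one correct replica produced this output.
        if votes[d] >= f + 1:
            return out, replica + 1
    raise RuntimeError("no f+1 agreement within the replica budget")
```

With `f = 1`, a fault-free run executes only two replicas, and a third is launched only when the first two disagree; this roughly mirrors the "twice the resources" figure, whereas majority voting over `2f + 1` or `3f + 1` replicas would always pay three or four times.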
{"title":"Optimization-Based Virtual Machine Manager for Private Cloud Computing","authors":"D. Niyato","doi":"10.1109/CloudCom.2011.23","DOIUrl":"https://doi.org/10.1109/CloudCom.2011.23","url":null,"abstract":"In this paper, an optimal resource management framework for a cloud computing environment is presented. Based on virtualization technology, the workload to be processed on a virtual machine can be moved (i.e., outsourced) from a private cloud (i.e., an in-house computer system) to a service provider in a public cloud. The framework introduces a virtual machine manager (VMM) that operates in the private cloud to minimize the cost due to outsourcing and performance degradation. A stochastic optimization model is developed to obtain an optimal workload outsourcing policy that minimizes this cost. The numerical studies reveal the effectiveness of the optimal resource management framework in achieving the objective of the private cloud. This framework will be useful not only to optimize the performance of resource usage, but also to achieve the best benefit from the economic perspective of the cloud computing regime.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123103977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VM Leakage and Orphan Control in Open-Source Clouds","authors":"Christopher E. Dabrowski, K. Mills","doi":"10.1109/CloudCom.2011.84","DOIUrl":"https://doi.org/10.1109/CloudCom.2011.84","url":null,"abstract":"Computer systems often exhibit degraded performance due to resource leakage caused by erroneous programming or malicious attacks, and computers can even crash in extreme cases of resource exhaustion. The advent of cloud computing provides increased opportunities to amplify such vulnerabilities, thus affecting a significant number of computer users. Using simulation, we demonstrate that cloud computing systems based on open-source code could be subjected to a simple malicious attack capable of degrading availability of virtual machines (VMs). We describe how the attack leads to VM leakage, causing orphaned VMs to accumulate over time, reducing the pool of resources available to users. We identify a set of orphan control processes needed in multiple cloud components, and we illustrate how such processes detect and eliminate orphaned VMs. We show that adding orphan control allows an open-source cloud to sustain a higher level of VM availability during malicious attacks. We also report on the overhead of implementing orphan control.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"47 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120893594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
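The orphan-control idea in the VM leakage abstract, detecting VMs that keep consuming node resources after the cloud controller has lost track of them, can be sketched as a simple reconciliation between two views. Everything here is a hypothetical assumption for illustration: the controller's set of tracked VM ids, the node's table of running VMs with start times, and the grace-period rule. It is not the set of orphan control processes described in the paper.

```python
from typing import Dict, Set


def find_orphans(controller_vms: Set[str], node_vms: Dict[str, float],
                 now: float, grace: float = 60.0) -> Set[str]:
    """Return ids of VMs running on a node that no controller record claims.

    node_vms maps a (hypothetical) VM id to its start time in seconds.
    The grace period avoids reaping a VM whose creation has not yet
    propagated to the controller's records.
    """
    return {vm for vm, started in node_vms.items()
            if vm not in controller_vms and now - started > grace}


def reap_orphans(node_vms: Dict[str, float], orphans: Set[str]) -> None:
    """Free orphaned VMs; here that just means removing them from the node table."""
    for vm in orphans:
        node_vms.pop(vm, None)
```

Running such a check periodically on each node returns leaked capacity to the pool; the grace period trades slower reclamation for not killing VMs that are still being registered.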
{"title":"A-Tree: Distributed Indexing of Multidimensional Data for Cloud Computing Environments","authors":"A. Papadopoulos, Dimitrios Katsaros","doi":"10.1109/CloudCom.2011.61","DOIUrl":"https://doi.org/10.1109/CloudCom.2011.61","url":null,"abstract":"Efficient querying of huge volumes of multidimensional data stored in cloud computing systems has become a necessity, due to the widespread use of cloud storage facilities. As clouds grow larger and the volume of available data keeps increasing, it is essential to develop fast, scalable and efficient indexing schemes. In this paper, we propose the A-tree, a distributed indexing scheme for multidimensional data capable of handling both point and range queries, appropriate for cloud computing environments. A performance evaluation of the A-tree against the state-of-the-art competitor attests to its superiority, achieving significantly lower latencies.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115107128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Contextualization Solution for Cloud Platform Services","authors":"Django Armstrong, K. Djemame, S. Nair, Johan Tordsson, W. Ziegler","doi":"10.1109/CLOUDCOM.2011.51","DOIUrl":"https://doi.org/10.1109/CLOUDCOM.2011.51","url":null,"abstract":"We propose a cloud contextualization mechanism that operates in two stages: contextualization of VM images prior to service deployment (PaaS level) and self-contextualization of VM instances created from the image (IaaS level). The contextualization tools are implemented as part of the OPTIMIS Toolkit, a set of software components for simplified management of cloud services and infrastructures. We present the architecture of our contextualization tools and demonstrate the feasibility of our contextualization mechanism in a three-tier web application scenario. Preliminary results suggest acceptable performance and scalability.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"359 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122554240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Front-end, Hadoop-based Data Management Service for Efficient Federated Clouds","authors":"George Kousiouris, G. Vafiadis, T. Varvarigou","doi":"10.1109/CloudCom.2011.76","DOIUrl":"https://doi.org/10.1109/CloudCom.2011.76","url":null,"abstract":"In recent years, cloud computing has emerged as the new IT paradigm that promises elastic resources on a pay-per-use basis. The challenges of cloud computing are focused around massive data storage and efficient large-scale distributed computation. Hadoop, a community-driven Apache project, has provided an efficient and cost-effective platform for large-scale computation using the MapReduce methodology pioneered by Google. In this paper, the design of a Hadoop-based data management system as the front-end service for Cloud data management is investigated. This framework is enriched with RESTful APIs in front of Hadoop and a series of components that aim to extend Hadoop's functionality beyond its well-known back-end, heavy data processing scope. These components are used to enhance security, logging and data analysis features, as well as data access compatibility between different but interconnected Cloud providers (federated Clouds). Hadoop's capabilities are also extended to support intelligent decision making regarding the choice of the most suitable services for federation in a federated cloud scenario, as well as legally compliant behaviour regarding the geographical location of data storage.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116673860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iHadoop: Asynchronous Iterations for MapReduce","authors":"Eslam Elnikety, T. Elsayed, Hany E. Ramadan","doi":"10.1109/CloudCom.2011.21","DOIUrl":"https://doi.org/10.1109/CloudCom.2011.21","url":null,"abstract":"MapReduce is a distributed programming framework designed to ease the development of scalable data-intensive applications for large clusters of commodity machines. Most machine learning and data mining applications involve iterative computations over large datasets, such as Web hyperlink structures and social network graphs. Yet the MapReduce model does not efficiently support this important class of applications. The architecture of MapReduce, most critically its dataflow techniques and task scheduling, is completely unaware of the nature of iterative applications: tasks are scheduled according to a policy that optimizes the execution of a single iteration, which wastes bandwidth, I/O, and CPU cycles compared with an optimal execution of a consecutive set of iterations. This work presents iHadoop, a modified MapReduce model, and an associated implementation, optimized for iterative computations. The iHadoop model schedules iterations asynchronously. It connects the output of one iteration to the next, allowing both to process their data concurrently. iHadoop's task scheduler exploits inter-iteration data locality by scheduling tasks that exhibit a producer/consumer relation on the same physical machine, allowing fast local data transfer. For iterative applications that must satisfy certain criteria before termination, iHadoop runs the check concurrently with the execution of the subsequent iteration to further reduce the application's latency. This paper also describes our implementation of the iHadoop model and evaluates its performance against Hadoop, the widely used open-source implementation of MapReduce. Experiments using different data analysis applications over real-world and synthetic datasets show that iHadoop performs better than Hadoop for iterative algorithms, reducing the execution time of iterative applications by 25% on average. Furthermore, integrating iHadoop with HaLoop, a Hadoop variant that caches invariant data between iterations, reduces execution time by 38% on average.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117009805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
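The asynchronous-iteration idea in the iHadoop abstract (connecting the output of one iteration to the next so both process data concurrently) can be illustrated with a toy, single-machine pipeline. Each iteration is a thread that consumes records from the previous iteration's output queue as soon as they appear, so consecutive iterations overlap in time. The names `run_pipelined` and `step` are hypothetical, and the sketch deliberately omits the reduce barrier, locality-aware scheduling, and concurrent termination checks that iHadoop itself handles.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking the end of one iteration's output stream


def _iteration(step: Callable[[int], int], inbox: queue.Queue,
               outbox: queue.Queue) -> None:
    """One iteration: process each record as soon as the previous one emits it."""
    while True:
        rec = inbox.get()
        if rec is _DONE:
            outbox.put(_DONE)  # signal the next iteration that input is complete
            return
        outbox.put(step(rec))


def run_pipelined(records: Iterable[int], step: Callable[[int], int],
                  iterations: int) -> List[int]:
    """Chain `iterations` copies of `step` so consecutive iterations overlap in time."""
    queues = [queue.Queue() for _ in range(iterations + 1)]
    workers = [threading.Thread(target=_iteration, args=(step, queues[i], queues[i + 1]))
               for i in range(iterations)]
    for w in workers:
        w.start()
    for rec in records:
        queues[0].put(rec)
    queues[0].put(_DONE)
    results = []
    while (rec := queues[-1].get()) is not _DONE:
        results.append(rec)
    for w in workers:
        w.join()
    return results
```

Because every stage reads from a FIFO queue, record order is preserved; a synchronous MapReduce driver would instead wait for an entire iteration to finish before starting the next.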
{"title":"Towards Reproducible eScience in the Cloud","authors":"Jonathan Klinginsmith, M. Mahoui, Yuqing Wu","doi":"10.1109/CloudCom.2011.89","DOIUrl":"https://doi.org/10.1109/CloudCom.2011.89","url":null,"abstract":"Whether it be data from ubiquitous devices such as sensors or data generated by telescopes or other laboratory instruments, technology in many scientific disciplines is generating data at rates never witnessed before. Computational scientists are among the many who perform inductive experiments and analyses on these data with the goal of answering scientific questions. These computationally demanding experiments and analyses have become commonplace, resulting in a shift in scientific discovery and giving rise to the term eScience. To perform eScience experiments and analyses at scale, one must have an infrastructure with enough computing power and storage space. The advent of cloud computing has allowed infrastructures and platforms to be created with theoretically limitless capacity, thus providing an attractive solution to this need. In this work, we create a reproducible process for the construction of eScience computing environments on top of cloud computing infrastructures. Our solution separates the construction of these environments into two distinct layers: (1) the infrastructure layer and (2) the software layer. We provide results of running our framework on two different computational clusters within two separate cloud computing environments to demonstrate that our framework can facilitate the replication or extension of an eScience experiment.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129531614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Mahout for Clustering Wikipedia's Latest Articles: A Comparison between K-means and Fuzzy C-means in the Cloud","authors":"R. Esteves, Chunming Rong","doi":"10.1109/CLOUDCOM.2011.86","DOIUrl":"https://doi.org/10.1109/CLOUDCOM.2011.86","url":null,"abstract":"This paper compares k-means and fuzzy c-means for clustering a large, noisy, realistic dataset. We made the comparison using a free cloud computing solution, Apache Mahout/Hadoop, and Wikipedia's latest articles. In the past, the use of these two algorithms was restricted to small datasets. As a result, studies were based on artificial datasets that do not represent a real document clustering situation. In this ongoing research, we found that on a noisy dataset, fuzzy c-means can lead to worse cluster quality than k-means. K-means does not always converge faster. We also found that Mahout is a promising clustering technology, but its preprocessing tools are not developed enough for efficient dimensionality reduction. In our experience, the use of Apache Mahout is premature.","PeriodicalId":427190,"journal":{"name":"2011 IEEE Third International Conference on Cloud Computing Technology and Science","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128427604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
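The two algorithms compared in the Mahout paper differ mainly in their assignment step. K-means assigns each point to its single nearest centroid, while fuzzy c-means gives each point a membership degree in every cluster, u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), and recomputes centers as means weighted by u^m. The plain-Python sketch below shows one assignment step for each and the fuzzy center update; it is not Mahout's implementation, and the fuzzifier default m = 2 is an assumption.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def kmeans_assign(points: List[Point], centers: List[Point]) -> List[int]:
    """k-means (hard) assignment: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def fcm_memberships(points: List[Point], centers: List[Point],
                    m: float = 2.0, eps: float = 1e-12) -> List[List[float]]:
    """Fuzzy c-means (soft) assignment: u[i][j] = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).

    eps guards against division by zero when a point coincides with a center.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        d = [max(math.dist(p, c), eps) for c in centers]
        memberships.append([1.0 / sum((d[j] / d[k]) ** exponent for k in range(len(d)))
                            for j in range(len(d))])
    return memberships


def fcm_update_centers(points: List[Point], u: List[List[float]],
                       m: float = 2.0) -> List[List[float]]:
    """Recompute each center as the mean of all points weighted by u^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append([sum(wi * p[t] for wi, p in zip(w, points)) / total
                        for t in range(dim)])
    return centers
```

Each membership row sums to 1, so every point contributes a little to every center; k-means is the limiting case where each row is a one-hot vector.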