{"title":"HDFSbench: Understanding the Efficiency and Bottlenecks of Cloud File Systems","authors":"J. Dai, T. Xie, Shengsheng Huang, Jie Huang","doi":"10.1109/OCS.2012.31","DOIUrl":"https://doi.org/10.1109/OCS.2012.31","url":null,"abstract":"We have conducted intensive experiments on an in-house Hadoop cluster using HDFSbench (a file system benchmark tool we built for HDFS). Our experimental results provide valuable insights into the performance characteristics (e.g., general efficiency and potential bottlenecks) of cloud file systems for different application usages (e.g., MapReduce and Bigtable access patterns), and into how these traits change with new storage technologies (e.g., SSD vs. HDD).","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115452030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hadoop Applications in Bioinformatics","authors":"Li Xubin, Wen-Rui Jiang, Jiang Yi, Zou Quan","doi":"10.1109/OCS.2012.40","DOIUrl":"https://doi.org/10.1109/OCS.2012.40","url":null,"abstract":"Bioinformatics faces a dilemma: traditional analysis tools struggle with the large-scale data produced by high-throughput sequencing. In recent years, the open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has given bioinformatics researchers the opportunity to obtain scalable, efficient and reliable computing performance on Linux clusters and cloud computing services. In this paper, we present Hadoop-based applications employed in bioinformatics, covering next-generation sequencing and other biological domains. In addition, we discuss obstacles and future work on Hadoop in bioinformatics.","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115175356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Xerxes: Distributed Load Generator for Cloud-scale Experimentation","authors":"M. Kesavan, Ada Gavrilovska, K. Schwan","doi":"10.1109/OCS.2012.34","DOIUrl":"https://doi.org/10.1109/OCS.2012.34","url":null,"abstract":"With the growing acceptance of cloud computing as a viable computing paradigm, a number of research and real-life dynamic cloud-scale resource allocation and management systems have been developed over the last few years. An important problem facing system developers is the evaluation of such systems at scale. In this paper we present the design of a distributed load generation framework, Xerxes, that can generate appropriate resource load patterns across varying data center scales, thereby representing various cloud load scenarios. Toward this end, we first characterize the resource consumption of four distributed cloud applications that represent some of the most widely used classes of applications in the cloud. We then demonstrate how, using Xerxes, these patterns can be directly replayed at scale, potentially even beyond what is easily achievable through application reconfiguration. Furthermore, Xerxes allows for additional parameter manipulation and exploration of a wide range of load scenarios. Finally, we demonstrate the ability to use Xerxes with publicly available data center traces, which can be replayed across data centers with different configurations. Our experiments are conducted on a 700-node, 2800-core private cloud data center, virtualized with the VMware vSphere virtualization stack. The benefits of such a microbenchmark for cloud-scale experimentation include: (i) decoupling load scaling from application logic, (ii) resilience to faults and failures, since applications tend to crash altogether when some components fail, particularly at scale, and (iii) ease of testing and the ability to understand system behavior in a variety of actual or anticipated scenarios.","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132300918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"N3phele: Open Science-as-a-Service Workbench for Cloud-based Scientific Computing","authors":"N. Cook, D. Milojicic, R. Kaufmann, Joel R. Sevinsky","doi":"10.1109/OCS.2012.30","DOIUrl":"https://doi.org/10.1109/OCS.2012.30","url":null,"abstract":"Because of inexpensive, on-demand resources, cloud computing is a promising platform for scientific HPC applications, such as gene sequencing. However, it also poses challenges to users and developers, because running and maintaining HPC applications is low-level and complex work for scientists. This impacts the reusability and reproducibility of the work and increases the cost of development and maintenance. N3phele is a cloud-based workbench that allows researchers to perform complex analysis using only a browser and resources in infrastructure clouds, which are orchestrated by n3phele. Individual scientists may publish tools and workflow pipelines, registering them in n3phele for their own private use or for use by public collaborators. To illustrate, the QIIME microbial community analysis toolkit has been registered in n3phele, and n3phele has been used to perform microbial analysis, including computationally intensive Roche 454 denoising, using Amazon EC2 and n3phele's point-and-click interface. N3phele substantially improves the usability and manageability of complex scientific analysis pipelines in the cloud.","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117120687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BC-BSP: A BSP-Based System with Disk Cache for Large-Scale Graph Processing","authors":"Y. Bao, Zhigang Wang, Qiushi Bai, Yu Gu, Ge Yu, Hongxu Zhang, Chao Deng, Leitao Guo","doi":"10.1109/OCS.2012.37","DOIUrl":"https://doi.org/10.1109/OCS.2012.37","url":null,"abstract":"Many real-life applications can be modeled as graphs, and in many fields the data scale is very large. Large-scale graph processing has therefore attracted increasing attention. This paper proposes a BSP-based system with a disk cache for large-scale graph processing. The system can extend its functions and strategies (for example, adjusting parameters according to the data volume and supporting multiple aggregation functions at the same time), process large-scale data, balance load, and run clustering or classification algorithms on metric datasets. Experiments evaluate the scalability of the implemented system and compare BC-BSP-based applications with MapReduce-based ones. The experimental results show that BSP-based applications are more efficient than MapReduce-based applications when the data fits in memory during processing; otherwise, the MapReduce-based applications perform better.","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122517484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WarpNet: A Novel Network for IaaS Data Center","authors":"Lv Qi, Dai Jinquan","doi":"10.1109/OCS.2012.35","DOIUrl":"https://doi.org/10.1109/OCS.2012.35","url":null,"abstract":"Data-intensive processing frameworks (such as MapReduce) are increasingly being used in cloud computing environments, which makes it critical for IaaS (Infrastructure as a Service) frameworks to support all-to-all communication at very large scale. This paper presents a novel network model, WarpNet, which combines the small-world idea with micro-routers based on IP over IP. This network model helps build appropriate network infrastructure for IaaS from low-cost network devices while providing great extensibility, automatic fault tolerance and high performance at the same time. This paper analyzes the theoretical properties of WarpNet, presents a real-world implementation and deployment of the network model, and provides verification results at large scale.","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123817503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DR3: Optimizing Site Selection for Global Load Balance in Application Delivery Controller","authors":"Qunyang Lin, Junqing Xie, Zhiyong Shen, Xunteng Xu","doi":"10.1109/OCS.2012.32","DOIUrl":"https://doi.org/10.1109/OCS.2012.32","url":null,"abstract":"A site selection method determines the best server for a given client among multiple replicated servers deployed at geographically distributed locations. It is an important function of a global load balancing (GLB) system in an application delivery controller (ADC), which is usually deployed at the entry of data centers. This paper presents a site selection method based on DNS Reply Race and Reflection (DR3), which achieves higher selection accuracy and incurs lower system cost than conventional methods such as Geo-mapping, Ping and DNS reply race.","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130931093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Implementation of Parallel Term Contribution Algorithm Based on Mapreduce Model","authors":"Peng Chao, Wu Bin, Deng Chao","doi":"10.1109/OCS.2012.39","DOIUrl":"https://doi.org/10.1109/OCS.2012.39","url":null,"abstract":"MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large datasets on clusters of computers [1]. The term contribution (TC) algorithm is a relatively new text-mining algorithm for selecting features for clustering. In this paper, we design and implement a parallel term contribution (PTC) algorithm based on the MapReduce model. Our experiments show that the performance of TC is greatly enhanced by using the MapReduce framework.","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132982847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Image Cache in IaaS Cloud","authors":"Zhihong Zhang, Wei Zhou, Ling Qian, Zhiguo Luo, Shaoling Sun, Xiaoqing Huang","doi":"10.1109/OCS.2012.33","DOIUrl":"https://doi.org/10.1109/OCS.2012.33","url":null,"abstract":"Infrastructure-as-a-service (IaaS) is one of several main styles of cloud computing, in which virtualized computing resources are managed as a pool and provisioned on demand. When virtual machines are provisioned, a large amount of time is spent transferring images from the central image repository to target nodes. In this paper, we present an image cache mechanism that can greatly reduce provisioning time and alleviate the I/O pressure on the central image repository. At the same time, it can remain transparent to the scheduling mechanisms of existing IaaS management systems.","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121952605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CloudMaster: A Cloud Computing Management System of BigCloud","authors":"Zhihong Zhang, Wei Zhou, Ling Qian, Zhiguo Luo, Shaoling Sun, Xiaoqing Huang","doi":"10.1109/OCS.2012.38","DOIUrl":"https://doi.org/10.1109/OCS.2012.38","url":null,"abstract":"Cloud computing is rapidly becoming a critical technique for transforming a large part of the IT industry. China Mobile Communications Corporation, the biggest telecom company in China, launched the BigCloud project to research cloud-based IT infrastructure, platforms and applications. This paper introduces CloudMaster, the cloud management component of BigCloud, which deploys, monitors and maintains the BigCloud platform on more than 1000 PC servers. With this toolkit's support, a single person can easily maintain this large-scale infrastructure. This paper presents CloudMaster's design principles, architecture, and functions, and shares management experiences.","PeriodicalId":244833,"journal":{"name":"2012 7th Open Cirrus Summit","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122995344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}