Xinyu Que, Weikuan Yu, V. Tipparaju, J. Vetter, Bin Wang
{"title":"Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems","authors":"Xinyu Que, Weikuan Yu, V. Tipparaju, J. Vetter, Bin Wang","doi":"10.1109/CCGrid.2011.62","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.62","url":null,"abstract":"One-sided communication is important to enable asynchronous communication and data movement for Global Address Space (GAS) programming models. Such communication is typically realized through direct messages between initiator and target processes. For petascale systems with 10,000s of nodes and 100,000s of cores, these direct messages require dedicated communication buffers and/or channels, which can lead to significant scalability challenges for GAS programming models. In this paper, we describe a network-friendly communication model, multinode cooperation, to enable indirect one-sided communication. Compute nodes work together to handle one-sided requests through (1) request forwarding, in which one node can intercept a request and forward it to a target node, and (2) request aggregation, in which one node can aggregate many requests to a target node. We have implemented multinode cooperation for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI). Our experimental results on a large-scale Cray XT5 system demonstrate that multinode cooperation is able to greatly increase memory scalability by reducing communication buffers required on each node. In addition, multinode cooperation improves the resiliency of the GAS runtime system to network contention. Furthermore, multinode cooperation can benefit the performance of scientific applications. 
In one case, it reduces the total execution time of an NWChem application by 52%.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133647278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sangho Yi, E. Jeannot, Derrick Kondo, David P. Anderson
{"title":"Towards Real-Time, Volunteer Distributed Computing","authors":"Sangho Yi, E. Jeannot, Derrick Kondo, David P. Anderson","doi":"10.1109/CCGrid.2011.54","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.54","url":null,"abstract":"Many large-scale distributed computing applications demand real-time responses by soft deadlines. To enable such real-time task distribution and execution on volunteer resources, we previously proposed the design of a real-time volunteer computing platform called RT-BOINC. The system gives low O(1) worst-case execution time for task management operations, such as task scheduling, state transitioning, and validation. In this work, we present a full implementation of RT-BOINC, adding new features including a deadline timer and parameter-based admission control. We evaluate RT-BOINC at large scale using two real-time applications, namely, the games Go and Chess. The results of our case study show that RT-BOINC provides much better performance than the original BOINC in terms of average and worst-case response time, scalability and efficiency.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132042537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christian Pinto, Shivani Raghav, A. Marongiu, M. Ruggiero, David Atienza Alonso, Luca Benini
{"title":"GPGPU-Accelerated Parallel and Fast Simulation of Thousand-Core Platforms","authors":"Christian Pinto, Shivani Raghav, A. Marongiu, M. Ruggiero, David Atienza Alonso, Luca Benini","doi":"10.1109/CCGRID.2011.64","DOIUrl":"https://doi.org/10.1109/CCGRID.2011.64","url":null,"abstract":"The multicore revolution and the ever-increasing complexity of computing systems are dramatically changing system design, analysis and programming of computing platforms. Future architectures will feature hundreds to thousands of simple processors and on-chip memories connected through a network-on-chip. Architectural simulators will remain primary tools for design space exploration, software development and performance evaluation of these massively parallel architectures. However, architectural simulation performance is a serious concern, as virtual platforms and simulation technology are not able to tackle the complexity of future thousand-core scenarios. The main contribution of this paper is the development of a new simulation approach and technology for many-core processors that exploits the enormous parallel processing capability of low-cost and widely available General Purpose Graphic Processing Units (GPGPU). The simulation of many-core architectures indeed exhibits a high level of parallelism and is inherently parallelizable, but GPGPU acceleration of architectural simulation requires an in-depth revision of the data structures and functional partitioning traditionally used in parallel simulation. We demonstrate our GPGPU simulator on a target architecture composed of several cores (i.e., ARM ISA based), with instruction and data caches, connected through a Network-on-Chip (NoC). 
Our experiments confirm the feasibility of our approach.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"248 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114941647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnosing Anomalous Network Performance with Confidence","authors":"B. Settlemyer, S. Hodson, J. Kuehn, S. Poole","doi":"10.1109/CCGrid.2011.80","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.80","url":null,"abstract":"Variability in network performance is a major obstacle in effectively analyzing the throughput of modern high performance computer systems. High performance interconnection networks offer excellent best-case network latencies; however, highly parallel applications running on parallel machines typically require consistently high levels of performance to adequately leverage the massive amounts of available computing power. Performance analysts have usually quantified network performance using traditional summary statistics that assume the observational data is sampled from a normal distribution. In our examinations of network performance, we have found this method of analysis often provides too little data to understand anomalous network performance. In particular, we examine a multi-modal performance scenario encountered with an Infiniband interconnection network, and we explore the performance repeatability on the custom Cray SeaStar2 interconnection network after a set of software and driver updates.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"368 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115175383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unifying Cloud Management: Towards Overall Governance of Business Level Objectives","authors":"M. Sedaghat, F. Hernández-Rodriguez, E. Elmroth","doi":"10.1109/CCGrid.2011.65","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.65","url":null,"abstract":"We address the challenge of providing unified cloud resource management towards an overall business level objective, given the multitude of managerial tasks to be performed and the complexity of any architecture to support them. Resource level management tasks include elasticity control, virtual machine and data placement, autonomous fault management, etc., which are intrinsically difficult problems since services normally have unknown lifetimes and capacity demands that vary widely over time. Unifying the management of these problems (for optimization with respect to some higher-level business objective, like optimizing revenue while breaking no more than a certain percentage of service level agreements) becomes even more challenging, as the resource level managerial challenges are far from independent. After providing the general problem formulation, we review recent approaches taken by the research community, including mainly general autonomic computing technology for large-scale environments and resource level management tools equipped with some business oriented or otherwise qualitative features. 
We propose and illustrate a policy-driven approach where a high-level management system monitors overall system and services behavior and adjusts lower level policies (e.g., thresholds for admission control, elasticity control, server consolidation level, etc) for optimization towards the measurable business level objectives.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116058940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ex-MATE: Data Intensive Computing with Large Reduction Objects and Its Application to Graph Mining","authors":"Wei Jiang, G. Agrawal","doi":"10.1109/CCGrid.2011.18","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.18","url":null,"abstract":"The map-reduce framework has been widely used as the infrastructure for processing large-scale datasets in various domains. Recent work has shown that an alternate API, MATE (Mapreduce with an Alternate API), where a reduction object is explicitly maintained and updated, reduces memory requirements and can significantly improve performance for many applications. However, unlike the original API, support for the alternate API has been restricted to the cases where the reduction object can fit in memory. This limits the applicability of the MATE approach. In particular, one emerging class of applications that requires support for large reduction objects is graph mining applications. This paper describes a system, Extended MATE or Ex-MATE, which supports this alternate API with reduction objects of arbitrary sizes. We develop support for managing disk-resident reduction objects and updating them efficiently. We evaluate our system using three graph mining applications and compare its performance to that of PEGASUS, a graph mining system implemented based on the original map-reduce API and its Hadoop implementation. 
Our results on a cluster with 128 cores show that for all three applications, our system outperforms PEGASUS, by factors ranging between 9 and 35.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116593103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GeoServ: A Distributed Urban Sensing Platform","authors":"Jong Hoon Ahnn, Uichin Lee, H. J. Moon","doi":"10.1109/CCGrid.2011.10","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.10","url":null,"abstract":"Urban sensing where mobile users continuously gather, process, and share location-sensitive sensor data (e.g., street images, road condition, traffic flow) is emerging as a new network paradigm of sensor information sharing in urban environments. The key enablers are the smart phones (e.g., iPhones and Android phones) equipped with onboard sensors (e.g., cameras, accelerometer, compass, GPS), and various wireless devices (e.g., WiFi and 2/3G). The goal of this paper is to design a scalable sensor networking platform where millions of users on the move can participate in urban sensing and share location-aware information using always-on cellular data connections. We propose a two-tier sensor networking platform called GeoServ where mobile users publish/access sensor data via an Internet-based distributed P2P overlay network. The main contribution of this paper is two-fold: a location-aware sensor data retrieval scheme that supports geographic range queries, and a location-aware publish-subscribe scheme that enables efficient multicast routing over a group of subscribed users. 
We prove that GeoServ protocols preserve locality and validate their performance via extensive simulations.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123505070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gerhard Klimeck, G. Adams, K. Madhavan, Nathan Denny, M. Zentner, Swaroop Shivarajapura, L. Zentner, D. Beaudoin
{"title":"Social Networks of Researchers and Educators on nanoHUB.org","authors":"Gerhard Klimeck, G. Adams, K. Madhavan, Nathan Denny, M. Zentner, Swaroop Shivarajapura, L. Zentner, D. Beaudoin","doi":"10.1109/CCGRID.2011.33","DOIUrl":"https://doi.org/10.1109/CCGRID.2011.33","url":null,"abstract":"The science gateway nanoHUB.org is the world's largest nanotechnology user facility, serving 167,196 users in 2010 with over 2,300 resources including 189 simulation programs. Surveys of nanoHUB users and automated usage analysis find widespread simulation use in formal classroom education, thereby connecting recent research more rapidly and closely to education. Analysis of 719 citations in the scientific literature by over 1,300 authors to nanoHUB.org resources documents use of simulation programs by new research collaborations, by researchers outside of the community originating the program, and by experimentalists. The publication and author networks reveal research collaborations and capacity building through knowledge transfer. Analysis of secondary citations documents the quality of the conducted research with an h-index of 30 after just 10 years of operation. Our analysis proves with quantitative metrics that impactful research can be conducted by an ever-growing research community. 
We argue that HUBzero™ technology and the user-focused design and operation of nanoHUB.org are keys to success that can be transferred to other science gateways.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122725192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPI-IO/Gfarm: An Optimized Implementation of MPI-IO for the Gfarm File System","authors":"Hiroki Kimura, O. Tatebe","doi":"10.1109/CCGrid.2011.82","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.82","url":null,"abstract":"This paper proposes a design and implementation of MPI-IO for the Gfarm file system, called MPI-IO/Gfarm. The Gfarm file system is a global file system that federates the local storage of compute nodes among several clusters. It has a scale-out architecture designed to support distributed data-intensive computing. However, the Gfarm file system does not achieve scalable performance in the case of parallel writes to a single file, a typical file operation in MPI-IO. This paper proposes an optimization technique to improve the parallel write performance to a single file. In the evaluation, MPI-IO/Gfarm achieves scalable parallel I/O performance.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"226 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116849650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Juan M. Tirado, Daniel Higuero, Florin Isaila, J. Carretero
{"title":"Predictive Data Grouping and Placement for Cloud-Based Elastic Server Infrastructures","authors":"Juan M. Tirado, Daniel Higuero, Florin Isaila, J. Carretero","doi":"10.1109/CCGrid.2011.49","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.49","url":null,"abstract":"Workload variations on Internet platforms such as YouTube, Flickr, and LastFM require novel approaches to dynamic resource provisioning in order to meet QoS requirements, while reducing the Total Cost of Ownership (TCO) of the infrastructures. The economy of scale promise of cloud computing is a great opportunity to approach this problem, by developing elastic large-scale server infrastructures. However, a proactive approach to dynamic resource provisioning requires prediction models forecasting future load patterns. On the other hand, unexpected volume and data spikes require reactive provisioning for serving unexpected surges in workloads. When workload cannot be predicted, adequate data grouping and placement algorithms may facilitate agile scaling up and down of an infrastructure. In this paper, we analyze a dynamic workload of an on-line music portal and present an elastic Web infrastructure that adapts to workload variations by dynamically scaling servers up and down. The workload is predicted by an autoregressive model capturing trends and seasonal patterns. Further, for enhancing data locality, we propose a predictive data grouping based on the history of content access of a user community. Finally, in order to facilitate agile elasticity, we present a data placement based on workload and access pattern prediction. The experimental results demonstrate that our forecasting model predicts workload with high precision. 
Further, the predictive data grouping and placement methods provide high locality, load balance and high utilization of resources, allowing a server infrastructure to scale up and down depending on workload.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129651146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}