{"title":"Small Discrete Fourier Transforms on GPUs","authors":"S. Mitra, A. Srinivasan","doi":"10.1109/CCGrid.2011.14","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.14","url":null,"abstract":"Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data sizes. On the other hand, several applications perform multiple DFTs on small data sizes. In fact, even algorithms for large data sizes use a divide-and-conquer approach, where eventually small DFTs need to be performed. We discuss our DFT implementation, which is efficient for multiple small DFTs. One feature of our implementation is the use of the asymptotically slow matrix multiplication approach for small data sizes, which improves performance on the GPU due to its regular memory access and computational patterns. We combine this algorithm with the mixed radix algorithm for 1-D, 2-D, and 3-D complex DFTs. We also demonstrate the effect of different optimization techniques. When GPUs are used to accelerate a component of an application running on the host, it is important that decisions taken to optimize the GPU performance not affect the performance of the rest of the application on the host. One feature of our implementation is that we use a data layout that is not optimal for the GPU so that the overall effect on the application is better. Our implementation performs up to two orders of magnitude faster than cuFFT on an NVIDIA GeForce 9800 GTX GPU and up to one to two orders of magnitude faster than FFTW on a CPU for multiple small DFTs. Furthermore, we show that our implementation can accelerate the performance of a Quantum Monte Carlo application for which cuFFT is not effective. The primary contributions of this work lie in demonstrating the utility of the matrix multiplication approach and also in providing an implementation that is efficient for small DFTs when a GPU is used to accelerate an application running on the host.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"14 8 Pt 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123722014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
François Trahay, François Rué, Mathieu Faverge, Y. Ishikawa, R. Namyst, J. Dongarra
{"title":"EZTrace: A Generic Framework for Performance Analysis","authors":"François Trahay, François Rué, Mathieu Faverge, Y. Ishikawa, R. Namyst, J. Dongarra","doi":"10.1109/CCGrid.2011.83","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.83","url":null,"abstract":"Modern supercomputers with multi-core nodes enhanced by accelerators, as well as hybrid programming models introduce more complexity in modern applications. Exploiting efficiently all the resources requires a complex analysis of the performance of applications in order to detect time-consuming sections. We present eztrace, a generic trace generation framework that aims at providing a simple way to analyze applications. eztrace is based on plugins that allow it to trace different programming models such as MPI, pthread or OpenMP as well as user-defined libraries or applications. eztrace uses two steps: one to collect the basic information during execution and one post-mortem analysis. This permits tracing the execution of applications with low overhead while allowing to refine the analysis after the execution. We also present a script language for eztrace that gives the user the opportunity to easily define the functions to instrument without modifying the source code of the application.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125285354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Engineering Incentives in Social Clouds","authors":"Christian Haas, Simon Caton, Christof Weinhardt","doi":"10.1109/CCGrid.2011.52","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.52","url":null,"abstract":"Combining the strengths of the Cloud Computing and Social Network paradigms, the vision of Social Clouds aims to provide a resource sharing mechanism where participants dynamically share and trade resources on the premise of the relationships encoded in a social network. By building upon existing relationships in social networks and the inherent trust that accompanies these relationships, a Social Cloud is able to address one of the most cited obstacles in the adoption of current cloud solutions, the missing trust between Cloud service providers and users. However, as with other computing approaches relying on user participation, incentivisation of potential and existing participants is crucial for the success and sustainability of a Social Cloud. Therefore, an incentive engineering approach is needed and will be discussed in this paper considering all phases of user participation in a Social Cloud in order to provide proper incentives for active user participation and desired user behavior.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126702173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building an Online Domain-Specific Computing Service over Non-dedicated Grid and Cloud Resources: The Superlink-Online Experience","authors":"M. Silberstein","doi":"10.1109/CCGrid.2011.46","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.46","url":null,"abstract":"Linkage analysis is a statistical method used by geneticists in everyday practice for mapping disease-susceptibility genes in the study of complex diseases. An essential first step in the study of genetic diseases, linkage computations may require years of CPU time. The recent DNA sampling revolution enabled unprecedented sampling density, but made the analysis even more computationally demanding. In this paper we describe a high performance online service for genetic linkage analysis, called Super link-online. The system enables anyone with Internet access to submit genetic data and analyze it as easily and quickly as if using a supercomputer. The analyses are automatically parallelized and executed on tens of thousands distributed CPUs in multiple clouds and grids. The first version of the system, which employed up to 3,000 CPUs in UW Madison and Technion Condor pools, has been successfully used since 2006 by hundreds of geneticists worldwide, with over 40 citations in the genetics literature. Here we describe the second version, which substantially improves the scalability and performance of first: it uses over 45,000 non-dedicated hosts, in 10different grids and clouds, including EC2 and the Superlink@Technion community grid. Improved system performance is obtained through a virtual grid hierarchy with dynamic load balancing and multi-grid overlay via the Grid Bot system, parallel pruning of short tasks for overhead minimization, and cost-efficient use of cloud resources in reliability-critical execution periods. These enhancements enabled execution of many previously infeasible analyses, which can now be completed within a few hours. The new version of the system, in production since 2009, has completed over 6500 different runs of over 10 million tasks, with total consumption of 420 CPU years.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127011544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ke Zhang, Zhensong Wang, Yong Chen, Huaiyu Zhu, Xian-He Sun
{"title":"PAC-PLRU: A Cache Replacement Policy to Salvage Discarded Predictions from Hardware Prefetchers","authors":"Ke Zhang, Zhensong Wang, Yong Chen, Huaiyu Zhu, Xian-He Sun","doi":"10.1109/CCGrid.2011.27","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.27","url":null,"abstract":"Cache replacement policy plays an important role in guaranteeing the availability of cache blocks, reducing miss rates, and improving applications' overall performance. However, recent research efforts on improving replacement policies require either significant additional hardware or major modifications to the organization of the existing cache. In this study, we propose the PAC-PLRU cache replacement policy. PAC-PLRU not only utilizes but also judiciously salvages the prediction information discarded from a widely-adopted stride prefetcher. The main idea behind PAC-PLRU is utilizing the prediction results generated by the existing stride prefetcher and preventing these predicted cache blocks from being replaced in the near future. Experimental results show that leveraging the PAC-PLRU with a stride prefetcher reduces the average L2 cache miss rate by 91% over a baseline system with only PLRU policy, and by 22% over a system using PLRU with an unconnected stride prefetcher. Most importantly, PAC-PLRU only requires minor modifications to existing cache architecture to get these benefits. The proposed PAC-PLRU policy is promising in fostering the connection between prefetching and replacement policies, and have a lasting impact on improving the overall cache performance.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127182786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DHTbd: A Reliable Block-Based Storage System for High Performance Clusters","authors":"G. Parisis, G. Xylomenos, T. Apostolopoulos","doi":"10.1109/CCGrid.2011.59","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.59","url":null,"abstract":"Large, reliable and efficient storage systems are becoming increasingly important in enterprise environments. Our research in storage system design is oriented towards the exploitation of commodity hardware for building a high performance, resilient and scalable storage system. We present the design and implementation of DHTbd, a general purpose decentralized storage system where storage nodes support a distributed hash table based interface and clients are implemented as in-kernel device drivers. DHTbd, unlike most storage systems proposed to date, is implemented at the block device level of the I/O stack, a simple yet efficient design. The experimental evaluation of the proposed system demonstrates its very good I/O performance, its ability to scale to large clusters, as well as its robustness, even when massive failures occur.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116996108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Performance Variability of Production Cloud Services","authors":"A. Iosup, N. Yigitbasi, D. Epema","doi":"10.1109/CCGRID.2011.22","DOIUrl":"https://doi.org/10.1109/CCGRID.2011.22","url":null,"abstract":"Cloud computing is an emerging infrastructure paradigm that promises to eliminate the need for companies to maintain expensive computing hardware. Through the use of virtualization and resource time-sharing, clouds address with a single set of physical resources a large user base with diverse needs. Thus, clouds have the potential to provide their owners the benefits of an economy of scale and, at the same time, become an alternative for both the industry and the scientific community to self-owned clusters, grids, and parallel production environments. For this potential to become reality, the first generation of commercial clouds need to be proven to be dependable. In this work we analyze the dependability of cloud services. Towards this end, we analyze long-term performance traces from Amazon Web Services and Google App Engine, currently two of the largest commercial clouds in production. We find that the performance of about half of the cloud services we investigate exhibits yearly and daily patterns, but also that most services have periods of especially stable performance. Last, through trace-based simulation we assess the impact of the variability observed for the studied cloud services on three large-scale applications, job execution in scientific computing, virtual goods trading in social networks, and state management in social gaming. We show that the impact of performance variability depends on the application, and give evidence that performance variability can be an important factor in cloud provider selection.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129834090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Flexible Policy Framework for the QoS Differentiated Provisioning of Services","authors":"Mohan Baruwal Chhetri, Quoc Bao Vo, R. Kowalczyk","doi":"10.1109/CCGrid.2011.36","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.36","url":null,"abstract":"We propose a policy-based framework for the QoS differentiated provisioning of services. The proposed frame-work improves the state-of-the-art in policy-based preferencespecification by combining cardinal and ordinal preferences. We describe the underlying models, focussing on the key features and contributions of the proposed framework. We also show how, using our framework, the QoS evaluation problem can be translated to a Constraint Satisfaction Problem while preserving the semantics of the preference policies.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121263098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Petrucci, E. V. Carrera, O. Loques, J. Leite, D. Mossé
{"title":"Optimized Management of Power and Performance for Virtualized Heterogeneous Server Clusters","authors":"V. Petrucci, E. V. Carrera, O. Loques, J. Leite, D. Mossé","doi":"10.1109/CCGrid.2011.15","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.15","url":null,"abstract":"This paper proposes and evaluates an approach for power and performance management in virtualized server clusters. The major goal of our approach is to reduce power consumption in the cluster while meeting performance requirements. The contributions of this paper are: (1) a simple but effective way of modeling power consumption and capacity of servers even under heterogeneous and changing workloads, and (2) an optimization strategy based on a mixed integer programming model for achieving improvements on power-efficiency while providing performance guarantees in the virtualized cluster. In the optimization model, we address application workload balancing and the often ignored switching costs due to frequent and undesirable turning servers on/off and VM relocations. We show the effectiveness of the approach applied to a server cluster test bed. Our experiments show that our approach conserves about 50% of the energy required by a system designed for peak workload scenario, with little impact on the applications' performance goals. Also, by using prediction in our optimization strategy, further QoS improvement was achieved.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127106763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Viet-Trung Tran, Bogdan Nicolae, Gabriel Antoniu, L. Bougé
{"title":"Efficient Support for MPI-I/O Atomicity Based on Versioning","authors":"Viet-Trung Tran, Bogdan Nicolae, Gabriel Antoniu, L. Bougé","doi":"10.1109/CCGrid.2011.60","DOIUrl":"https://doi.org/10.1109/CCGrid.2011.60","url":null,"abstract":"We consider the challenge of building data management systems that meet an important requirement of today's data-intensive HPC applications: to provide a high I/O throughput while supporting highly concurrent data accesses. In this context, many applications rely on MPI-I/O and require atomic, non-contiguous I/O operations that concurrently access shared data. In most existing implementations, the atomicity requirement is implemented through locking-based schemes, which have proven inefficient, especially for non-contiguous I/O. We claim that using a versioning-enabled storage back-end has the potential to avoid the expensive synchronization induced by locking-based schemes. We describe a prototype implementation on top of ROMIO, and report on promising experimental results with standard MPI-I/O benchmarks specifically designed to evaluate the performance of non-contiguous, overlapped I/O accesses under MPI atomicity guarantees.","PeriodicalId":376385,"journal":{"name":"2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114745511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}