2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing: Latest Papers (Page 7)

Small Discrete Fourier Transforms on GPUs

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2011-05-23 DOI: 10.1109/CCGrid.2011.14

S. Mitra, A. Srinivasan

Abstract: Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data sizes. On the other hand, several applications perform multiple DFTs on small data sizes. In fact, even algorithms for large data sizes use a divide-and-conquer approach, where eventually small DFTs need to be performed. We discuss our DFT implementation, which is efficient for multiple small DFTs. One feature of our implementation is the use of the asymptotically slow matrix multiplication approach for small data sizes, which improves performance on the GPU due to its regular memory access and computational patterns. We combine this algorithm with the mixed radix algorithm for 1-D, 2-D, and 3-D complex DFTs. We also demonstrate the effect of different optimization techniques. When GPUs are used to accelerate a component of an application running on the host, it is important that decisions taken to optimize the GPU performance not affect the performance of the rest of the application on the host. One feature of our implementation is that we use a data layout that is not optimal for the GPU so that the overall effect on the application is better. Our implementation performs up to two orders of magnitude faster than cuFFT on an NVIDIA GeForce 9800 GTX GPU and up to one to two orders of magnitude faster than FFTW on a CPU for multiple small DFTs. Furthermore, we show that our implementation can accelerate the performance of a Quantum Monte Carlo application for which cuFFT is not effective. The primary contributions of this work lie in demonstrating the utility of the matrix multiplication approach and also in providing an implementation that is efficient for small DFTs when a GPU is used to accelerate an application running on the host.
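The matrix multiplication approach named in this abstract can be illustrated with a minimal sketch. This is plain Python on the CPU, not the authors' GPU implementation; the function names `dft_matrix` and `batched_dft` are hypothetical and chosen only for illustration. It shows why the approach has regular access patterns: every output element is the same kind of dot product against a fixed n x n matrix, at O(n^2) cost per transform.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix W with W[r][c] = exp(-2*pi*i*r*c/n)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]


def batched_dft(signals):
    """Apply the same small DFT to many equal-length signals.

    signals: list of length-n sequences, one small DFT per sequence.
    Returns a list of length-n lists of complex numbers.
    Each output row is the matrix-vector product W @ x, so the whole
    batch amounts to one (n x n) by (n x batch) matrix multiplication.
    """
    n = len(signals[0])
    w = dft_matrix(n)  # built once and reused for every signal in the batch
    return [[sum(w[r][c] * x[c] for c in range(n)) for r in range(n)]
            for x in signals]


if __name__ == "__main__":
    batch = [[1, 0, 0, 0], [1, 1, 1, 1]]
    for spectrum in batched_dft(batch):
        print([complex(round(z.real, 9), round(z.imag, 9)) for z in spectrum])
```

In this sketch a unit impulse transforms to an all-ones spectrum, and a constant signal puts all of its energy in bin 0, which is a quick sanity check on the matrix definition.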

Citations: 9

EZTrace: A Generic Framework for Performance Analysis

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2011-05-23 DOI: 10.1109/CCGrid.2011.83

François Trahay, François Rué, Mathieu Faverge, Y. Ishikawa, R. Namyst, J. Dongarra

Citations: 48

Engineering Incentives in Social Clouds

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2011-05-23 DOI: 10.1109/CCGrid.2011.52

Christian Haas, Simon Caton, Christof Weinhardt

Citations: 17

Building an Online Domain-Specific Computing Service over Non-dedicated Grid and Cloud Resources: The Superlink-Online Experience

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2011-05-23 DOI: 10.1109/CCGrid.2011.46

M. Silberstein

Abstract: Linkage analysis is a statistical method used by geneticists in everyday practice for mapping disease-susceptibility genes in the study of complex diseases. An essential first step in the study of genetic diseases, linkage computations may require years of CPU time. The recent DNA sampling revolution enabled unprecedented sampling density, but made the analysis even more computationally demanding. In this paper we describe a high performance online service for genetic linkage analysis, called Superlink-online. The system enables anyone with Internet access to submit genetic data and analyze it as easily and quickly as if using a supercomputer. The analyses are automatically parallelized and executed on tens of thousands of distributed CPUs in multiple clouds and grids. The first version of the system, which employed up to 3,000 CPUs in UW Madison and Technion Condor pools, has been successfully used since 2006 by hundreds of geneticists worldwide, with over 40 citations in the genetics literature. Here we describe the second version, which substantially improves the scalability and performance of the first: it uses over 45,000 non-dedicated hosts in 10 different grids and clouds, including EC2 and the Superlink@Technion community grid. Improved system performance is obtained through a virtual grid hierarchy with dynamic load balancing and multi-grid overlay via the GridBot system, parallel pruning of short tasks for overhead minimization, and cost-efficient use of cloud resources in reliability-critical execution periods. These enhancements enabled execution of many previously infeasible analyses, which can now be completed within a few hours. The new version of the system, in production since 2009, has completed over 6500 different runs of over 10 million tasks, with total consumption of 420 CPU years.

Citations: 16

PAC-PLRU: A Cache Replacement Policy to Salvage Discarded Predictions from Hardware Prefetchers

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2011-05-23 DOI: 10.1109/CCGrid.2011.27

Ke Zhang, Zhensong Wang, Yong Chen, Huaiyu Zhu, Xian-He Sun

Abstract: Cache replacement policy plays an important role in guaranteeing the availability of cache blocks, reducing miss rates, and improving applications' overall performance. However, recent research efforts on improving replacement policies require either significant additional hardware or major modifications to the organization of the existing cache. In this study, we propose the PAC-PLRU cache replacement policy. PAC-PLRU not only utilizes but also judiciously salvages the prediction information discarded from a widely adopted stride prefetcher. The main idea behind PAC-PLRU is to use the prediction results generated by the existing stride prefetcher and to prevent the predicted cache blocks from being replaced in the near future. Experimental results show that using PAC-PLRU with a stride prefetcher reduces the average L2 cache miss rate by 91% over a baseline system with only the PLRU policy, and by 22% over a system using PLRU with an unconnected stride prefetcher. Most importantly, PAC-PLRU requires only minor modifications to the existing cache architecture to obtain these benefits. The proposed PAC-PLRU policy is promising in fostering the connection between prefetching and replacement policies, and could have a lasting impact on improving overall cache performance.
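The general idea of protecting predicted blocks from eviction under a pseudo-LRU policy can be sketched as below. This is not the paper's actual design, which the abstract does not specify: the 4-way set, the per-way `protected` flag, the fallback scan order, and the class name `ProtectedTreePLRU` are all assumptions made for illustration. Only the standard tree-PLRU bookkeeping is well established.

```python
class ProtectedTreePLRU:
    """One 4-way cache set with tree-PLRU bits and a hypothetical protect flag."""

    def __init__(self):
        # bits[0] is the root, bits[1] covers ways 0-1, bits[2] covers ways 2-3.
        # A bit value of 0 means "the next victim is on the left side".
        self.bits = [0, 0, 0]
        # Assumption: a prefetcher prediction marks a way as protected.
        self.protected = [False] * 4

    def touch(self, way):
        """Record an access by pointing the bits on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1            # victim search should go right
            self.bits[1] = 1 - way      # and, within the left pair, to the other way
        else:
            self.bits[0] = 0            # victim search should go left
            self.bits[2] = 1 - (way - 2)

    def plru_victim(self):
        """Follow the tree bits to the plain tree-PLRU victim."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]

    def victim(self):
        """Return the PLRU victim unless it is protected.

        Assumed fallback: scan the remaining ways in index order starting
        after the PLRU victim; if every way is protected, evict the PLRU
        victim anyway so that replacement always makes progress.
        """
        first = self.plru_victim()
        for step in range(4):
            way = (first + step) % 4
            if not self.protected[way]:
                return way
        return first


if __name__ == "__main__":
    cache_set = ProtectedTreePLRU()
    for way in (0, 1, 2, 3):
        cache_set.touch(way)
    cache_set.protected[cache_set.plru_victim()] = True
    print("victim with protection:", cache_set.victim())
```

A real design would also need a rule for clearing the protection flag, for example when the predicted block is first referenced; that detail is left out here because the abstract does not describe it.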

Citations: 9

DHTbd: A Reliable Block-Based Storage System for High Performance Clusters

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2011-05-23 DOI: 10.1109/CCGrid.2011.59

G. Parisis, G. Xylomenos, T. Apostolopoulos

Citations: 8

On the Performance Variability of Production Cloud Services

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2011-05-23 DOI: 10.1109/CCGRID.2011.22

A. Iosup, N. Yigitbasi, D. Epema

Abstract: Cloud computing is an emerging infrastructure paradigm that promises to eliminate the need for companies to maintain expensive computing hardware. Through the use of virtualization and resource time-sharing, clouds address with a single set of physical resources a large user base with diverse needs. Thus, clouds have the potential to provide their owners the benefits of an economy of scale and, at the same time, become an alternative for both the industry and the scientific community to self-owned clusters, grids, and parallel production environments. For this potential to become reality, the first generation of commercial clouds needs to be proven dependable. In this work we analyze the dependability of cloud services. Towards this end, we analyze long-term performance traces from Amazon Web Services and Google App Engine, currently two of the largest commercial clouds in production. We find that the performance of about half of the cloud services we investigate exhibits yearly and daily patterns, but also that most services have periods of especially stable performance. Last, through trace-based simulation we assess the impact of the variability observed for the studied cloud services on three large-scale applications: job execution in scientific computing, virtual goods trading in social networks, and state management in social gaming. We show that the impact of performance variability depends on the application, and give evidence that performance variability can be an important factor in cloud provider selection.

Citations: 365

A Flexible Policy Framework for the QoS Differentiated Provisioning of Services

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2011-05-01 DOI: 10.1109/CCGrid.2011.36

Mohan Baruwal Chhetri, Quoc Bao Vo, R. Kowalczyk

Citations: 7

Optimized Management of Power and Performance for Virtualized Heterogeneous Server Clusters

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2011-05-01 DOI: 10.1109/CCGrid.2011.15

V. Petrucci, E. V. Carrera, O. Loques, J. Leite, D. Mossé

Abstract: This paper proposes and evaluates an approach for power and performance management in virtualized server clusters. The major goal of our approach is to reduce power consumption in the cluster while meeting performance requirements. The contributions of this paper are: (1) a simple but effective way of modeling power consumption and capacity of servers even under heterogeneous and changing workloads, and (2) an optimization strategy based on a mixed integer programming model for achieving improvements in power efficiency while providing performance guarantees in the virtualized cluster. In the optimization model, we address application workload balancing and the often ignored switching costs caused by frequently turning servers on and off and by VM relocations. We show the effectiveness of the approach applied to a server cluster test bed. Our experiments show that our approach conserves about 50% of the energy required by a system designed for a peak-workload scenario, with little impact on the applications' performance goals. Also, by using prediction in our optimization strategy, further QoS improvement was achieved.

Citations: 93

Efficient Support for MPI-I/O Atomicity Based on Versioning

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Pub Date : 2010-12-15 DOI: 10.1109/CCGrid.2011.60

Viet-Trung Tran, Bogdan Nicolae, Gabriel Antoniu, L. Bougé

Citations: 6