IEEE International Symposium on High-Performance Parallel Distributed Computing: Latest Publications

Singleton: system-wide page deduplication in virtual environments
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2012-06-18. DOI: 10.1145/2287076.2287081
Prateek Sharma, Purushottam Kulkarni
Abstract: We investigate memory management in hypervisors and propose Singleton, a KVM-based system-wide page deduplication solution that increases memory usage efficiency. We address the problem of double caching that occurs in KVM: the same disk blocks are cached in both the host (hypervisor) and the guest (VM) page caches. Singleton's main components are identical-page sharing across guest virtual machines and an implementation of an exclusive cache for the host and guest page-cache hierarchy. We use and improve KSM (Kernel Samepage Merging) to identify and share pages across guest virtual machines. We utilize guest memory snapshots to scrub the host page cache and maintain a single copy of a page across the host and the guests. Singleton operates on a completely black-box assumption: we do not modify the guest or assume anything about its behaviour. We show that conventional operating system cache management techniques are sub-optimal for virtual environments, and how Singleton supplements and improves the existing Linux kernel memory-management mechanisms. Singleton improves the utilization of the host cache by reducing its size (by up to an order of magnitude) and increasing the cache-hit ratio (by a factor of 2), which translates into better VM performance (40% faster I/O). Singleton's unified page deduplication and host cache scrubbing reclaims large amounts of memory and facilitates higher levels of memory overcommitment. The optimizations to page deduplication we have implemented keep the overhead below 20% CPU utilization.
Citations: 78
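A minimal sketch of the content-based page sharing idea behind the KSM-style deduplication that Singleton builds on: hash each page's contents and let identical pages share a single canonical copy. The page representation, hash choice, and merge step are illustrative only; real KSM operates on kernel page structures and breaks sharing with copy-on-write.

```python
import hashlib

PAGE_SIZE = 4096

def deduplicate(pages):
    """Map every page to a canonical copy shared by all pages with identical content."""
    canonical = {}   # content digest -> canonical page bytes
    mapping = []     # per-page reference to its (possibly shared) copy
    merged = 0
    for page in pages:
        digest = hashlib.sha1(page).digest()
        if digest in canonical:
            merged += 1          # identical content already seen: share it
        else:
            canonical[digest] = page
        mapping.append(canonical[digest])
    return mapping, merged

# Example: three guest "pages", two of which are identical zero-filled pages.
pages = [bytes(PAGE_SIZE), b"x" * PAGE_SIZE, bytes(PAGE_SIZE)]
_, merged = deduplicate(pages)
print(f"pages merged: {merged}")   # -> 1
```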
Performance evaluation of interthread communication mechanisms on multicore/multithreaded architectures
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2012-06-18. DOI: 10.1145/2287076.2287098
D. Pasetto, Massimiliano Meneghin, H. Franke, F. Petrini, J. Xenidis
Abstract: The three major solutions for increasing the nominal performance of a CPU are multiplying the number of cores per socket, expanding the embedded cache memories, and using multi-threading to reduce the impact of the deep memory hierarchy. Systems with tens or hundreds of hardware threads, all sharing a cache-coherent UMA or NUMA memory space, are today the de facto standard. While these solutions can easily provide benefits in a multi-program environment, they require recoding of applications to leverage the available parallelism. Threads must synchronize and exchange data, and the overall performance is heavily influenced by the overhead added by these mechanisms, especially as developers try to exploit finer-grain parallelism to be able to use all available resources.
Citations: 18
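A toy ping-pong microbenchmark in the spirit of the latency measurements such evaluations rely on: two threads exchange messages through shared queues and the average round-trip time is reported. Python's GIL and queue.Queue are stand-ins here; the paper studies hardware-level mechanisms, so only the benchmark structure, not the numbers, carries over.

```python
import queue, threading, time

ROUNDS = 10_000

def pong(req: queue.Queue, resp: queue.Queue):
    for _ in range(ROUNDS):
        resp.put(req.get())          # echo each message back

req, resp = queue.Queue(), queue.Queue()
t = threading.Thread(target=pong, args=(req, resp))
t.start()

start = time.perf_counter()
for i in range(ROUNDS):
    req.put(i)
    resp.get()                       # wait for the echo before sending the next one
elapsed = time.perf_counter() - start
t.join()

print(f"avg round-trip: {elapsed / ROUNDS * 1e6:.1f} us")
```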
A resiliency model for high performance infrastructure based on logical encapsulation
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2012-06-18. DOI: 10.1145/2287076.2287118
James J. Moore, C. Kesselman
Abstract: An emerging trend in distributed systems is the creation of dynamically provisioned heterogeneous high performance platforms that include the co-allocation of both virtualized computing and network-attached storage volumes offering NAS- and SAN-level data services. These high performance computing environments support parallel applications performing traditional file system operations. As with any parallel platform, the ability to continue computation in the face of component failures is an important characteristic. Achieving resiliency in heterogeneous environments presents unique challenges and opportunities not found in homogeneous aggregations of computing resources. We present a logical encapsulation model for heterogeneous high performance infrastructure, which enables a reactive resiliency approach for federations of virtual machines and externally hosted physical storage volumes. Asynchronous state capture and restoration models are presented for individual resources, which are composed into non-blocking resiliency models for logical encapsulations. We perform an evaluation demonstrating that our methodology has greater overall flexibility and significant performance improvements when compared to current resiliency approaches in virtualized distributed execution environments.
Citations: 2
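The following sketch, with entirely hypothetical class and method names, illustrates the composition idea described in the abstract: per-resource state capture operations are issued asynchronously and combined into a checkpoint of a logical encapsulation, which can later be restored resource by resource.

```python
from concurrent.futures import ThreadPoolExecutor

class Resource:
    """Stand-in for a member of the encapsulation, e.g. a VM or a storage volume."""
    def __init__(self, name):
        self.name = name
    def capture(self):              # placeholder for a memory/disk snapshot operation
        return {"resource": self.name, "state": f"snapshot-of-{self.name}"}
    def restore(self, state):
        print(f"restoring {self.name} from {state['state']}")

class LogicalEncapsulation:
    def __init__(self, resources):
        self.resources = resources
    def checkpoint(self):
        # Capture every member concurrently so no resource blocks the others;
        # the encapsulation's checkpoint is the collection of per-resource states.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda r: r.capture(), self.resources))
    def recover(self, states):
        for resource, state in zip(self.resources, states):
            resource.restore(state)

enc = LogicalEncapsulation([Resource("vm-0"), Resource("vm-1"), Resource("nas-volume")])
states = enc.checkpoint()
enc.recover(states)
```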
A cost-intelligent application-specific data layout scheme for parallel file systems
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996138
Huaiming Song, Yanlong Yin, Yong Chen, Xian-He Sun
Abstract: I/O data access is a recognized performance bottleneck of high-end computing. Several commercial and research parallel file systems have been developed in recent years to ease this bottleneck. These advanced file systems perform well on some applications but may not perform well on others; they have not reached their full potential in mitigating the I/O-wall problem. Data access is application dependent. Based on the application-specific optimization principle, in this study we propose a cost-intelligent data access strategy to improve the performance of parallel file systems. We first present a novel model to estimate the data access cost of different data layout policies. Next, we extend the cost model to calculate the overall I/O cost of any given application and choose an appropriate layout policy for the application. A complex application may consist of different data access patterns, and averaging those patterns may not be the best solution for applications that do not have a dominant pattern. We therefore further propose a hybrid data replication strategy for such applications, so that a file can have replicas with different layout policies for the best performance. Theoretical analysis and experimental testing have been conducted to verify the newly proposed cost-intelligent layout approach. Analytical and experimental results show that the proposed cost model is effective and that the application-specific data layout approach achieves up to 74% performance improvement for data-intensive applications.
Citations: 47
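A toy rendition of the cost-driven selection the abstract describes: estimate the access cost of each candidate layout policy for a representative request and pick the cheapest. The candidate policies, cost formula, and constants below are illustrative placeholders, not the paper's model.

```python
import math

def estimate_cost(request_bytes, servers, startup_ms=0.5, mb_per_s=100.0):
    # Startup (network/seek) cost paid once per server touched, plus the
    # transfer time of the request split evenly across those servers.
    transfer_ms = request_bytes / (servers * mb_per_s * 1e6) * 1e3
    return servers * startup_ms + transfer_ms

def choose_layout(request_bytes, num_servers, stripe_bytes=64 * 1024):
    # Number of servers touched by one request under each (hypothetical) layout policy.
    policies = {
        "one file per server":       1,
        "stripe across all servers": min(num_servers, math.ceil(request_bytes / stripe_bytes)),
        "stripe across a group":     min(4, num_servers),
    }
    costs = {name: estimate_cost(request_bytes, n) for name, n in policies.items()}
    return min(costs, key=costs.get), costs

best, costs = choose_layout(request_bytes=4 << 20, num_servers=16)
print(best, {name: round(ms, 2) for name, ms in costs.items()})
```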
Adapting MapReduce for HPC environments
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996166
Zacharia Fadika, Elif Dede, M. Govindaraju, L. Ramakrishnan
Abstract: MapReduce is increasingly gaining popularity as a programming model for use in large-scale distributed processing. The model is most widely used when implemented using the Hadoop Distributed File System (HDFS). The use of HDFS, however, precludes the direct applicability of the model to HPC environments, which use high performance distributed file systems. In such distributed environments, the MapReduce model can rarely make use of full resources, as local disks may not be available for data placement on all the nodes. This work proposes a MapReduce implementation and design choices directly suitable for such HPC environments.
Citations: 7
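For readers unfamiliar with the model itself, here is a self-contained word-count illustration of the map, shuffle (group by key), and reduce phases. It shows only the programming model; it says nothing about the authors' HPC-oriented implementation or its file system choices.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Classic word-count mapper: emit (word, 1) pairs for each input record.
    return [(word, 1) for word in record.split()]

def reduce_phase(key, values):
    return key, sum(values)

def mapreduce(records):
    # Shuffle: group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(map_phase, records)):
        groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

print(mapreduce(["the quick brown fox", "the lazy dog", "the fox"]))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```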
Design space exploration for aggressive core replication schemes in CMPs
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996169
Lluc Alvarez, Ramon Bertran Monfort, Marc González, X. Martorell, N. Navarro, E. Ayguadé
Abstract: Chip multiprocessors (CMPs) are the dominant architectures nowadays. There is a wide variety of designs among current CMPs, with different numbers of cores and memory subsystems, because they are used in a wide spectrum of domains, each with its own design goals. This paper studies different chip configurations in terms of number of cores, size of the shared L3 cache, and off-chip bandwidth requirements in order to find the most efficient design for High Performance Computing applications. Results show that CMP schemes that reduce the shared L3 cache in order to make room for additional cores achieve speedups of up to 3.31x over a baseline architecture.
Citations: 0
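A back-of-the-envelope version of such a design space sweep: under a fixed area budget, trade shared L3 capacity for extra cores and compare estimated throughput. Every constant and the miss-rate curve below are invented for illustration and bear no relation to the paper's simulation data.

```python
CORE_AREA_MM2 = 10.0      # assumed area of one core
L3_MB_AREA_MM2 = 5.0      # assumed area of 1 MB of shared L3
AREA_BUDGET_MM2 = 200.0   # fixed die budget split between cores and L3

def throughput(cores, l3_mb, cpi_base=1.0, miss_penalty=0.4):
    # Rule-of-thumb miss-rate curve: halving the cache does not halve the hit
    # rate. Throughput is modelled as cores divided by effective CPI.
    miss_rate = 0.05 * (8.0 / max(l3_mb, 1.0)) ** 0.5
    return cores / (cpi_base + miss_rate * miss_penalty)

configs = []
for cores in range(2, 21):
    l3_mb = (AREA_BUDGET_MM2 - cores * CORE_AREA_MM2) / L3_MB_AREA_MM2
    if l3_mb >= 1:
        configs.append((throughput(cores, l3_mb), cores, l3_mb))

best = max(configs)
print(f"best config: {best[1]} cores, {best[2]:.0f} MB L3 "
      f"(relative throughput {best[0]:.2f})")
```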
Incremental placement of interactive perception applications
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996149
N. Yigitbasi, L. Mummert, P. Pillai, D. Epema
Abstract: Interactive perception applications, such as gesture recognition and vision-based user interfaces, process high-data-rate streams with compute-intensive computer vision and machine learning algorithms. These applications can be represented as data flow graphs comprising several processing stages. Such applications require low latency to be interactive so that the results are immediately available to the user. To achieve low latency, we exploit the inherent coarse-grained task and data parallelism of these applications by running them on clusters of machines. This paper addresses an important problem that arises: how to place the stages of these applications on machines to minimize the latency, and in particular, how to adjust an existing schedule in response to changes in the operating conditions (perturbations) while minimizing the disruption to the existing placement (churn). To this end, we propose four incremental placement heuristics which use the HEFT scheduling algorithm as their primary building block. Through simulations and experiments on a real implementation, using diverse workloads and a range of perturbation scenarios, we demonstrate that dynamic adjustment of the schedule can improve latency by as much as 36% while producing little churn.
Citations: 6
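The incremental heuristics build on HEFT, whose core step is an upward-rank computation over the application's data flow graph; tasks are then scheduled in decreasing rank order. The sketch below shows only that rank computation on a made-up four-stage perception pipeline; HEFT's processor-selection step and the paper's incremental adjustment logic are omitted.

```python
from functools import lru_cache

# task -> average computation cost across machines (hypothetical values)
comp = {"capture": 10, "detect": 25, "track": 20, "render": 8}
# (task, successor) -> average communication cost (hypothetical values)
comm = {("capture", "detect"): 4, ("capture", "track"): 4,
        ("detect", "render"): 2, ("track", "render"): 2}
succ = {"capture": ["detect", "track"], "detect": ["render"],
        "track": ["render"], "render": []}

@lru_cache(maxsize=None)
def upward_rank(task):
    # rank_u(t) = w(t) + max over successors s of (c(t, s) + rank_u(s))
    if not succ[task]:
        return comp[task]
    return comp[task] + max(comm[(task, s)] + upward_rank(s) for s in succ[task])

order = sorted(comp, key=upward_rank, reverse=True)
print({t: upward_rank(t) for t in order})   # tasks would be scheduled in this order
```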
ClusterSs: a task-based programming model for clusters
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996168
E. Tejedor, Montse Farreras, D. Grove, Rosa M. Badia, G. Almási, Jesús Labarta
Abstract: Programming for large-scale, multicore-based architectures requires adequate tools that offer ease of programming while not hindering application performance. StarSs is a family of parallel programming models based on automatic function-level parallelism that targets productivity. StarSs deploys a data-flow model: it analyses dependencies between tasks and manages their execution, exploiting their concurrency as much as possible. We introduce Cluster Superscalar (ClusterSs), a new StarSs member designed to execute on clusters of SMPs. ClusterSs tasks are asynchronously created and assigned to the available resources with the support of the IBM APGAS runtime, which provides an efficient and portable communication layer based on one-sided communication. This short paper gives an overview of the ClusterSs design on top of APGAS, as well as the conclusions of a productivity study in which ClusterSs was compared to the IBM X10 language, both in terms of programmability and performance. A technical report is available with the details.
Citations: 29
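The data-flow idea at the heart of StarSs-family models can be illustrated with a tiny dependency tracker: each spawned task declares what it reads and writes, and the runtime adds an edge from the last writer of a datum to any later task that touches it. The API below is invented for illustration and tracks only last-writer (true and output) dependences, unlike a full StarSs runtime.

```python
from collections import defaultdict

class TaskGraph:
    def __init__(self):
        self.last_writer = {}            # datum -> id of the task that last wrote it
        self.edges = defaultdict(set)    # task id -> set of predecessor task ids
        self.counter = 0

    def spawn(self, name, reads=(), writes=()):
        tid = f"{name}#{self.counter}"
        self.counter += 1
        for datum in list(reads) + list(writes):
            if datum in self.last_writer:        # depend on the last writer of the datum
                self.edges[tid].add(self.last_writer[datum])
        for datum in writes:
            self.last_writer[datum] = tid
        return tid

g = TaskGraph()
a = g.spawn("init",  writes=["A"])
b = g.spawn("scale", reads=["A"], writes=["B"])
c = g.spawn("sum",   reads=["A", "B"], writes=["C"])
print(dict(g.edges))   # {'scale#1': {'init#0'}, 'sum#2': {'init#0', 'scale#1'}}
```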
Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996160
Vignesh T. Ravi, M. Becchi, G. Agrawal, S. Chakradhar
Abstract: Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still at an early stage, and the challenge of making the GPU a true shared resource in the cloud has not yet been addressed. This paper presents a framework to enable applications executing within virtual machines to transparently share one or more GPUs. Our contributions are twofold: we extend an open source GPU virtualization software to include efficient GPU sharing, and we propose solutions to the conceptual problem of GPU kernel consolidation. In particular, we introduce a method for computing the affinity score between two or more kernels, which provides an indication of potential performance improvements upon kernel consolidation. In addition, we explore molding as a means to achieve efficient GPU sharing also in the case of kernels with high or conflicting resource requirements. We use these concepts to develop an algorithm to efficiently map a set of kernels onto a pair of GPUs. We extensively evaluate our framework using eight popular GPU kernels and two Fermi GPUs. We find that even when contention is high our consolidation algorithm is effective in improving the throughput, and that the runtime overhead of our framework is low.
Citations: 127
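A sketch of the consolidation idea only: score kernel pairs by how well their resource demands complement each other, then greedily co-locate high-affinity pairs on the two GPUs. The per-kernel demand numbers and the scoring function are invented stand-ins, not the affinity metric the paper defines.

```python
from itertools import combinations

# kernel -> (fraction of compute capacity used, fraction of memory bandwidth used)
kernels = {"matmul": (0.9, 0.3), "stream": (0.2, 0.9),
           "reduce": (0.4, 0.6), "stencil": (0.7, 0.5)}

def affinity(a, b):
    # Higher when the pair fits within one GPU's compute and bandwidth budget,
    # i.e. when the two kernels stress complementary resources.
    compute = kernels[a][0] + kernels[b][0]
    memory = kernels[a][1] + kernels[b][1]
    return 2.0 - max(compute, 1.0) - max(memory, 1.0)

pairs = sorted(combinations(kernels, 2), key=lambda p: affinity(*p), reverse=True)
placed, assignment = set(), []
for a, b in pairs:                       # greedily co-locate the most compatible pairs
    if a not in placed and b not in placed and len(assignment) < 2:
        assignment.append((a, b))
        placed.update((a, b))
print(assignment)                        # kernel pairs co-located on each of the two GPUs
```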
Going back and forth: efficient multideployment and multisnapshotting on clouds
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996152
Bogdan Nicolae, J. Bresnahan, K. Keahey, Gabriel Antoniu
Abstract: Infrastructure as a Service (IaaS) cloud computing has revolutionized the way we think of acquiring resources by introducing a simple change: allowing users to lease computational resources from the cloud provider's datacenter for a short time by deploying virtual machines (VMs) on these resources. This new model raises new challenges in the design and development of IaaS middleware. One of those challenges is the need to deploy a large number (hundreds or even thousands) of VM instances simultaneously. Once the VM instances are deployed, another challenge is to simultaneously take a snapshot of many images and transfer them to persistent storage to support management tasks, such as suspend-resume and migration. With datacenters growing rapidly and configurations becoming heterogeneous, it is important to enable efficient concurrent deployment and snapshotting that are at the same time hypervisor independent and ensure maximum compatibility with different configurations. This paper addresses these challenges by proposing a virtual file system specifically optimized for virtual machine image storage. It is based on a lazy transfer scheme coupled with object versioning that handles snapshotting transparently in a hypervisor-independent fashion, ensuring high portability for different configurations. Large-scale experiments on hundreds of nodes demonstrate excellent performance results: speedup for concurrent VM deployments ranges from a factor of 2 up to 25, with a reduction in bandwidth utilization of as much as 90%.
Citations: 68
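A conceptual sketch of lazy transfer plus versioning as described in the abstract: unmodified blocks of a VM image are fetched from the repository only on first read, writes land in a local store, and a snapshot persists just the modified blocks as a new version that references its parent. All names and structure here are illustrative, not the paper's virtual file system.

```python
BLOCK = 4096

class LazyImage:
    def __init__(self, base_blocks):
        self.base = base_blocks          # stand-in for the remote image repository
        self.local = {}                  # block index -> locally written data
        self.fetched = set()             # blocks pulled on demand so far

    def read(self, idx):
        if idx in self.local:
            return self.local[idx]
        self.fetched.add(idx)            # lazy transfer: fetch a block on first read
        return self.base[idx]

    def write(self, idx, data):
        self.local[idx] = data           # writes go to the local store, not the base

    def snapshot(self):
        # Only modified blocks form the new version; unmodified blocks are
        # referenced from the parent image rather than copied.
        return {"parent": "base-image", "delta": dict(self.local)}

img = LazyImage([bytes(BLOCK)] * 1024)
img.read(0)
img.write(7, b"#" * BLOCK)
snap = img.snapshot()
print(len(img.fetched), len(snap["delta"]))   # -> 1 1
```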