IEEE International Symposium on High-Performance Parallel Distributed Computing: Latest Publications

Singleton: system-wide page deduplication in virtual environments
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2012-06-18. DOI: 10.1145/2287076.2287081
Prateek Sharma, Purushottam Kulkarni
Abstract: We investigate memory management in hypervisors and propose Singleton, a KVM-based system-wide page deduplication solution that increases memory usage efficiency. We address the problem of double caching that occurs in KVM: the same disk blocks are cached in both the host (hypervisor) and the guest (VM) page caches. Singleton's main components are identical-page sharing across guest virtual machines and an implementation of an exclusive cache for the host and guest page-cache hierarchy. We use and improve KSM (Kernel Samepage Merging) to identify and share pages across guest virtual machines. We utilize guest memory snapshots to scrub the host page cache and maintain a single copy of a page across the host and the guests. Singleton operates on a completely black-box assumption: we do not modify the guest or assume anything about its behaviour. We show that conventional operating system cache management techniques are sub-optimal for virtual environments, and how Singleton supplements and improves the existing Linux kernel memory-management mechanisms. Singleton improves the utilization of the host cache by reducing its size (by up to an order of magnitude) and increasing the cache-hit ratio (by a factor of 2), which translates into better VM performance (40% faster I/O). Singleton's unified page deduplication and host cache scrubbing reclaims large amounts of memory and facilitates higher levels of memory overcommitment. The optimizations to page deduplication we have implemented keep the overhead below 20% CPU utilization.
Citations: 78
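A minimal sketch of the content-based page sharing idea behind the KSM-style deduplication that Singleton builds on: hash each page's contents and let identical pages share a single canonical copy. The page representation, hash choice, and merge step are illustrative only; real KSM operates on kernel page structures and breaks sharing with copy-on-write.

```python
import hashlib

PAGE_SIZE = 4096

def deduplicate(pages):
    """Map every page to a canonical copy shared by all pages with identical content."""
    canonical = {}   # content digest -> canonical page bytes
    mapping = []     # per-page reference to its (possibly shared) copy
    merged = 0
    for page in pages:
        digest = hashlib.sha1(page).digest()
        if digest in canonical:
            merged += 1          # identical content already seen: share it
        else:
            canonical[digest] = page
        mapping.append(canonical[digest])
    return mapping, merged

# Example: three guest "pages", two of which are identical zero-filled pages.
pages = [bytes(PAGE_SIZE), b"x" * PAGE_SIZE, bytes(PAGE_SIZE)]
_, merged = deduplicate(pages)
print(f"pages merged: {merged}")   # -> 1
```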
Performance evaluation of interthread communication mechanisms on multicore/multithreaded architectures
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2012-06-18. DOI: 10.1145/2287076.2287098
D. Pasetto, Massimiliano Meneghin, H. Franke, F. Petrini, J. Xenidis
Abstract: The three major solutions for increasing the nominal performance of a CPU are multiplying the number of cores per socket, expanding the embedded cache memories, and using multi-threading to reduce the impact of the deep memory hierarchy. Systems with tens or hundreds of hardware threads, all sharing a cache-coherent UMA or NUMA memory space, are today the de facto standard. While these solutions can easily provide benefits in a multi-program environment, they require recoding of applications to leverage the available parallelism. Threads must synchronize and exchange data, and the overall performance is heavily influenced by the overhead added by these mechanisms, especially as developers try to exploit finer-grain parallelism to be able to use all available resources.
Citations: 18
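A toy ping-pong microbenchmark in the spirit of the latency measurements such evaluations rely on: two threads exchange messages through shared queues and the average round-trip time is reported. Python's GIL and queue.Queue are stand-ins here; the paper studies hardware-level mechanisms, so only the benchmark structure, not the numbers, carries over.

```python
import queue, threading, time

ROUNDS = 10_000

def pong(req: queue.Queue, resp: queue.Queue):
    for _ in range(ROUNDS):
        resp.put(req.get())          # echo each message back

req, resp = queue.Queue(), queue.Queue()
t = threading.Thread(target=pong, args=(req, resp))
t.start()

start = time.perf_counter()
for i in range(ROUNDS):
    req.put(i)
    resp.get()                       # wait for the echo before sending the next one
elapsed = time.perf_counter() - start
t.join()

print(f"avg round-trip: {elapsed / ROUNDS * 1e6:.1f} us")
```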
A resiliency model for high performance infrastructure based on logical encapsulation
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2012-06-18. DOI: 10.1145/2287076.2287118
James J. Moore, C. Kesselman
Abstract: An emerging trend in distributed systems is the creation of dynamically provisioned heterogeneous high performance platforms that include the co-allocation of both virtualized computing and network-attached storage volumes offering NAS- and SAN-level data services. These high performance computing environments support parallel applications performing traditional file system operations. As with any parallel platform, the ability to continue computation in the face of component failures is an important characteristic. Achieving resiliency in heterogeneous environments presents unique challenges and opportunities not found in homogeneous aggregations of computing resources. We present a logical encapsulation model for heterogeneous high performance infrastructure, which enables a reactive resiliency approach for federations of virtual machines and externally hosted physical storage volumes. Asynchronous state capture and restoration models are presented for individual resources, which are composed into non-blocking resiliency models for logical encapsulations. We perform an evaluation demonstrating that our methodology has greater overall flexibility and significant performance improvements when compared to current resiliency approaches in virtualized distributed execution environments.
Citations: 2
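The following sketch, with entirely hypothetical class and method names, illustrates the composition idea described in the abstract: per-resource state capture operations are issued asynchronously and combined into a checkpoint of a logical encapsulation, which can later be restored resource by resource.

```python
from concurrent.futures import ThreadPoolExecutor

class Resource:
    """Stand-in for a member of the encapsulation, e.g. a VM or a storage volume."""
    def __init__(self, name):
        self.name = name
    def capture(self):              # placeholder for a memory/disk snapshot operation
        return {"resource": self.name, "state": f"snapshot-of-{self.name}"}
    def restore(self, state):
        print(f"restoring {self.name} from {state['state']}")

class LogicalEncapsulation:
    def __init__(self, resources):
        self.resources = resources
    def checkpoint(self):
        # Capture every member concurrently so no resource blocks the others;
        # the encapsulation's checkpoint is the collection of per-resource states.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda r: r.capture(), self.resources))
    def recover(self, states):
        for resource, state in zip(self.resources, states):
            resource.restore(state)

enc = LogicalEncapsulation([Resource("vm-0"), Resource("vm-1"), Resource("nas-volume")])
states = enc.checkpoint()
enc.recover(states)
```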
A cost-intelligent application-specific data layout scheme for parallel file systems
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996138
Huaiming Song, Yanlong Yin, Yong Chen, Xian-He Sun
Abstract: I/O data access is a recognized performance bottleneck of high-end computing. Several commercial and research parallel file systems have been developed in recent years to ease this bottleneck. These advanced file systems perform well on some applications but may not perform well on others; they have not reached their full potential in mitigating the I/O-wall problem. Data access is application dependent. Based on the application-specific optimization principle, in this study we propose a cost-intelligent data access strategy to improve the performance of parallel file systems. We first present a novel model to estimate the data access cost of different data layout policies. Next, we extend the cost model to calculate the overall I/O cost of any given application and choose an appropriate layout policy for the application. A complex application may consist of different data access patterns, and averaging those patterns may not be the best solution for applications that do not have a dominant pattern. We therefore further propose a hybrid data replication strategy for such applications, so that a file can have replicas with different layout policies for the best performance. Theoretical analysis and experimental testing have been conducted to verify the newly proposed cost-intelligent layout approach. Analytical and experimental results show that the proposed cost model is effective and that the application-specific data layout approach achieves up to 74% performance improvement for data-intensive applications.
Citations: 47
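A toy rendition of the cost-driven selection the abstract describes: estimate the access cost of each candidate layout policy for a representative request and pick the cheapest. The candidate policies, cost formula, and constants below are illustrative placeholders, not the paper's model.

```python
import math

def estimate_cost(request_bytes, servers, startup_ms=0.5, mb_per_s=100.0):
    # Startup (network/seek) cost paid once per server touched, plus the
    # transfer time of the request split evenly across those servers.
    transfer_ms = request_bytes / (servers * mb_per_s * 1e6) * 1e3
    return servers * startup_ms + transfer_ms

def choose_layout(request_bytes, num_servers, stripe_bytes=64 * 1024):
    # Number of servers touched by one request under each (hypothetical) layout policy.
    policies = {
        "one file per server":       1,
        "stripe across all servers": min(num_servers, math.ceil(request_bytes / stripe_bytes)),
        "stripe across a group":     min(4, num_servers),
    }
    costs = {name: estimate_cost(request_bytes, n) for name, n in policies.items()}
    return min(costs, key=costs.get), costs

best, costs = choose_layout(request_bytes=4 << 20, num_servers=16)
print(best, {name: round(ms, 2) for name, ms in costs.items()})
```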
Adapting MapReduce for HPC environments
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996166
Zacharia Fadika, Elif Dede, M. Govindaraju, L. Ramakrishnan
Abstract: MapReduce is increasingly gaining popularity as a programming model for use in large-scale distributed processing. The model is most widely used when implemented using the Hadoop Distributed File System (HDFS). The use of HDFS, however, precludes the direct applicability of the model to HPC environments, which use high performance distributed file systems. In such distributed environments, the MapReduce model can rarely make use of full resources, as local disks may not be available for data placement on all the nodes. This work proposes a MapReduce implementation and design choices directly suitable for such HPC environments.
Citations: 7
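For readers unfamiliar with the model itself, here is a self-contained word-count illustration of the map, shuffle (group by key), and reduce phases. It shows only the programming model; it says nothing about the authors' HPC-oriented implementation or its file system choices.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Classic word-count mapper: emit (word, 1) pairs for each input record.
    return [(word, 1) for word in record.split()]

def reduce_phase(key, values):
    return key, sum(values)

def mapreduce(records):
    # Shuffle: group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(map_phase, records)):
        groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

print(mapreduce(["the quick brown fox", "the lazy dog", "the fox"]))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```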
Design space exploration for aggressive core replication schemes in CMPs
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996169
Lluc Alvarez, Ramon Bertran Monfort, Marc González, X. Martorell, N. Navarro, E. Ayguadé
Abstract: Chip multiprocessors (CMPs) are the dominant architectures nowadays. There is a wide variety of designs among current CMPs, with different numbers of cores and memory subsystems, because they are used in a wide spectrum of domains, each with its own design goals. This paper studies different chip configurations in terms of number of cores, size of the shared L3 cache, and off-chip bandwidth requirements in order to find the most efficient design for High Performance Computing applications. Results show that CMP schemes that reduce the shared L3 cache in order to make room for additional cores achieve speedups of up to 3.31x over a baseline architecture.
Citations: 0
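A back-of-the-envelope version of such a design space sweep: under a fixed area budget, trade shared L3 capacity for extra cores and compare estimated throughput. Every constant and the miss-rate curve below are invented for illustration and bear no relation to the paper's simulation data.

```python
CORE_AREA_MM2 = 10.0      # assumed area of one core
L3_MB_AREA_MM2 = 5.0      # assumed area of 1 MB of shared L3
AREA_BUDGET_MM2 = 200.0   # fixed die budget split between cores and L3

def throughput(cores, l3_mb, cpi_base=1.0, miss_penalty=0.4):
    # Rule-of-thumb miss-rate curve: halving the cache does not halve the hit
    # rate. Throughput is modelled as cores divided by effective CPI.
    miss_rate = 0.05 * (8.0 / max(l3_mb, 1.0)) ** 0.5
    return cores / (cpi_base + miss_rate * miss_penalty)

configs = []
for cores in range(2, 21):
    l3_mb = (AREA_BUDGET_MM2 - cores * CORE_AREA_MM2) / L3_MB_AREA_MM2
    if l3_mb >= 1:
        configs.append((throughput(cores, l3_mb), cores, l3_mb))

best = max(configs)
print(f"best config: {best[1]} cores, {best[2]:.0f} MB L3 "
      f"(relative throughput {best[0]:.2f})")
```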
Incremental placement of interactive perception applications
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996149
N. Yigitbasi, L. Mummert, P. Pillai, D. Epema
Abstract: Interactive perception applications, such as gesture recognition and vision-based user interfaces, process high-data-rate streams with compute-intensive computer vision and machine learning algorithms. These applications can be represented as data flow graphs comprising several processing stages. Such applications require low latency to be interactive so that the results are immediately available to the user. To achieve low latency, we exploit the inherent coarse-grained task and data parallelism of these applications by running them on clusters of machines. This paper addresses an important problem that arises: how to place the stages of these applications on machines to minimize the latency, and in particular, how to adjust an existing schedule in response to changes in the operating conditions (perturbations) while minimizing the disruption to the existing placement (churn). To this end, we propose four incremental placement heuristics which use the HEFT scheduling algorithm as their primary building block. Through simulations and experiments on a real implementation, using diverse workloads and a range of perturbation scenarios, we demonstrate that dynamic adjustment of the schedule can improve latency by as much as 36% while producing little churn.
Citations: 6
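The incremental heuristics build on HEFT, whose core step is an upward-rank computation over the application's data flow graph; tasks are then scheduled in decreasing rank order. The sketch below shows only that rank computation on a made-up four-stage perception pipeline; HEFT's processor-selection step and the paper's incremental adjustment logic are omitted.

```python
from functools import lru_cache

# task -> average computation cost across machines (hypothetical values)
comp = {"capture": 10, "detect": 25, "track": 20, "render": 8}
# (task, successor) -> average communication cost (hypothetical values)
comm = {("capture", "detect"): 4, ("capture", "track"): 4,
        ("detect", "render"): 2, ("track", "render"): 2}
succ = {"capture": ["detect", "track"], "detect": ["render"],
        "track": ["render"], "render": []}

@lru_cache(maxsize=None)
def upward_rank(task):
    # rank_u(t) = w(t) + max over successors s of (c(t, s) + rank_u(s))
    if not succ[task]:
        return comp[task]
    return comp[task] + max(comm[(task, s)] + upward_rank(s) for s in succ[task])

order = sorted(comp, key=upward_rank, reverse=True)
print({t: upward_rank(t) for t in order})   # tasks would be scheduled in this order
```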
ClusterSs: a task-based programming model for clusters
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996168
E. Tejedor, Montse Farreras, D. Grove, Rosa M. Badia, G. Almási, Jesús Labarta
Abstract: Programming for large-scale, multicore-based architectures requires adequate tools that offer ease of programming while not hindering application performance. StarSs is a family of parallel programming models based on automatic function-level parallelism that targets productivity. StarSs deploys a data-flow model: it analyses dependencies between tasks and manages their execution, exploiting their concurrency as much as possible. We introduce Cluster Superscalar (ClusterSs), a new StarSs member designed to execute on clusters of SMPs. ClusterSs tasks are asynchronously created and assigned to the available resources with the support of the IBM APGAS runtime, which provides an efficient and portable communication layer based on one-sided communication. This short paper gives an overview of the ClusterSs design on top of APGAS, as well as the conclusions of a productivity study in which ClusterSs was compared to the IBM X10 language, both in terms of programmability and performance. A technical report is available with the details.
Citations: 29
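The data-flow idea at the heart of StarSs-family models can be illustrated with a tiny dependency tracker: each spawned task declares what it reads and writes, and the runtime adds an edge from the last writer of a datum to any later task that touches it. The API below is invented for illustration and tracks only last-writer (true and output) dependences, unlike a full StarSs runtime.

```python
from collections import defaultdict

class TaskGraph:
    def __init__(self):
        self.last_writer = {}            # datum -> id of the task that last wrote it
        self.edges = defaultdict(set)    # task id -> set of predecessor task ids
        self.counter = 0

    def spawn(self, name, reads=(), writes=()):
        tid = f"{name}#{self.counter}"
        self.counter += 1
        for datum in list(reads) + list(writes):
            if datum in self.last_writer:        # depend on the last writer of the datum
                self.edges[tid].add(self.last_writer[datum])
        for datum in writes:
            self.last_writer[datum] = tid
        return tid

g = TaskGraph()
a = g.spawn("init",  writes=["A"])
b = g.spawn("scale", reads=["A"], writes=["B"])
c = g.spawn("sum",   reads=["A", "B"], writes=["C"])
print(dict(g.edges))   # {'scale#1': {'init#0'}, 'sum#2': {'init#0', 'scale#1'}}
```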
Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996160
Vignesh T. Ravi, M. Becchi, G. Agrawal, S. Chakradhar
Abstract: Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still at an early stage, and the challenge of making the GPU a true shared resource in the cloud has not yet been addressed. This paper presents a framework to enable applications executing within virtual machines to transparently share one or more GPUs. Our contributions are twofold: we extend an open source GPU virtualization software to include efficient GPU sharing, and we propose solutions to the conceptual problem of GPU kernel consolidation. In particular, we introduce a method for computing the affinity score between two or more kernels, which provides an indication of potential performance improvements upon kernel consolidation. In addition, we explore molding as a means to achieve efficient GPU sharing also in the case of kernels with high or conflicting resource requirements. We use these concepts to develop an algorithm to efficiently map a set of kernels onto a pair of GPUs. We extensively evaluate our framework using eight popular GPU kernels and two Fermi GPUs. We find that even when contention is high our consolidation algorithm is effective in improving the throughput, and that the runtime overhead of our framework is low.
Citations: 127
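A sketch of the consolidation idea only: score kernel pairs by how well their resource demands complement each other, then greedily co-locate high-affinity pairs on the two GPUs. The per-kernel demand numbers and the scoring function are invented stand-ins, not the affinity metric the paper defines.

```python
from itertools import combinations

# kernel -> (fraction of compute capacity used, fraction of memory bandwidth used)
kernels = {"matmul": (0.9, 0.3), "stream": (0.2, 0.9),
           "reduce": (0.4, 0.6), "stencil": (0.7, 0.5)}

def affinity(a, b):
    # Higher when the pair fits within one GPU's compute and bandwidth budget,
    # i.e. when the two kernels stress complementary resources.
    compute = kernels[a][0] + kernels[b][0]
    memory = kernels[a][1] + kernels[b][1]
    return 2.0 - max(compute, 1.0) - max(memory, 1.0)

pairs = sorted(combinations(kernels, 2), key=lambda p: affinity(*p), reverse=True)
placed, assignment = set(), []
for a, b in pairs:                       # greedily co-locate the most compatible pairs
    if a not in placed and b not in placed and len(assignment) < 2:
        assignment.append((a, b))
        placed.update((a, b))
print(assignment)                        # kernel pairs co-located on each of the two GPUs
```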
Going back and forth: efficient multideployment and multisnapshotting on clouds
IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996152
Bogdan Nicolae, J. Bresnahan, K. Keahey, Gabriel Antoniu
Abstract: Infrastructure as a Service (IaaS) cloud computing has revolutionized the way we think of acquiring resources by introducing a simple change: allowing users to lease computational resources from the cloud provider's datacenter for a short time by deploying virtual machines (VMs) on these resources. This new model raises new challenges in the design and development of IaaS middleware. One of those challenges is the need to deploy a large number (hundreds or even thousands) of VM instances simultaneously. Once the VM instances are deployed, another challenge is to simultaneously take a snapshot of many images and transfer them to persistent storage to support management tasks, such as suspend-resume and migration. With datacenters growing rapidly and configurations becoming heterogeneous, it is important to enable efficient concurrent deployment and snapshotting that are at the same time hypervisor independent and ensure maximum compatibility with different configurations. This paper addresses these challenges by proposing a virtual file system specifically optimized for virtual machine image storage. It is based on a lazy transfer scheme coupled with object versioning that handles snapshotting transparently in a hypervisor-independent fashion, ensuring high portability for different configurations. Large-scale experiments on hundreds of nodes demonstrate excellent performance results: speedup for concurrent VM deployments ranges from a factor of 2 up to 25, with a reduction in bandwidth utilization of as much as 90%.
Citations: 68
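A conceptual sketch of lazy transfer plus versioning as described in the abstract: unmodified blocks of a VM image are fetched from the repository only on first read, writes land in a local store, and a snapshot persists just the modified blocks as a new version that references its parent. All names and structure here are illustrative, not the paper's virtual file system.

```python
BLOCK = 4096

class LazyImage:
    def __init__(self, base_blocks):
        self.base = base_blocks          # stand-in for the remote image repository
        self.local = {}                  # block index -> locally written data
        self.fetched = set()             # blocks pulled on demand so far

    def read(self, idx):
        if idx in self.local:
            return self.local[idx]
        self.fetched.add(idx)            # lazy transfer: fetch a block on first read
        return self.base[idx]

    def write(self, idx, data):
        self.local[idx] = data           # writes go to the local store, not the base

    def snapshot(self):
        # Only modified blocks form the new version; unmodified blocks are
        # referenced from the parent image rather than copied.
        return {"parent": "base-image", "delta": dict(self.local)}

img = LazyImage([bytes(BLOCK)] * 1024)
img.read(0)
img.write(7, b"#" * BLOCK)
snap = img.snapshot()
print(len(img.fetched), len(snap["delta"]))   # -> 1 1
```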