{"title":"Reliable adaptable Network RAM","authors":"T. Newhall, D. Amato, A. Pshenichkin","doi":"10.1109/CLUSTR.2008.4663750","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663750","url":null,"abstract":"We present reliability solutions for adaptable network RAM systems running on general-purpose clusters. Network RAM allows nodes with over-committed memory to swap pages over the network, storing them in the idle RAM of other nodes and avoiding swapping to slow, local disk. An adaptable network RAM system adjusts the amount of RAM currently available for storing remotely swapped pages in response to changes in nodespsila local RAM usage. It is important that network RAM systems provide reliability for remotely swapped page data. Without reliability, a single node failure can result in failure of unrelated processes running on other nodes by losing their remotely swapped pages. Adaptable network RAM systems pose extra difficulties in providing reliability because each nodepsilas capacity for storing remotely swapped pages changes over time, and because pages may move from node to node in response to these changes. Our novel dynamic RAID-based reliability solutions use idle RAM for storing page and reliability data, avoiding using slow disk for reliability. They are designed to work with the adaptive nature of our network RAM system (Nswap), allowing page and reliability data to migrate from node to node and allowing pages to be added to or removed from different parity groups. Additionally, page recovery runs concurrently with cluster applications, so that cluster applications do not have to wait until all data from a failed node is recovered before resuming execution. We present results comparing Nswap to disk swapping for a set of benchmarks running on our gigabit cluster. Our results show that reliable Nswap is up to 32 times faster than swapping to disk, and that there is virtually no impact on the performance of applications as they run concurrently with page recovery.","PeriodicalId":198768,"journal":{"name":"2008 IEEE International Conference on Cluster Computing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115401184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient one-copy MPI shared memory communication in Virtual Machines","authors":"Wei Huang, Matthew J. Koop, D. Panda","doi":"10.1109/CLUSTR.2008.4663761","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663761","url":null,"abstract":"Efficient intra-node shared memory communication is important for high performance computing (HPC), especially with the emergence of multi-core architectures. As clusters continue to grow in size and complexity, the use of virtual machine (VM) technologies has been suggested to ease the increasing number of management issues. As demonstrated by earlier research, shared memory communication must be optimized for VMs to attain the native-level performance required by HPC centers. In this paper, we enhance intra-node shared memory communication for VM environments. We propose a one-copy approach. Instead of following the traditional approach used in most MPI implementations, copying data in and out of a pre-allocated shared memory region, our approach dynamically maps user buffers between VMs, allowing data to be directly copied to its destination. We also propose a grant/mapping cache to reduce expensive buffer mapping cost in VM environment. We integrate this approach into MVAPICH2, our implementation of MPI-2 library. For intra-node communication, we are able to reduce the large message latency in VM-based environments by up to 35%, and increase bandwidth by up to 38% even as compared with unmodified MVAPICH2 running in a native environment. Evaluation with the NAS Parallel Benchmarks suite shows up to 15% improvement.","PeriodicalId":198768,"journal":{"name":"2008 IEEE International Conference on Cluster Computing","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123895157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dependency-aware task-based programming environment for multi-core architectures","authors":"Josep M. Pérez, Rosa M. Badia, Jesús Labarta","doi":"10.1109/CLUSTR.2008.4663765","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663765","url":null,"abstract":"Parallel programming on SMP and multi-core architectures is hard. In this paper we present a programming model for those environments based on automatic function level parallelism that strives to be easy, flexible, portable, and performant. Its main trait is its ability to exploit task level parallelism by analyzing task dependencies at run time. We present the programming environment in the context of algorithms from several domains and pinpoint its benefits compared to other approaches. We discuss its execution model and its scheduler. Finally we analyze its performance and demonstrate that it offers reasonable performance without tuning, and that it can rival highly tuned libraries with minimal tuning effort.","PeriodicalId":198768,"journal":{"name":"2008 IEEE International Conference on Cluster Computing","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124855699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DLM: A distributed Large Memory System using remote memory swapping over cluster nodes","authors":"H. Midorikawa, M. Kurokawa, R. Himeno, M. Sato","doi":"10.1109/CLUSTR.2008.4663780","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663780","url":null,"abstract":"Emerging 64 bitOSpsilas supply a huge amount of memory address space that is essential for new applications using very large data. It is expected that the memory in connected nodes can be used to store swapped pages efficiently, especially in a dedicated cluster which has a high-speed network such as 10 GbE and Infiniband. In this paper, we propose the distributed large memory system (DLM), which provides very large virtual memory by using remote memory distributed over the nodes in a cluster. The performance of DLM programs using remote memory is compared to ordinary programs using local memory. The results of STREAM, NPB and Himeno benchmarks show that the DLM achieves better performance than other remote paging schemes using a block swap device to access remote memory. In addition to performance, DLM offers the advantages of easy availability and high portability, because it is a user-level software without the need for special hardware. To obtain high performance, the DLM can tune its parameters independently from kernel swap parameters. We also found that DLMpsilas independence of kernel swapping provides more stable behavior.","PeriodicalId":198768,"journal":{"name":"2008 IEEE International Conference on Cluster Computing","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132565255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling lock-free concurrent fine-grain access to massive distributed data: Application to supernovae detection","authors":"Bogdan Nicolae, Gabriel Antoniu, L. Bougé","doi":"10.1109/CLUSTR.2008.4663787","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663787","url":null,"abstract":"We consider the problem of efficiently managing massive data in a large-scale distributed environment. We consider data strings of size in the order of Terabytes, shared and accessed by concurrent clients. On each individual access, a segment of a string, of the order of Megabytes, is read or modified. Our goal is to provide the clients with efficient fine-grain access the data string as concurrently as possible, without locking the string itself. This issue is crucial in the context of applications in the field of astronomy, databases, data mining and multimedia. We illustrate these requirements with the case of an application for searching supernovae. Our solution relies on distributed, RAM-based data storage, while leveraging a DHT-based, parallel metadata management scheme. The proposed architecture and algorithms have been validated through a software prototype and evaluated in a cluster environment.","PeriodicalId":198768,"journal":{"name":"2008 IEEE International Conference on Cluster Computing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125304478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Live and incremental whole-system migration of virtual machines using block-bitmap","authors":"Yingwei Luo, Binbin Zhang, Xiaolin Wang, Zhenlin Wang, Yifeng Sun, Haogang Chen","doi":"10.1109/CLUSTR.2008.4663760","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663760","url":null,"abstract":"In this paper, we describe a whole-system live migration scheme, which transfers the whole system run-time state, including CPU state, memory data, and local disk storage, of the virtual machine (VM). To minimize the downtime caused by migrating large disk storage data and keep data integrity and consistency, we propose a three-phase migration (TPM) algorithm. To facilitate the migration back to initial source machine, we use an incremental migration (IM) algorithm to reduce the amount of the data to be migrated. Block-bitmap is used to track all the write accesses to the local disk storage during the migration. Synchronization of the local disk storage in the migration is performed according to the block-bitmap. Experiments show that our algorithms work well even when I/O-intensive workloads are running in the migrated VM. The downtime of the migration is around 100 milliseconds, close to shared-storage migration. Total migration time is greatly reduced using IM. The block-bitmap based synchronization mechanism is simple and effective. Performance overhead of recording all the writes on migrated VM is very low.","PeriodicalId":198768,"journal":{"name":"2008 IEEE International Conference on Cluster Computing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114804069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Magnet: A novel scheduling policy for power reduction in cluster with virtual machines","authors":"Liting Hu, Hai Jin, Xiaofei Liao, Xianjie Xiong, Haikun Liu","doi":"10.1109/CLUSTR.2008.4663751","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663751","url":null,"abstract":"The concept of green computing has attracted much attention recently in cluster computing. However, previous local approaches focused on saving the energy cost of the components in a single workstation without a global vision on the whole cluster, so it achieved undesirable power reduction effect. Other cluster-wide energy saving techniques could only be applied to homogeneous workstations and specific applications. This paper describes the design and implementation of a novel approach that uses live migration of virtual machines to transfer load among the nodes on a multilayer ring-based overlay. This scheme can reduce the power consumption greatly by regarding all the cluster nodes as a whole. Plus, it can be applied to both the homogeneous and heterogeneous servers. Experimental measurements show that the new method can reduce the power consumption by 74.8% over base at most with certain adjustably acceptable overhead. The effectiveness and performance insights are also analytically verified.","PeriodicalId":198768,"journal":{"name":"2008 IEEE International Conference on Cluster Computing","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122861785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multicore-enabled multirail communication engine","authors":"E. Brunet, François Trahay, Alexandre Denis","doi":"10.1109/CLUSTR.2008.4663788","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663788","url":null,"abstract":"The current trend in clusters architecture leads toward a massive use of multicore chips. This hardware evolution raises bottleneck issues at the network interface level. The use of multiple parallel networks allows to overcome this problem as it provides an higher aggregate bandwidth. But this bandwidth remains theoretical as only a few communication libraries are able to exploit multiple networks. In this paper, we present an optimization strategy for the NEWMADELEINE communication library. This strategy is able to efficiently exploit parallel interconnect links. By sampling each networkpsilas capabilities, it is possible to estimate a transfer duration a priori. Splitting messages and sending chunks of messages over parallel links can thus be performed efficiently to reach the theoretical aggregate bandwidth. NEWMADELEINE is multithreaded and exploits multicore chips to send small packets, that involve CPU-consuming copies, in parallel.","PeriodicalId":198768,"journal":{"name":"2008 IEEE International Conference on Cluster Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126766888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent compilers","authors":"John Cavazos","doi":"10.1109/CLUSTR.2008.4663796","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663796","url":null,"abstract":"The industry is now in agreement that the future of architecture design lies in multiple cores. As a consequence, all computer systems today, from embedded devices to petascale computing systems, are being developed using multicore processors. Although researchers in industry and academia are exploring many different multicore hardware design choices, most agree that developing portable software that achieves high performance on multicore processors is a major unsolved problem. We now see a plethora of architectural features, with little consensus on how the computation, memory, and communication structures in multicore systems will be organized. The wide disparity in hardware systems available has made it nearly impossible to write code that is portable in functionality while still taking advantage of the performance potential of each system. In this paper, we propose exploring the viability of developing intelligent compilers, focusing on key components that will allow application portability while still achieving high performance.","PeriodicalId":198768,"journal":{"name":"2008 IEEE International Conference on Cluster Computing","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132346524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}