{"title":"VOLUME:在商品集群上启用大规模内存计算","authors":"Zhiqiang Ma, David Ke Hong, Lin Gu","doi":"10.1109/CloudCom.2013.15","DOIUrl":null,"url":null,"abstract":"Traditional cloud computing technologies, such as MapReduce, use file systems as the system-wide substrate for data storage and sharing. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems use DRAM to store data and tremendously improve the performance of cloud computing systems. However, both our own experience and related work indicate that a simple substitution of distributed DRAM for the file system does not provide a solid and viable foundation for data storage and processing in the data center environment, and the capacity of such systems is limited by the amount of physical memory in the cluster. To overcome the challenge, we construct VOLUME (Virtual On-Line Unified Memory Environment), a distributed virtual memory to unify the physical memory and disk resources on many compute nodes, to form a system-wide data substrate. The new substrate provides a general memory based abstraction, takes advantage of DRAM in the system to accelerate computation, and, transparent to programmers, scales the system to handle large datasets by swapping data to disks and remote servers. The evaluation results show that VOLUME is much faster than Hadoop/HDFS, and delivers 6-11x speedups on the adjacency list workload. VOLUME is faster than both Hadoop/HDFS and Spark/RDD for in-memory sorting. 
For kmeans clustering, VOLUME scales linearly to 160 compute nodes on the TH-1/GZ supercomputer.","PeriodicalId":198053,"journal":{"name":"2013 IEEE 5th International Conference on Cloud Computing Technology and Science","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"VOLUME: Enable Large-Scale In-Memory Computation on Commodity Clusters\",\"authors\":\"Zhiqiang Ma, David Ke Hong, Lin Gu\",\"doi\":\"10.1109/CloudCom.2013.15\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Traditional cloud computing technologies, such as MapReduce, use file systems as the system-wide substrate for data storage and sharing. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems use DRAM to store data and tremendously improve the performance of cloud computing systems. However, both our own experience and related work indicate that a simple substitution of distributed DRAM for the file system does not provide a solid and viable foundation for data storage and processing in the data center environment, and the capacity of such systems is limited by the amount of physical memory in the cluster. To overcome the challenge, we construct VOLUME (Virtual On-Line Unified Memory Environment), a distributed virtual memory to unify the physical memory and disk resources on many compute nodes, to form a system-wide data substrate. The new substrate provides a general memory based abstraction, takes advantage of DRAM in the system to accelerate computation, and, transparent to programmers, scales the system to handle large datasets by swapping data to disks and remote servers. The evaluation results show that VOLUME is much faster than Hadoop/HDFS, and delivers 6-11x speedups on the adjacency list workload. 
VOLUME is faster than both Hadoop/HDFS and Spark/RDD for in-memory sorting. For kmeans clustering, VOLUME scales linearly to 160 compute nodes on the TH-1/GZ supercomputer.\",\"PeriodicalId\":198053,\"journal\":{\"name\":\"2013 IEEE 5th International Conference on Cloud Computing Technology and Science\",\"volume\":\"41 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE 5th International Conference on Cloud Computing Technology and Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CloudCom.2013.15\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 5th International Conference on Cloud Computing Technology and Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CloudCom.2013.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
VOLUME: Enable Large-Scale In-Memory Computation on Commodity Clusters
Traditional cloud computing technologies, such as MapReduce, use file systems as the system-wide substrate for data storage and sharing. A distributed file system provides a global namespace and stores data persistently, but it also introduces significant overhead. Several recent systems use DRAM to store data and tremendously improve the performance of cloud computing systems. However, both our own experience and related work indicate that simply substituting distributed DRAM for the file system does not provide a solid and viable foundation for data storage and processing in the data center environment, and the capacity of such systems is limited by the amount of physical memory in the cluster. To overcome this challenge, we construct VOLUME (Virtual On-Line Unified Memory Environment), a distributed virtual memory that unifies the physical memory and disk resources on many compute nodes to form a system-wide data substrate. The new substrate provides a general memory-based abstraction, takes advantage of DRAM in the system to accelerate computation, and, transparently to programmers, scales the system to handle large datasets by swapping data to disks and remote servers. The evaluation results show that VOLUME is much faster than Hadoop/HDFS, delivering 6-11x speedups on the adjacency-list workload. VOLUME is faster than both Hadoop/HDFS and Spark/RDD for in-memory sorting. For k-means clustering, VOLUME scales linearly to 160 compute nodes on the TH-1/GZ supercomputer.
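The core idea the abstract describes, a memory abstraction that serves hot data from DRAM but transparently spills overflow to disk so capacity is not bounded by physical memory, can be sketched in a deliberately simplified, single-node form. This is an illustrative toy (the class name, budget parameter, and eviction policy are all hypothetical), not VOLUME's actual distributed virtual memory, which unifies memory and disk across many compute nodes:

```python
import os
import pickle
import tempfile

class SpillingStore:
    """Toy key-value substrate: keeps recent entries in DRAM and
    transparently spills the oldest to disk once an in-memory budget
    is exceeded, analogous to swapping pages out."""

    def __init__(self, max_in_memory=2):
        self.max_in_memory = max_in_memory   # entries allowed in DRAM
        self.mem = {}                        # in-memory tier (insertion-ordered)
        self.spill_dir = tempfile.mkdtemp(prefix="volume_demo_")
        self.on_disk = set()                 # keys currently spilled to disk

    def _disk_path(self, key):
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def put(self, key, value):
        self.mem[key] = value
        # Over budget: evict the oldest in-memory entries to disk.
        while len(self.mem) > self.max_in_memory:
            victim, val = next(iter(self.mem.items()))
            with open(self._disk_path(victim), "wb") as f:
                pickle.dump(val, f)
            self.on_disk.add(victim)
            del self.mem[victim]

    def get(self, key):
        if key in self.mem:                  # DRAM hit: fast path
            return self.mem[key]
        if key in self.on_disk:              # miss: fault the value back in
            with open(self._disk_path(key), "rb") as f:
                value = pickle.load(f)
            self.on_disk.remove(key)
            self.put(key, value)             # promote back into memory
            return value
        raise KeyError(key)

store = SpillingStore(max_in_memory=2)
for i in range(5):
    store.put(f"k{i}", i * i)
# Only 2 entries fit in DRAM; the rest were spilled yet remain readable.
print(len(store.mem), store.get("k0"), store.get("k4"))  # prints: 2 0 16
```

The caller never distinguishes in-memory from spilled data; `get` hides the tiering, which is the "transparent to programmers" property the abstract claims. VOLUME additionally swaps to remote servers and distributes the substrate cluster-wide, which this single-node sketch omits.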