VOLUME: Enable Large-Scale In-Memory Computation on Commodity Clusters

Zhiqiang Ma, David Ke Hong, Lin Gu
2013 IEEE 5th International Conference on Cloud Computing Technology and Science. Published 2013-12-02.
DOI: 10.1109/CloudCom.2013.15 (https://doi.org/10.1109/CloudCom.2013.15)
Citations: 7

Abstract

Traditional cloud computing technologies, such as MapReduce, use file systems as the system-wide substrate for data storage and sharing. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems store data in DRAM instead and substantially improve the performance of cloud computing systems. However, both our own experience and related work indicate that simply substituting distributed DRAM for the file system does not provide a solid and viable foundation for data storage and processing in the data center environment, and the capacity of such systems is limited by the amount of physical memory in the cluster. To overcome this challenge, we construct VOLUME (Virtual On-Line Unified Memory Environment), a distributed virtual memory that unifies the physical memory and disk resources on many compute nodes to form a system-wide data substrate. The new substrate provides a general memory-based abstraction, takes advantage of DRAM in the system to accelerate computation, and, transparently to programmers, scales the system to handle large datasets by swapping data to disks and remote servers. The evaluation results show that VOLUME is much faster than Hadoop/HDFS, delivering 6-11x speedups on the adjacency list workload. VOLUME is also faster than both Hadoop/HDFS and Spark/RDD for in-memory sorting. For k-means clustering, VOLUME scales linearly to 160 compute nodes on the TH-1/GZ supercomputer.
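
The abstract does not describe VOLUME's programming interface, so the following is only a rough, single-process sketch of the general idea it names: a memory-based store that keeps hot data in RAM and transparently swaps cold data to disk once a memory budget is exceeded. The class name `SpillingStore`, its methods, the LRU eviction policy, and the use of `pickle` files are all assumptions made for illustration; none of them are taken from the paper, and remote-server swapping is not modeled.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingStore:
    """Toy page store: hot pages stay in RAM, cold pages spill to disk.

    Hypothetical illustration only; this is not VOLUME's interface or design.
    """

    def __init__(self, capacity, spill_dir=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity        # maximum number of pages held in memory
        self.resident = OrderedDict()   # page_id -> data, oldest access first
        self.swapped = set()            # page_ids whose only copy is on disk
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="spill_")

    def _path(self, page_id):
        return os.path.join(self.spill_dir, f"page_{page_id}.bin")

    def _evict_if_needed(self):
        # Write least recently used pages to disk until the budget is met.
        while len(self.resident) > self.capacity:
            victim, data = self.resident.popitem(last=False)
            with open(self._path(victim), "wb") as f:
                pickle.dump(data, f)
            self.swapped.add(victim)

    def write(self, page_id, data):
        self.resident[page_id] = data
        self.resident.move_to_end(page_id)
        if page_id in self.swapped:
            # The on-disk copy is now stale, so drop it.
            os.remove(self._path(page_id))
            self.swapped.discard(page_id)
        self._evict_if_needed()

    def read(self, page_id):
        if page_id in self.resident:
            self.resident.move_to_end(page_id)  # mark as recently used
            return self.resident[page_id]
        if page_id in self.swapped:
            # Swap the page back in; the caller never sees where it lived.
            with open(self._path(page_id), "rb") as f:
                data = pickle.load(f)
            os.remove(self._path(page_id))
            self.swapped.discard(page_id)
            self.resident[page_id] = data
            self._evict_if_needed()
            return data
        raise KeyError(page_id)


if __name__ == "__main__":
    store = SpillingStore(capacity=2)
    for i in range(5):
        store.write(i, [i] * 3)
    # Page 0 was swapped out earlier; reading it brings it back into memory.
    print(store.read(0), sorted(store.swapped))
```

The caller only ever uses `read` and `write`; whether a page is resident or on disk is hidden behind that interface, which is the sense of "transparent to programmers" this sketch tries to convey. A real distributed virtual memory would also have to handle placement across nodes, concurrency, and failures, which are out of scope here.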