BlueDBM:用于大数据分析的设备

S. Jun, Ming Liu, Sungjin Lee, Jamey Hicks, J. Ankcorn, Myron King, Shuotao Xu, Arvind
{"title":"BlueDBM:用于大数据分析的设备","authors":"S. Jun, Ming Liu, Sungjin Lee, Jamey Hicks, J. Ankcorn, Myron King, Shuotao Xu, Arvind","doi":"10.1145/2749469.2750412","DOIUrl":null,"url":null,"abstract":"Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data and daily twitter feeds where the datasets of interest are 5TB to 20 TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GBs of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. In this paper we present BlueDBM, a new system architecture which has flash-based storage with in-store processing capability and a low-latency high-throughput inter-controller network. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a ram-cloud system falls sharply even if only 5%~10% of the references are to the secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost-performance trade-off for Big Data analytics.","PeriodicalId":6878,"journal":{"name":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","volume":"17 1","pages":"1-13"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"174","resultStr":"{\"title\":\"BlueDBM: An appliance for Big Data analytics\",\"authors\":\"S. Jun, Ming Liu, Sungjin Lee, Jamey Hicks, J. Ankcorn, Myron King, Shuotao Xu, Arvind\",\"doi\":\"10.1145/2749469.2750412\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data and daily twitter feeds where the datasets of interest are 5TB to 20 TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GBs of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. In this paper we present BlueDBM, a new system architecture which has flash-based storage with in-store processing capability and a low-latency high-throughput inter-controller network. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a ram-cloud system falls sharply even if only 5%~10% of the references are to the secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost-performance trade-off for Big Data analytics.\",\"PeriodicalId\":6878,\"journal\":{\"name\":\"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)\",\"volume\":\"17 1\",\"pages\":\"1-13\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-06-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"174\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2749469.2750412\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2749469.2750412","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 174

摘要

复杂的数据查询,因为需要随机访问,已经被证明是缓慢的,除非所有的数据都可以容纳在DRAM中。有许多领域,如基因组学、地质数据和每日twitter feed,感兴趣的数据集在5TB到20tb之间。对于这样的数据集,需要一个包含100台服务器的集群,每台服务器有128GB到256gb的DRAM,以容纳DRAM中的所有数据。另一方面,这样的数据集可以很容易地存储在机架大小的集群的闪存中。闪存具有比硬盘更好的随机访问性能,这使得它适合分析工作负载。在本文中,我们提出了BlueDBM,一个新的系统架构,它具有基于闪存的存储,具有店内处理能力和低延迟的高吞吐量控制器间网络。我们表明,在一些重要应用中,BlueDBM的性能比没有这些特性的基于闪存的系统高出10倍。虽然ram-cloud系统的性能急剧下降,即使只有5%~10%的引用是二级存储,但这种急剧的性能下降在BlueDBM中不是问题。BlueDBM在大数据分析的成本-性能权衡中呈现出一个有吸引力的点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
BlueDBM: An appliance for Big Data analytics
Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data and daily twitter feeds where the datasets of interest are 5TB to 20 TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GBs of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. In this paper we present BlueDBM, a new system architecture which has flash-based storage with in-store processing capability and a low-latency high-throughput inter-controller network. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a ram-cloud system falls sharply even if only 5%~10% of the references are to the secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost-performance trade-off for Big Data analytics.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信