HConfig: Resource adaptive fast bulk loading in HBase

Xianqiang Bao, Ling Liu, Nong Xiao, Fang Liu, Qi Zhang, T. Zhu
Published in: 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing
Publication date: 2014-11-11
DOI: https://doi.org/10.4108/ICST.COLLABORATECOM.2014.257304
Citations: 9

Abstract

NoSQL (Not only SQL) data stores have become a vital component of many big data computing platforms because of their inherent horizontal scalability. HBase is an open-source distributed NoSQL store that many Internet enterprises use for their big data computing applications (e.g., Facebook handles millions of messages each day with HBase). Optimizations that enhance HBase performance are therefore of paramount interest to big data applications built on HBase or other Bigtable-like key-value stores. In this paper we study the problems inherent in misconfiguration of HBase clusters, including scenarios where the HBase default configuration leads to poor performance. We develop HConfig, a semi-automated configuration manager that optimizes HBase system performance along multiple dimensions. Due to space constraints, this paper focuses on how HConfig improves the performance of the HBase data loader. Through this case study we highlight the importance of resource-adaptive and workload-aware auto-configuration management, and we present the design principles of HConfig. Our experiments show that HConfig-enhanced bulk loading significantly outperforms bulk loading under the HBase default configuration, achieving a 2× to 3.7× speedup in throughput across different numbers of client threads while maintaining linear horizontal scalability.
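
To make "resource adaptive" configuration concrete, the sketch below derives a few write-path settings from the resources of a RegionServer node. It is only an illustration of the general idea and is not HConfig's actual method, which the abstract does not describe. The property names are existing HBase configuration keys; the `NodeResources` type, the `suggest_bulk_load_settings` function, the clamping bounds, and the handlers-per-core ratio are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class NodeResources:
    """Hypothetical description of one RegionServer node."""
    heap_bytes: int          # JVM heap given to the RegionServer
    cpu_cores: int           # cores available for RPC handling
    regions_per_server: int  # regions written concurrently during the load


def suggest_bulk_load_settings(res: NodeResources,
                               memstore_fraction: float = 0.4) -> dict:
    """Derive write-path settings from node resources.

    Assumed heuristic (not taken from the paper): split the heap share
    reserved for memstores evenly across the actively written regions,
    clamp the per-region flush size to a sane range, and scale RPC
    handlers with the core count.
    """
    memstore_budget = int(res.heap_bytes * memstore_fraction)
    flush_size = memstore_budget // max(res.regions_per_server, 1)
    # Clamp to [64 MiB, 512 MiB]; both bounds are illustrative assumptions.
    flush_size = min(max(flush_size, 64 * MIB), 512 * MIB)
    return {
        "hbase.regionserver.global.memstore.size": memstore_fraction,
        "hbase.hregion.memstore.flush.size": flush_size,
        # Four handlers per core is an arbitrary illustrative ratio.
        "hbase.regionserver.handler.count": res.cpu_cores * 4,
    }


if __name__ == "__main__":
    node = NodeResources(heap_bytes=8 * 1024**3, cpu_cores=8,
                         regions_per_server=16)
    for key, value in suggest_bulk_load_settings(node).items():
        print(f"{key} = {value}")
```

The point of the sketch is that the same function yields different settings on differently sized nodes, whereas a fixed default configuration does not adapt; any real tuning would need to be validated against measured load throughput.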