Prepared Scan: Efficient Retrieval of Structured Data from HBase

Proceedings of the Symposium on Applied Computing. Pub Date: 2017-04-03. DOI: 10.1145/3019612.3019863

Francisco Neves, R. Vilaça, J. Pereira, R. Oliveira

Citations: 1

Abstract

NoSQL systems scale better than traditional relational databases, which motivates many applications to migrate their data to NoSQL systems even when they do not intend to exploit the schema flexibility these systems provide. That flexibility, however, makes accessing structured data costly, consuming considerable network bandwidth and processing resources. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes access to regularly structured data by taking advantage of a schema that is well known to the application. Using an industry-standard benchmark, we show that Prepared Scan improves throughput by up to 29% and decreases network bandwidth consumption by up to 20%.
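To make the cost mentioned in the abstract concrete, here is a minimal sketch that estimates how many bytes one row occupies when every cell repeats its own coordinates, compared with a layout that relies on a schema known in advance. It is not the paper's implementation. `keyvalue_size` follows HBase's classic KeyValue layout (without tags), while `schema_aware_row_size` is a hypothetical encoding invented purely for illustration; the abstract does not describe the actual Prepared Scan wire format. The sample row, column family, and column names are also made up.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Bytes for one cell in HBase's classic KeyValue layout (no tags)."""
    key_len = (
        2 + len(row)        # 2-byte row length + row key
        + 1 + len(family)   # 1-byte family length + column family
        + len(qualifier)    # column qualifier
        + 8                 # timestamp
        + 1                 # key type
    )
    # 4-byte key length + 4-byte value length + key + value
    return 4 + 4 + key_len + len(value)


def per_cell_row_size(row: bytes, family: bytes, columns: dict) -> int:
    """Every cell repeats the row key, family, and qualifier."""
    return sum(keyvalue_size(row, family, q, v) for q, v in columns.items())


def schema_aware_row_size(row: bytes, columns: dict) -> int:
    """Hypothetical layout (an assumption, not the paper's format):
    the row key is sent once, then each value with a 4-byte length
    prefix, in an order both sides already agree on."""
    return 2 + len(row) + sum(4 + len(v) for v in columns.values())


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    row = b"customer#000042"
    cols = {b"name": b"Alice", b"city": b"Braga", b"balance": b"1024"}
    flexible = per_cell_row_size(row, b"f", cols)
    fixed = schema_aware_row_size(row, cols)
    print(f"per-cell layout: {flexible} bytes, schema-aware layout: {fixed} bytes")
```

The gap between the two numbers depends on how long the row keys and qualifiers are relative to the values, so this sketch only illustrates where the overhead comes from and says nothing about the gains measured in the paper.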
