Francisco Neves, R. Vilaça, J. Pereira, R. Oliveira
{"title":"准备扫描:从HBase中高效地检索结构化数据","authors":"Francisco Neves, R. Vilaça, J. Pereira, R. Oliveira","doi":"10.1145/3019612.3019863","DOIUrl":null,"url":null,"abstract":"The ability of NoSQL systems to scale better than traditional relational databases motivates a large set of applications to migrate their data to NoSQL systems, even without aiming to exploit the provided schema flexibility. However, accessing structured data is costly due to such flexibility, incurring in a lot of bandwidth and processing unit usage. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes the access to data structured in a regular manner by taking advantage of a well-known schema by application. Using an industry standard benchmark, we show that Prepared Scan improves throughput up to 29+ and decreases network bandwidth consumption up to 20+.","PeriodicalId":20728,"journal":{"name":"Proceedings of the Symposium on Applied Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Prepared scan: efficient retrieval of structured data from HBase\",\"authors\":\"Francisco Neves, R. Vilaça, J. Pereira, R. Oliveira\",\"doi\":\"10.1145/3019612.3019863\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The ability of NoSQL systems to scale better than traditional relational databases motivates a large set of applications to migrate their data to NoSQL systems, even without aiming to exploit the provided schema flexibility. However, accessing structured data is costly due to such flexibility, incurring in a lot of bandwidth and processing unit usage. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes the access to data structured in a regular manner by taking advantage of a well-known schema by application. Using an industry standard benchmark, we show that Prepared Scan improves throughput up to 29+ and decreases network bandwidth consumption up to 20+.\",\"PeriodicalId\":20728,\"journal\":{\"name\":\"Proceedings of the Symposium on Applied Computing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-04-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Symposium on Applied Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3019612.3019863\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Symposium on Applied Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3019612.3019863","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Prepared scan: efficient retrieval of structured data from HBase
The ability of NoSQL systems to scale better than traditional relational databases motivates a large set of applications to migrate their data to NoSQL systems, even without aiming to exploit the provided schema flexibility. However, accessing structured data is costly due to such flexibility, incurring in a lot of bandwidth and processing unit usage. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes the access to data structured in a regular manner by taking advantage of a well-known schema by application. Using an industry standard benchmark, we show that Prepared Scan improves throughput up to 29+ and decreases network bandwidth consumption up to 20+.