Speed up Cassandra read path by using Coordinator Cache

2021 26th International Computer Conference, Computer Society of Iran (CSICC) Pub Date : 2021-03-03 DOI:10.1109/CSICC52343.2021.9420593

Latifa Azizi Vakili, N. Yazdani



Abstract

The rapidly growing volume of massive and complex data on today’s Internet, known as Big Data, requires sophisticated, comprehensive, and highly operational databases. NoSQL databases are designed to meet Big Data requirements. Choosing an appropriate NoSQL database from the many available solutions to manage large volumes of data, in terms of both quantity and quality, is itself a major challenge. Cassandra is a distributed NoSQL database built to manage very large amounts of structured and unstructured data spread across many commodity servers while providing highly available service with no single point of failure. Cassandra was designed to run on cheap commodity hardware and to handle high write throughput without sacrificing read efficiency. This paper first presents an overview of NoSQL databases, Big Data, and IoT data as a controversial and complicated source of Big Data. It then focuses on issues in Cassandra’s read path and proposes a model to reduce the time taken by a read request (read query) sent from the client to the Cassandra database. In this model, we add a cache, called the Coordinator cache, to Cassandra’s controlling (coordinator) nodes. Using a real dataset, we analyze Cassandra’s existing read path against the proposed read path model and compare the time of a read query before and after applying the model. The results show that using the Coordinator cache together with the key cache offered by Cassandra speeds up data read requests. The Coordinator cache requires no extra memory because a Cassandra coordinator node does not store anything while performing control tasks over replica nodes, so its otherwise unused memory can be used for the Coordinator cache.
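
The abstract does not describe how the Coordinator cache is implemented, so the following is only a minimal Python sketch of the general idea: a coordinator answers a read from a local cache when it can and forwards the request to replicas otherwise. The names `CoordinatorCache`, `Coordinator`, and `read_from_replicas`, the LRU eviction policy, and the invalidate-on-write step are all assumptions made for illustration; none of them come from the paper or from Cassandra's codebase.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class CoordinatorCache:
    """Hypothetical LRU cache held in a coordinator node's memory."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._rows: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached row, or None when the key is not cached."""
        if key in self._rows:
            self._rows.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._rows[key]
        self.misses += 1
        return None

    def put(self, key: Hashable, row: Any) -> None:
        """Cache a row, evicting the least recently used entry if full."""
        self._rows[key] = row
        self._rows.move_to_end(key)
        if len(self._rows) > self.capacity:
            self._rows.popitem(last=False)

    def invalidate(self, key: Hashable) -> None:
        """Drop a key so that a later read goes back to the replicas."""
        self._rows.pop(key, None)


class Coordinator:
    """Hypothetical coordinator that checks its cache before the replicas."""

    def __init__(
        self,
        read_from_replicas: Callable[[Hashable], Optional[Any]],
        cache: CoordinatorCache,
    ) -> None:
        # read_from_replicas stands in for the normal replica read path
        # (an assumption; the real path involves consistency levels etc.).
        self._read_from_replicas = read_from_replicas
        self.cache = cache

    def read(self, key: Hashable) -> Optional[Any]:
        row = self.cache.get(key)
        if row is not None:
            return row  # served from the coordinator, no replica round trip
        row = self._read_from_replicas(key)
        if row is not None:
            self.cache.put(key, row)
        return row

    def on_write(self, key: Hashable) -> None:
        # Assumed policy: invalidate on write to avoid serving stale rows.
        self.cache.invalidate(key)
```

In this sketch a repeated read of the same key skips the replica round trip entirely, which is the kind of saving the abstract attributes to the Coordinator cache. How the paper handles consistency between the cache and the replicas is not stated in the abstract; invalidating on write is simply one plausible choice.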
