Speed up Cassandra read path by using Coordinator Cache
Authors: Latifa Azizi Vakili, N. Yazdani
Venue: 2021 26th International Computer Conference, Computer Society of Iran (CSICC)
Publication date: 2021-03-03
DOI: 10.1109/CSICC52343.2021.9420593 (https://doi.org/10.1109/CSICC52343.2021.9420593)
Citations: 0
Abstract
The rapidly growing volume of massive, complex data on today’s Internet, known as Big Data, requires sophisticated, comprehensive, and highly operational databases. NoSQL databases are designed to meet Big Data requirements. Choosing an appropriate NoSQL database from the many available solutions to manage large volumes of data, in terms of both quantity and quality, is itself a major challenge. Cassandra is a distributed NoSQL database suited to managing very large amounts of structured and unstructured data spread across many commodity servers, while providing highly available service with no single point of failure. Cassandra was designed to run on cheap commodity hardware and to handle high write throughput without sacrificing read efficiency. This paper first presents an overview of NoSQL databases, Big Data, and IoT data as a controversial and complicated source of Big Data. It then focuses on issues in Cassandra's read path and proposes a model to reduce the time taken by a read request (read query) sent from the client to the Cassandra database. In this model, we add a cache, called the Coordinator cache, to Cassandra's controlling (coordinator) nodes. Using a real dataset, we compare Cassandra's existing read path with the proposed read path model and measure the time of a read query before and after applying the model. The results show that using the Coordinator cache together with the key cache provided by Cassandra speeds up read requests. The Coordinator cache requires no extra memory, because a Cassandra coordinator node does not store anything while performing its controlling tasks over replica nodes, so its spare memory can be used for the Coordinator cache.
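The abstract does not give the internal design of the Coordinator cache, so the following is only a minimal sketch of the general idea, written under stated assumptions: the coordinator keeps a small in-memory LRU cache, answers a read from it when possible, and otherwise forwards the request to the replicas. The names `CoordinatorCache`, `Coordinator`, `read_from_replicas`, and `write_to_replicas` are hypothetical and are not part of Cassandra's API; the LRU eviction policy and the invalidate-on-write rule are likewise assumptions, not the authors' stated method.

```python
from collections import OrderedDict
from typing import Callable, Optional


class CoordinatorCache:
    """Hypothetical LRU cache held in coordinator memory (not a Cassandra API)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._rows = OrderedDict()  # partition key -> cached row

    def get(self, key: str) -> Optional[str]:
        if key not in self._rows:
            return None
        self._rows.move_to_end(key)  # mark as most recently used
        return self._rows[key]

    def put(self, key: str, row: str) -> None:
        self._rows[key] = row
        self._rows.move_to_end(key)
        if len(self._rows) > self.capacity:
            self._rows.popitem(last=False)  # evict the least recently used row

    def invalidate(self, key: str) -> None:
        self._rows.pop(key, None)


class Coordinator:
    """Hypothetical coordinator that consults the cache before the replicas."""

    def __init__(
        self,
        read_from_replicas: Callable[[str], str],
        write_to_replicas: Callable[[str, str], None],
        cache: CoordinatorCache,
    ) -> None:
        self._read_from_replicas = read_from_replicas
        self._write_to_replicas = write_to_replicas
        self._cache = cache

    def read(self, key: str) -> str:
        cached = self._cache.get(key)
        if cached is not None:
            return cached  # cache hit: no replica round trip
        row = self._read_from_replicas(key)  # cache miss: normal read path
        self._cache.put(key, row)
        return row

    def write(self, key: str, row: str) -> None:
        self._write_to_replicas(key, row)
        self._cache.invalidate(key)  # assumed rule: drop the stale local entry
```

In this sketch a repeated read of the same key reaches the replicas only once. Because any Cassandra node can act as coordinator for a request, a write routed through a different coordinator would not invalidate this local entry; handling that case is outside the scope of the sketch.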