{"title":"CCIndex for Cassandra:一种新的Cassandra多维范围查询方案","authors":"Chen Feng, Yongqiang Zou, Zhiwei Xu","doi":"10.1109/SKG.2011.28","DOIUrl":null,"url":null,"abstract":"Multi-dimensional range queries are fundamental requirements in large scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as data partitioner. Cassandra supports multi-dimensional range queries with poor performance and with a limitation that there must be one dimension with an equal operator. Based on the success of CCIndex scheme in Apache HBase, this paper tries to answer the question: Can CCIndex benefit multi-dimensional range queries in DOTs like Cassandra? This paper studies the feasibility of employing CCIndex in Cassandra, proposes a new approach to estimate result size, implements CCIndex in Cassandra including recovery mechanisms and studies the pros and cons of CCIndex for different DOTs. Experimental results show that CCIndex gains 2.4 to 3.7 times efficiency over Cassandra's index scheme with 1% to 50% selectivity for 2 million records. This paper shows that CCIndex is a general approach for DOTs, and could gain better performance for DOTs which perform scan tasks much faster than random read. This paper reveals that Cassandra is optimized for hash tables rather than ordered tables in performing read and range queries.","PeriodicalId":184788,"journal":{"name":"2011 Seventh International Conference on Semantics, Knowledge and Grids","volume":"197 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"CCIndex for Cassandra: A Novel Scheme for Multi-dimensional Range Queries in Cassandra\",\"authors\":\"Chen Feng, Yongqiang Zou, Zhiwei Xu\",\"doi\":\"10.1109/SKG.2011.28\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-dimensional range queries are fundamental requirements in large scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as data partitioner. Cassandra supports multi-dimensional range queries with poor performance and with a limitation that there must be one dimension with an equal operator. Based on the success of CCIndex scheme in Apache HBase, this paper tries to answer the question: Can CCIndex benefit multi-dimensional range queries in DOTs like Cassandra? This paper studies the feasibility of employing CCIndex in Cassandra, proposes a new approach to estimate result size, implements CCIndex in Cassandra including recovery mechanisms and studies the pros and cons of CCIndex for different DOTs. Experimental results show that CCIndex gains 2.4 to 3.7 times efficiency over Cassandra's index scheme with 1% to 50% selectivity for 2 million records. This paper shows that CCIndex is a general approach for DOTs, and could gain better performance for DOTs which perform scan tasks much faster than random read. This paper reveals that Cassandra is optimized for hash tables rather than ordered tables in performing read and range queries.\",\"PeriodicalId\":184788,\"journal\":{\"name\":\"2011 Seventh International Conference on Semantics, Knowledge and Grids\",\"volume\":\"197 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 Seventh International Conference on Semantics, Knowledge and Grids\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SKG.2011.28\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 Seventh International Conference on Semantics, Knowledge and Grids","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SKG.2011.28","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
CCIndex for Cassandra: A Novel Scheme for Multi-dimensional Range Queries in Cassandra
Multi-dimensional range queries are fundamental requirements in large scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as data partitioner. Cassandra supports multi-dimensional range queries with poor performance and with a limitation that there must be one dimension with an equal operator. Based on the success of CCIndex scheme in Apache HBase, this paper tries to answer the question: Can CCIndex benefit multi-dimensional range queries in DOTs like Cassandra? This paper studies the feasibility of employing CCIndex in Cassandra, proposes a new approach to estimate result size, implements CCIndex in Cassandra including recovery mechanisms and studies the pros and cons of CCIndex for different DOTs. Experimental results show that CCIndex gains 2.4 to 3.7 times efficiency over Cassandra's index scheme with 1% to 50% selectivity for 2 million records. This paper shows that CCIndex is a general approach for DOTs, and could gain better performance for DOTs which perform scan tasks much faster than random read. This paper reveals that Cassandra is optimized for hash tables rather than ordered tables in performing read and range queries.