LSM-Based Storage and Indexing: An Old Idea with Timely Benefits
Authors: Sattam Alsubaiee, M. Carey, Chen Li
Venue: Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data
Published: 2015-05-31
DOI: 10.1145/2786006.2786007 (https://doi.org/10.1145/2786006.2786007)
Citations: 16
Abstract
With the social-media data explosion, near real-time queries, particularly those of a spatio-temporal nature, can be challenging to answer. In this paper, we show how to efficiently answer queries that target recent data within very large data sets. We describe a solution that exploits a natural partitioning property of the components of LSM-based indexes, allowing us to filter out many components when answering queries. Our solution generalizes to any LSM-based index structure and can be applied not only to temporal fields (e.g., based on recency) but to any "time-correlated fields" such as Universally Unique Identifiers (UUIDs), user-provided integer ids, etc. We have implemented and experimentally evaluated the solution in the context of the AsterixDB system.
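The component-filtering idea in the abstract can be illustrated with a small sketch. This is not the AsterixDB implementation; it is a toy model that assumes each LSM component records the minimum and maximum value of a time-correlated field (here an integer timestamp) and that a range query skips any component whose recorded range cannot overlap the query range. The names `Component`, `FilteredLSM`, `may_contain`, and `memory_budget` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (timestamp, payload); the layout is an assumption


@dataclass
class Component:
    """One LSM component plus a min/max filter on the timestamp field."""
    records: List[Record] = field(default_factory=list)
    min_ts: Optional[int] = None
    max_ts: Optional[int] = None

    def add(self, rec: Record) -> None:
        ts = rec[0]
        self.records.append(rec)
        # Widen the filter so it always covers every record in the component.
        self.min_ts = ts if self.min_ts is None else min(self.min_ts, ts)
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)

    def may_contain(self, lo: int, hi: int) -> bool:
        """True only if [lo, hi] can overlap this component's timestamp range."""
        if self.min_ts is None or self.max_ts is None:
            return False  # empty component
        return self.min_ts <= hi and lo <= self.max_ts


class FilteredLSM:
    """Toy LSM index: one in-memory component, flushed to a list of 'disk' components."""

    def __init__(self, memory_budget: int = 3) -> None:
        self.memory_budget = memory_budget
        self.memory = Component()
        self.disk: List[Component] = []  # newest component first

    def insert(self, ts: int, payload: str) -> None:
        self.memory.add((ts, payload))
        if len(self.memory.records) >= self.memory_budget:
            # Flush: the full in-memory component becomes an immutable disk component.
            self.disk.insert(0, self.memory)
            self.memory = Component()

    def range_query(self, lo: int, hi: int) -> Tuple[List[Record], int]:
        """Return matching records and the number of components actually scanned."""
        results: List[Record] = []
        scanned = 0
        for comp in [self.memory] + self.disk:
            if not comp.may_contain(lo, hi):
                continue  # filtered out without touching its records
            scanned += 1
            results.extend(r for r in comp.records if lo <= r[0] <= hi)
        return sorted(results), scanned


if __name__ == "__main__":
    index = FilteredLSM(memory_budget=3)
    for ts in range(1, 10):
        index.insert(ts, f"e{ts}")
    matches, scanned = index.range_query(8, 9)
    print(matches, scanned)
```

Because records arrive roughly in timestamp order, each flushed component covers a narrow, mostly disjoint timestamp range, so a query for recent data touches only the newest components. The same check works for any field whose values grow with insertion time.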