{"title":"On main-memory flushing in microblogs data management systems","authors":"A. Magdy, Rami Alghamdi, M. Mokbel","doi":"10.1109/ICDE.2016.7498261","DOIUrl":null,"url":null,"abstract":"Searching microblogs, e.g., tweets and comments, is practically supported through main-memory indexing for scalable data digestion and efficient query evaluation. With continuity and excessive numbers of microblogs, it is infeasible to keep data in main-memory for long periods. Thus, once allocated memory budget is filled, a portion of data is flushed from memory to disk to continuously accommodate newly incoming data. Existing techniques come with either low memory hit ratio due to flushing items regardless of their relevance to incoming queries or significant overhead of tracking individual data items, which limit scalability of microblogs systems in either cases. In this paper, we propose kFlushing policy that exploits popularity of top-k queries in microblogs to smartly select a subset of microblogs to flush. kFlushing is mainly designed to increase memory hit ratio. To this end, it identifies and flushes in-memory data that does not contribute to incoming queries. The freed memory space is utilized to accumulate more useful data that is used to answer more queries from memory contents. When all memory is utilized for useful data, kFlushing flushes data that is less likely to degrade memory hit ratio. In addition, kFlushing comes with a little overhead that keeps high system scalability in terms of high digestion rates of incoming fast data. Extensive experimental evaluation shows the effectiveness and scalability of kFlushing to improve main-memory hit by 26–330% while coping up with fast microblog streams of up to 100K microblog/second.","PeriodicalId":6883,"journal":{"name":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","volume":"10 1","pages":"445-456"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE 32nd International Conference on Data Engineering (ICDE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2016.7498261","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 12
Abstract
Searching microblogs, e.g., tweets and comments, is practically supported through main-memory indexing for scalable data digestion and efficient query evaluation. With the continuous arrival and sheer volume of microblogs, it is infeasible to keep all data in main memory for long periods. Thus, once the allocated memory budget is filled, a portion of the data is flushed from memory to disk to continuously accommodate newly incoming data. Existing techniques either suffer a low memory hit ratio, because they flush items regardless of their relevance to incoming queries, or incur significant overhead to track individual data items; in either case, the scalability of microblog systems is limited. In this paper, we propose the kFlushing policy, which exploits the popularity of top-k queries on microblogs to smartly select a subset of microblogs to flush. kFlushing is mainly designed to increase the memory hit ratio. To this end, it identifies and flushes in-memory data that does not contribute to incoming queries. The freed memory space is used to accumulate more useful data, so that more queries can be answered from memory contents. When all memory is occupied by useful data, kFlushing flushes the data that is less likely to degrade the memory hit ratio. In addition, kFlushing incurs little overhead, preserving high system scalability in terms of high digestion rates for fast incoming data. Extensive experimental evaluation shows the effectiveness and scalability of kFlushing, improving the main-memory hit ratio by 26-330% while coping with fast microblog streams of up to 100K microblogs/second.
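The abstract only outlines the policy, but the core idea of flushing data that cannot contribute to incoming top-k queries can be illustrated with a small sketch. The Python below is our own minimal construction, not the paper's algorithm: it assumes a per-keyword in-memory posting list ranked by a relevance score (e.g., recency) and a hypothetical workload-wide bound K_MAX on the k requested by queries; the names KeywordIndex and flush_candidates are invented for illustration.

```python
from collections import defaultdict

K_MAX = 100  # hypothetical upper bound on k across the observed query workload


class KeywordIndex:
    """Toy in-memory keyword index used only to illustrate flush-candidate selection."""

    def __init__(self):
        # keyword -> list of (score, microblog_id), kept sorted descending by score
        self.postings = defaultdict(list)

    def insert(self, microblog_id, keywords, score):
        for kw in keywords:
            lst = self.postings[kw]
            lst.append((score, microblog_id))
            lst.sort(reverse=True)  # good enough for a sketch; a real index would avoid re-sorting

    def flush_candidates(self):
        """Return ids that rank beyond position K_MAX in every posting list they
        appear in; such items cannot contribute to any top-K_MAX query, so
        flushing them first should not hurt the memory hit ratio."""
        all_ids, needed = set(), set()
        for lst in self.postings.values():
            for rank, (_, mid) in enumerate(lst):
                all_ids.add(mid)
                if rank < K_MAX:
                    needed.add(mid)
        return all_ids - needed
```

Under these assumptions, when the memory budget fills one would evict the returned ids to disk first, and fall back to flushing data that is less likely to degrade the hit ratio (e.g., the lowest-ranked remaining items) once no non-contributing data is left, mirroring the two phases the abstract describes.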