HPCache: Memory-Efficient OLAP Through Proportional Caching

Proceedings of the 18th International Workshop on Data Management on New Hardware. Pub Date: 2022-06-12. DOI: 10.1145/3533737.3535100

Hamish Nicholson, Periklis Chrysogelos, A. Ailamaki


Citations: 1

Abstract


Analytical engines rely on in-memory caching to avoid disk accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- and time-based caching decisions, however, are a proxy for the expected query execution speedup only when disk accesses are significantly slower than in-memory query processing. Fast storage, on the other hand, offers loading times that approach or even outperform fully in-memory query execution response times, rendering purely frequency-based statistics incapable of capturing the impact of a caching decision on query execution. For example, caching the input of a frequent query that spends most of its time processing joins is less beneficial than caching a page for a slightly less frequent but scan-heavy query. As a result, existing caching policies waste valuable memory space on caching input data that offers little to no acceleration for analytics. This paper proposes HPCache, a buffer management policy that enables fast analytics on high-bandwidth storage by efficiently using the available in-memory space. HPCache caches data based on its speedup potential instead of relying on frequency-based statistics. We show that, with fast storage, the benefit of in-memory caching varies significantly across queries; therefore, we quantify the efficiency of caching decisions and formulate an optimization problem. We implement HPCache in Proteus and show that i) estimating speedup potential improves memory space utilization, and ii) simple runtime statistics suffice to infer speedup expectations. We show that HPCache achieves up to 12% faster query execution than state-of-the-art caching policies, or a 75% smaller in-memory cache footprint without deteriorating query performance. Overall, HPCache enables efficient use of the in-memory space for input caching in the presence of fast storage, without any requirement for workload predictions.
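The contrast the abstract draws between frequency-based and speedup-based caching can be illustrated with a small sketch. This is not the HPCache policy or its optimization formulation, which the abstract does not spell out. It is a toy greedy selection under assumed inputs: the `Candidate` fields, the `saved_s_per_access` estimate, and all numbers are hypothetical. It only shows how ranking by expected time saved per unit of memory can pick a different input than ranking by access count.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """A cacheable input. All fields are illustrative assumptions."""
    name: str
    size_gb: float             # memory needed to keep this input cached
    accesses: int              # how often queries read this input
    saved_s_per_access: float  # assumed execution time saved per access if cached


def _greedy(cands: List[Candidate], budget_gb: float,
            score: Callable[[Candidate], float]) -> List[Candidate]:
    """Take candidates in descending score order while they fit the budget."""
    chosen, used = [], 0.0
    for c in sorted(cands, key=score, reverse=True):
        if used + c.size_gb <= budget_gb:
            chosen.append(c)
            used += c.size_gb
    return chosen


def pick_by_frequency(cands: List[Candidate], budget_gb: float) -> List[Candidate]:
    """Frequency-only ranking: the most accessed inputs win."""
    return _greedy(cands, budget_gb, lambda c: c.accesses)


def pick_by_speedup(cands: List[Candidate], budget_gb: float) -> List[Candidate]:
    """Rank by total expected time saved per GB of cache space."""
    return _greedy(cands, budget_gb,
                   lambda c: c.accesses * c.saved_s_per_access / c.size_gb)


def total_saved_s(chosen: List[Candidate]) -> float:
    """Total expected execution time saved by the chosen cache contents."""
    return sum(c.accesses * c.saved_s_per_access for c in chosen)


# Hypothetical workload: a frequent join-heavy query whose input load time is
# mostly hidden behind join processing, and a slightly less frequent scan-heavy
# query whose runtime is dominated by reading its input.
candidates = [
    Candidate("join_input", size_gb=10.0, accesses=100, saved_s_per_access=0.125),
    Candidate("scan_input", size_gb=10.0, accesses=80, saved_s_per_access=1.0),
]
budget_gb = 10.0  # room for only one of the two inputs

by_freq = pick_by_frequency(candidates, budget_gb)
by_speedup = pick_by_speedup(candidates, budget_gb)
print([c.name for c in by_freq], total_saved_s(by_freq))
print([c.name for c in by_speedup], total_saved_s(by_speedup))
```

With these made-up numbers, the frequency ranking caches the join-heavy query's input, while the speedup ranking caches the scan-heavy query's input and saves more total execution time for the same memory budget.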
