Adapting the Secretary Hiring Problem for Optimal Hot-Cold Tier Placement Under Top-K Workloads

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Pub Date: 2019-01-22
DOI: 10.1109/CCGRID.2019.00074

Ben Blamey, Fredrik Wrede, Johan Karlsson, A. Hellander, S. Toor


Citations: 5

Abstract

Top-K queries are an established heuristic in information retrieval. This paper presents an approach for optimal tiered storage allocation under stream processing workloads that use this heuristic: those requiring the analysis of only the top-K most relevant documents from a fixed-length stream, stream window, or batch job. Documents are ranked for relevance by a user-specified interestingness function, and the top K are stored for further processing. This scenario resembles the classic Secretary Hiring Problem (SHP), and the expected rate of document writes and the document lifetime can be modelled as functions of document index. We present parameter-based algorithms for storage tier placement that minimize document storage and transport costs. We derive expressions for the optimal parameter values in terms of tier storage and transport costs a priori, without needing to monitor the application. This contrasts with (often complex) existing work on tiered storage optimization, which is either tightly coupled to specific use cases or requires active monitoring of application IO load, making it ill-suited to the long-running or one-off operations common in the scientific computing domain. We motivate and evaluate our model with a trace-driven simulation of human-in-the-loop biochemical model exploration, and two cloud storage case studies.
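
The claim that the write rate can be modelled as a function of document index can be illustrated with a small sketch. It is not taken from the paper. It assumes documents arrive in uniformly random order with distinct scores, in which case the i-th document (1-based) enters the current top-K with probability min(1, K/i). The function names below are hypothetical, and the paper's own model and tier-placement parameters are not reproduced here.

```python
import heapq
import random


def write_probability(i, k):
    """Probability that the i-th document (1-based) enters the current top-K.

    Assumes a uniformly random arrival order with distinct scores
    (an assumption of this sketch, not a statement of the paper's model).
    """
    return min(1.0, k / i)


def expected_total_writes(n, k):
    """Expected number of top-K writes over a stream of n documents."""
    return sum(write_probability(i, k) for i in range(1, n + 1))


def simulate_writes(n, k, rng):
    """Count the top-K writes in one simulated stream of n random scores."""
    heap = []  # min-heap holding the scores of the current top-K
    writes = 0
    for _ in range(n):
        score = rng.random()
        if len(heap) < k:
            # The store is not yet full, so every document is written.
            heapq.heappush(heap, score)
            writes += 1
        elif score > heap[0]:
            # The new document displaces the weakest member of the top-K.
            heapq.heapreplace(heap, score)
            writes += 1
    return writes


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, trials = 1000, 10, 500
    mean = sum(simulate_writes(n, k, rng) for _ in range(trials)) / trials
    print("analytic expectation:", expected_total_writes(n, k))
    print("simulated mean:", mean)
```

Under this assumption the write probability falls off as K/i, so writes concentrate early in the stream and late arrivals rarely displace a stored document. This gives an intuition for why a placement decision might depend on document index.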

