Adapting the Secretary Hiring Problem for Optimal Hot-Cold Tier Placement Under Top-K Workloads

Ben Blamey, Fredrik Wrede, Johan Karlsson, A. Hellander, S. Toor
{"title":"Adapting the Secretary Hiring Problem for Optimal Hot-Cold Tier Placement Under Top-K Workloads","authors":"Ben Blamey, Fredrik Wrede, Johan Karlsson, A. Hellander, S. Toor","doi":"10.1109/CCGRID.2019.00074","DOIUrl":null,"url":null,"abstract":"Top-K queries are an established heuristic in information retrieval. This paper presents an approach for optimal tiered storage allocation under stream processing workloads using this heuristic: those requiring the analysis of only the top-K ranked most relevant documents from a fixed-length stream, stream window, or batch job. Documents are ranked for relevance on a user-specified interestingness function, the top-K stored for further processing. This scenario bears similarity to the classic Secretary Hiring Problem (SHP), and the expected rate of document writes and document lifetime can be modelled as a function of document index. We present parameter-based algorithms for storage tier placement, minimizing document storage and transport costs. We derive expressions for optimal parameter values in terms of tier storage and transport costs a priori, without needing to monitor the application. This contrasts with (often complex) existing work on tiered storage optimization, which is either tightly coupled to specific use cases, or requires active monitoring of application IO load – ill-suited to long-running or one-off operations common in the scientific computing domain. We motivate and evaluate our model with a trace-driven simulation of human-in-the-loop bio-chemical model exploration, and two cloud storage case studies.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2019.00074","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Top-K queries are an established heuristic in information retrieval. This paper presents an approach for optimal tiered storage allocation under stream processing workloads using this heuristic: those requiring the analysis of only the top-K ranked most relevant documents from a fixed-length stream, stream window, or batch job. Documents are ranked for relevance on a user-specified interestingness function, the top-K stored for further processing. This scenario bears similarity to the classic Secretary Hiring Problem (SHP), and the expected rate of document writes and document lifetime can be modelled as a function of document index. We present parameter-based algorithms for storage tier placement, minimizing document storage and transport costs. We derive expressions for optimal parameter values in terms of tier storage and transport costs a priori, without needing to monitor the application. This contrasts with (often complex) existing work on tiered storage optimization, which is either tightly coupled to specific use cases, or requires active monitoring of application IO load – ill-suited to long-running or one-off operations common in the scientific computing domain. We motivate and evaluate our model with a trace-driven simulation of human-in-the-loop bio-chemical model exploration, and two cloud storage case studies.
Top-K工作量下秘书招聘问题的优化冷热层配置
Top-K查询是信息检索中常用的启发式查询方法。本文提出了一种在流处理工作负载下使用这种启发式方法进行最佳分层存储分配的方法:那些只需要分析固定长度流、流窗口或批处理作业中排名前k位的最相关文档的方法。根据用户指定的兴趣函数对文档进行相关性排序,存储前k以供进一步处理。此场景与经典的秘书招聘问题(SHP)相似,并且可以将文档写入的预期速率和文档生命周期建模为文档索引的函数。我们提出了基于参数的存储层放置算法,最小化文档存储和传输成本。我们根据层存储和传输成本先验地推导出最优参数值的表达式,而无需监控应用程序。这与现有的分层存储优化工作(通常很复杂)形成了对比,后者要么与特定用例紧密耦合,要么需要主动监控应用程序IO负载——不适合科学计算领域中常见的长时间运行或一次性操作。我们通过跟踪驱动的人类在环生物化学模型探索模拟和两个云存储案例研究来激励和评估我们的模型。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信