Ben Blamey, Fredrik Wrede, Johan Karlsson, A. Hellander, S. Toor
{"title":"Top-K工作量下秘书招聘问题的优化冷热层配置","authors":"Ben Blamey, Fredrik Wrede, Johan Karlsson, A. Hellander, S. Toor","doi":"10.1109/CCGRID.2019.00074","DOIUrl":null,"url":null,"abstract":"Top-K queries are an established heuristic in information retrieval. This paper presents an approach for optimal tiered storage allocation under stream processing workloads using this heuristic: those requiring the analysis of only the top-K ranked most relevant documents from a fixed-length stream, stream window, or batch job. Documents are ranked for relevance on a user-specified interestingness function, the top-K stored for further processing. This scenario bears similarity to the classic Secretary Hiring Problem (SHP), and the expected rate of document writes and document lifetime can be modelled as a function of document index. We present parameter-based algorithms for storage tier placement, minimizing document storage and transport costs. We derive expressions for optimal parameter values in terms of tier storage and transport costs a priori, without needing to monitor the application. This contrasts with (often complex) existing work on tiered storage optimization, which is either tightly coupled to specific use cases, or requires active monitoring of application IO load – ill-suited to long-running or one-off operations common in the scientific computing domain. We motivate and evaluate our model with a trace-driven simulation of human-in-the-loop bio-chemical model exploration, and two cloud storage case studies.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Adapting the Secretary Hiring Problem for Optimal Hot-Cold Tier Placement Under Top-K Workloads\",\"authors\":\"Ben Blamey, Fredrik Wrede, Johan Karlsson, A. Hellander, S. Toor\",\"doi\":\"10.1109/CCGRID.2019.00074\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Top-K queries are an established heuristic in information retrieval. This paper presents an approach for optimal tiered storage allocation under stream processing workloads using this heuristic: those requiring the analysis of only the top-K ranked most relevant documents from a fixed-length stream, stream window, or batch job. Documents are ranked for relevance on a user-specified interestingness function, the top-K stored for further processing. This scenario bears similarity to the classic Secretary Hiring Problem (SHP), and the expected rate of document writes and document lifetime can be modelled as a function of document index. We present parameter-based algorithms for storage tier placement, minimizing document storage and transport costs. We derive expressions for optimal parameter values in terms of tier storage and transport costs a priori, without needing to monitor the application. This contrasts with (often complex) existing work on tiered storage optimization, which is either tightly coupled to specific use cases, or requires active monitoring of application IO load – ill-suited to long-running or one-off operations common in the scientific computing domain. We motivate and evaluate our model with a trace-driven simulation of human-in-the-loop bio-chemical model exploration, and two cloud storage case studies.\",\"PeriodicalId\":234571,\"journal\":{\"name\":\"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-01-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2019.00074\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2019.00074","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Adapting the Secretary Hiring Problem for Optimal Hot-Cold Tier Placement Under Top-K Workloads
Top-K queries are an established heuristic in information retrieval. This paper presents an approach for optimal tiered storage allocation under stream processing workloads using this heuristic: those requiring the analysis of only the top-K ranked most relevant documents from a fixed-length stream, stream window, or batch job. Documents are ranked for relevance on a user-specified interestingness function, the top-K stored for further processing. This scenario bears similarity to the classic Secretary Hiring Problem (SHP), and the expected rate of document writes and document lifetime can be modelled as a function of document index. We present parameter-based algorithms for storage tier placement, minimizing document storage and transport costs. We derive expressions for optimal parameter values in terms of tier storage and transport costs a priori, without needing to monitor the application. This contrasts with (often complex) existing work on tiered storage optimization, which is either tightly coupled to specific use cases, or requires active monitoring of application IO load – ill-suited to long-running or one-off operations common in the scientific computing domain. We motivate and evaluate our model with a trace-driven simulation of human-in-the-loop bio-chemical model exploration, and two cloud storage case studies.