Sayer: Using Implicit Feedback to Optimize System Policies

Mathias Lécuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, S. Sen, Aleksandrs Slivkins, Amit Sharma
{"title":"Sayer: Using Implicit Feedback to Optimize System Policies","authors":"Mathias Lécuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, S. Sen, Aleksandrs Slivkins, Amit Sharma","doi":"10.1145/3472883.3487001","DOIUrl":null,"url":null,"abstract":"We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, or implicit feedback. For example, if a system waits X min for an event to occur, then it automatically learns what would have happened if it waited < X min, because time has a cumulative property. This feedback tells us about alternative decisions, and can be used to improve the system policy. However, leveraging implicit feedback is difficult because it tends to be one-sided or incomplete, and may depend on the outcome of the event. As a result, existing practices for using feedback, such as simply incorporating it into a data-driven model, suffer from bias. We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies. Sayer builds on two ideas from reinforcement learning---randomized exploration and unbiased counterfactual estimators---to leverage data collected by an existing policy to estimate the performance of new candidate policies, without actually deploying those policies. Sayer uses implicit exploration and implicit data augmentation to generate implicit feedback in an unbiased form, which is then used by an implicit counterfactual estimator to evaluate and train new policies. The key idea underlying these techniques is to assign implicit probabilities to decisions that are not actually taken but whose feedback can be inferred; these probabilities are carefully calculated to ensure statistical unbiasedness. We apply Sayer to two production scenarios in Azure, and show that it can evaluate arbitrary policies accurately, and train new policies that outperform the production policies.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"124 3","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3472883.3487001","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, or implicit, feedback. For example, if a system waits X minutes for an event to occur, then it automatically learns what would have happened had it waited less than X minutes, because time has a cumulative property. This feedback tells us about alternative decisions and can be used to improve the system policy. However, leveraging implicit feedback is difficult because it tends to be one-sided or incomplete, and may depend on the outcome of the event. As a result, existing practices for using feedback, such as simply incorporating it into a data-driven model, suffer from bias. We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies. Sayer builds on two ideas from reinforcement learning, randomized exploration and unbiased counterfactual estimators, to leverage data collected by an existing policy to estimate the performance of new candidate policies, without actually deploying them. Sayer uses implicit exploration and implicit data augmentation to generate implicit feedback in an unbiased form, which is then used by an implicit counterfactual estimator to evaluate and train new policies. The key idea underlying these techniques is to assign implicit probabilities to decisions that are not actually taken but whose feedback can be inferred; these probabilities are carefully calculated to ensure statistical unbiasedness. We apply Sayer to two production scenarios in Azure, and show that it can evaluate arbitrary policies accurately and train new policies that outperform the production policies.
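To make the estimator idea concrete, here is a minimal Python sketch of the implicit-probability technique described above, assuming a discretized set of wait-time thresholds and a logging policy that randomizes over them. The names and data layout (implicit_propensity, evaluate_policy, the per-entry feedback dictionary) are hypothetical illustrations, not Sayer's actual interface.

```python
def implicit_propensity(p, a):
    """Probability that the logged decision reveals feedback for threshold a.

    For a cumulative resource such as wait time, choosing threshold x
    reveals the outcome for every smaller threshold a <= x, so the
    implicit propensity of a is P(logged threshold >= a), which is at
    least as large as the explicit probability p[a] of choosing a.
    """
    return sum(prob for x, prob in p.items() if x >= a)

def evaluate_policy(logs, p, target):
    """Inverse-propensity-scoring estimate of a candidate policy's reward.

    logs:   list of (context, logged_threshold, feedback), where feedback
            maps each revealed threshold a <= logged_threshold to its reward.
    p:      the logging policy's randomization distribution over thresholds.
    target: deterministic candidate policy, context -> threshold.

    Reweighting each revealed reward by its implicit propensity keeps the
    estimate unbiased while using far more data than explicit feedback alone.
    """
    total = 0.0
    for context, logged, feedback in logs:
        a = target(context)
        if a in feedback:  # reward revealed implicitly whenever a <= logged
            total += feedback[a] / implicit_propensity(p, a)
    return total / len(logs)

if __name__ == "__main__":
    # Toy demo: the logging policy randomizes over wait thresholds (minutes).
    p = {1: 0.25, 2: 0.25, 4: 0.25, 8: 0.25}
    # Waiting 8 minutes reveals rewards for thresholds 1, 2, 4, and 8.
    logs = [
        (None, 4, {1: 0.0, 2: 0.0, 4: 1.0}),
        (None, 8, {1: 0.0, 2: 1.0, 4: 1.0, 8: 1.0}),
    ]
    print(evaluate_policy(logs, p, lambda ctx: 2))  # estimate for "always wait 2"
```

Because choosing threshold x reveals the outcome for every threshold a <= x, the chance that a's feedback is observed is the tail probability P(logged threshold >= a) rather than just p(a); dividing by this larger implicit propensity is what keeps the estimate unbiased while also lowering its variance relative to standard inverse-propensity scoring on explicit feedback.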