3 Years, 2 Papers, 1 Course Off: Optimal Non-Monetary Reward Policies

Wei Chen, Shivam Gupta, Milind Dawande, G. Janakiraman
Journal: Microeconomics: Asymmetric & Private Information eJournal
Published: 2020-11-24
DOI: 10.2139/ssrn.3647569 (https://doi.org/10.2139/ssrn.3647569)

Abstract

We consider a principal who periodically offers a fixed, binary, and costly non-monetary reward to agents endowed with private information, to incentivize the agents to invest effort over the long run. An agent's output, as a function of his effort, is a priori uncertain and is worth a fixed per-unit value to the principal. The principal's goal is to design an attractive reward policy that specifies how the rewards are to be given to an agent over time, based on that agent's past performance. This problem, which we denote by P, is motivated by practical examples from both academia (a reduced teaching load for achieving a certain research-productivity threshold) and industry ("Supplier of the Year" awards in recognition of excellent past performance). The following "limited-term" reward policy structure has been quite popular in practice: the principal evaluates each agent periodically; if an agent's performance over a certain (limited) number of periods in the immediate past exceeds a pre-defined threshold, then the principal rewards him for a certain (limited) number of periods in the immediate future. For the deterministic special case of problem P, where there is no uncertainty in any agent's output given his effort, we show that there always exists an optimal policy that is a limited-term policy and also obtain such a policy. When agents' outputs are stochastic, we show that the class of limited-term policies may not contain any optimal policy of problem P but is guaranteed to contain policies that are arbitrarily near-optimal: given any ε > 0, we show how to obtain a limited-term policy whose performance is within ε of that of an optimal policy. This guarantee depends crucially on the use of sufficiently long histories of the agents' outputs for the determination of the rewards.
In situations where access to this historical information is limited, we derive structural insights on the role played by (i) the length of the available history and (ii) the variability in the random variable governing an agent's output, on the performance of this class of policies. Finally, we introduce and analyze the class of "score-based" reward policies: we show that this class is guaranteed to contain an optimal policy and also obtain such a policy.
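
The limited-term structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' model or algorithm: the function name `limited_term_rewards`, its parameters, the choice to evaluate at the end of every period, the use of summed output as "performance", and the non-strict comparison against the threshold are all assumptions made here for illustration.

```python
from typing import List


def limited_term_rewards(
    outputs: List[float],
    lookback: int,        # hypothetical: number of past periods evaluated
    threshold: float,     # hypothetical: required total output in the window
    reward_length: int,   # hypothetical: number of future periods rewarded
) -> List[bool]:
    """Return, for each period, whether the agent holds the reward.

    Assumption: at the end of every period t that has at least `lookback`
    periods of history, the principal sums the outputs of the last
    `lookback` periods. If the total meets the threshold, the agent is
    rewarded in periods t+1 .. t+reward_length (truncated at the horizon).
    """
    n = len(outputs)
    rewarded = [False] * n
    for t in range(lookback - 1, n):
        window_total = sum(outputs[t - lookback + 1 : t + 1])
        if window_total >= threshold:
            for s in range(t + 1, min(n, t + 1 + reward_length)):
                rewarded[s] = True
    return rewarded


# In the spirit of the title: look back 3 years, require 2 papers,
# grant the reward (one course off) for the following year.
papers_per_year = [1, 0, 1, 0, 0, 0, 2]
print(limited_term_rewards(papers_per_year, lookback=3, threshold=2, reward_length=1))
# → [False, False, False, True, False, False, False]
```

In this example the agent reaches two papers over years 1 to 3 and is rewarded in year 4. The two papers in the final year also meet the threshold, but the reward would fall beyond the simulated horizon.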