{"title":"用局部处理对马尔可夫决策过程进行实验","authors":"Shuze Chen, David Simchi-Levi, Chonghuan Wang","doi":"arxiv-2407.19618","DOIUrl":null,"url":null,"abstract":"As service systems grow increasingly complex and dynamic, many interventions\nbecome localized, available and taking effect only in specific states. This\npaper investigates experiments with local treatments on a widely-used class of\ndynamic models, Markov Decision Processes (MDPs). Particularly, we focus on\nutilizing the local structure to improve the inference efficiency of the\naverage treatment effect. We begin by demonstrating the efficiency of classical\ninference methods, including model-based estimation and temporal difference\nlearning under a fixed policy, as well as classical A/B testing with general\ntreatments. We then introduce a variance reduction technique that exploits the\nlocal treatment structure by sharing information for states unaffected by the\ntreatment policy. Our new estimator effectively overcomes the variance lower\nbound for general treatments while matching the more stringent lower bound\nincorporating the local treatment structure. Furthermore, our estimator can\noptimally achieve a linear reduction with the number of test arms for a major\npart of the variance. Finally, we explore scenarios with perfect knowledge of\nthe control arm and design estimators that further improve inference\nefficiency.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"73 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Experimenting on Markov Decision Processes with Local Treatments\",\"authors\":\"Shuze Chen, David Simchi-Levi, Chonghuan Wang\",\"doi\":\"arxiv-2407.19618\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As service systems grow increasingly complex and dynamic, many interventions\\nbecome localized, available and taking effect only in specific states. This\\npaper investigates experiments with local treatments on a widely-used class of\\ndynamic models, Markov Decision Processes (MDPs). Particularly, we focus on\\nutilizing the local structure to improve the inference efficiency of the\\naverage treatment effect. We begin by demonstrating the efficiency of classical\\ninference methods, including model-based estimation and temporal difference\\nlearning under a fixed policy, as well as classical A/B testing with general\\ntreatments. We then introduce a variance reduction technique that exploits the\\nlocal treatment structure by sharing information for states unaffected by the\\ntreatment policy. Our new estimator effectively overcomes the variance lower\\nbound for general treatments while matching the more stringent lower bound\\nincorporating the local treatment structure. Furthermore, our estimator can\\noptimally achieve a linear reduction with the number of test arms for a major\\npart of the variance. 
Finally, we explore scenarios with perfect knowledge of\\nthe control arm and design estimators that further improve inference\\nefficiency.\",\"PeriodicalId\":501293,\"journal\":{\"name\":\"arXiv - ECON - Econometrics\",\"volume\":\"73 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - ECON - Econometrics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2407.19618\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - ECON - Econometrics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.19618","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Experimenting on Markov Decision Processes with Local Treatments
As service systems grow increasingly complex and dynamic, many interventions
become localized: they are available, and take effect, only in specific states. This
paper investigates experiments with local treatments on a widely-used class of
dynamic models, Markov Decision Processes (MDPs). In particular, we focus on
exploiting the local structure to improve the efficiency of inference for the
average treatment effect. We begin by demonstrating the efficiency of classical
inference methods, including model-based estimation and temporal difference
learning under a fixed policy, as well as classical A/B testing with general
treatments. We then introduce a variance reduction technique that exploits the
local treatment structure by sharing information across arms for states unaffected by the
treatment policy. Our new estimator effectively overcomes the variance lower
bound for general treatments while matching the more stringent lower bound
incorporating the local treatment structure. Furthermore, for a major part of
the variance, our estimator optimally achieves a reduction that is linear in
the number of test arms. Finally, we explore scenarios with perfect knowledge of
the control arm and design estimators that further improve inference
efficiency.
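
To make the information-sharing idea concrete, below is a minimal plug-in sketch in Python. It assumes a tabular, ergodic MDP, two arms (control and treatment) each run under a fixed policy, and a known set `local_states` where the treatment can alter dynamics or rewards. The function names and the simple least-squares routine for the stationary distribution are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def stationary_average_reward(P, r):
    """Long-run average reward of an ergodic Markov chain with transition
    matrix P (rows sum to 1) and per-state mean rewards r."""
    n = P.shape[0]
    # Solve pi @ P = pi together with sum(pi) = 1 in a least-squares sense.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ r)

def local_sharing_ate(trans_counts, reward_sums, visit_counts, local_states):
    """Plug-in estimate of the average treatment effect that pools data
    across arms for every state outside `local_states`, i.e. states the
    treatment cannot affect.

    trans_counts[a] : (n, n) array, observed transitions in arm a
    reward_sums[a]  : (n,) array, summed observed rewards in arm a
    visit_counts[a] : (n,) array, observed visits in arm a
    with a = 0 for control and a = 1 for treatment.
    """
    n = trans_counts[0].shape[0]
    avg = []
    for a in (0, 1):
        P = np.zeros((n, n))
        r = np.zeros(n)
        for s in range(n):
            if s in local_states:
                # Treatment may change this state: use arm-specific data only.
                row, rs, v = trans_counts[a][s], reward_sums[a][s], visit_counts[a][s]
            else:
                # Unaffected state: both arms face identical dynamics here,
                # so pooling their samples cuts the estimation variance.
                row = trans_counts[0][s] + trans_counts[1][s]
                rs = reward_sums[0][s] + reward_sums[1][s]
                v = visit_counts[0][s] + visit_counts[1][s]
            P[s] = row / max(row.sum(), 1)
            r[s] = rs / max(v, 1)
        avg.append(stationary_average_reward(P, r))
    return avg[1] - avg[0]  # treatment minus control

```

The sketch also suggests the intuition behind the linear reduction claimed above: with K test arms sharing a common control, the statistics for states outside `local_states` accumulate across all arms, so that part of each arm's estimation variance shrinks roughly linearly in the number of arms.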