{"title":"用局部处理对马尔可夫决策过程进行实验","authors":"Shuze Chen, David Simchi-Levi, Chonghuan Wang","doi":"arxiv-2407.19618","DOIUrl":null,"url":null,"abstract":"As service systems grow increasingly complex and dynamic, many interventions\nbecome localized, available and taking effect only in specific states. This\npaper investigates experiments with local treatments on a widely-used class of\ndynamic models, Markov Decision Processes (MDPs). Particularly, we focus on\nutilizing the local structure to improve the inference efficiency of the\naverage treatment effect. We begin by demonstrating the efficiency of classical\ninference methods, including model-based estimation and temporal difference\nlearning under a fixed policy, as well as classical A/B testing with general\ntreatments. We then introduce a variance reduction technique that exploits the\nlocal treatment structure by sharing information for states unaffected by the\ntreatment policy. Our new estimator effectively overcomes the variance lower\nbound for general treatments while matching the more stringent lower bound\nincorporating the local treatment structure. Furthermore, our estimator can\noptimally achieve a linear reduction with the number of test arms for a major\npart of the variance. Finally, we explore scenarios with perfect knowledge of\nthe control arm and design estimators that further improve inference\nefficiency.","PeriodicalId":501293,"journal":{"name":"arXiv - ECON - Econometrics","volume":"73 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Experimenting on Markov Decision Processes with Local Treatments\",\"authors\":\"Shuze Chen, David Simchi-Levi, Chonghuan Wang\",\"doi\":\"arxiv-2407.19618\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As service systems grow increasingly complex and dynamic, many interventions\\nbecome localized, available and taking effect only in specific states. This\\npaper investigates experiments with local treatments on a widely-used class of\\ndynamic models, Markov Decision Processes (MDPs). Particularly, we focus on\\nutilizing the local structure to improve the inference efficiency of the\\naverage treatment effect. We begin by demonstrating the efficiency of classical\\ninference methods, including model-based estimation and temporal difference\\nlearning under a fixed policy, as well as classical A/B testing with general\\ntreatments. We then introduce a variance reduction technique that exploits the\\nlocal treatment structure by sharing information for states unaffected by the\\ntreatment policy. Our new estimator effectively overcomes the variance lower\\nbound for general treatments while matching the more stringent lower bound\\nincorporating the local treatment structure. Furthermore, our estimator can\\noptimally achieve a linear reduction with the number of test arms for a major\\npart of the variance. 
Finally, we explore scenarios with perfect knowledge of\\nthe control arm and design estimators that further improve inference\\nefficiency.\",\"PeriodicalId\":501293,\"journal\":{\"name\":\"arXiv - ECON - Econometrics\",\"volume\":\"73 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - ECON - Econometrics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2407.19618\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - ECON - Econometrics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.19618","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Experimenting on Markov Decision Processes with Local Treatments
As service systems grow increasingly complex and dynamic, many interventions
become localized: they are available, and take effect, only in specific states. This
paper investigates experiments with local treatments on a widely-used class of
dynamic models, Markov Decision Processes (MDPs). In particular, we focus on
exploiting the local structure to improve the efficiency of inference for the
average treatment effect. We begin by demonstrating the efficiency of classical
inference methods, including model-based estimation and temporal difference
learning under a fixed policy, as well as classical A/B testing with general
treatments. We then introduce a variance reduction technique that exploits the
local treatment structure by sharing information across arms for states unaffected by the
treatment policy. Our new estimator effectively overcomes the variance lower
bound for general treatments while matching the more stringent lower bound
incorporating the local treatment structure. Furthermore, for a major part of
the variance, our estimator optimally achieves a reduction that is linear in
the number of test arms. Finally, we explore scenarios with perfect knowledge of
the control arm and design estimators that further improve inference
efficiency.
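
To make the information-sharing idea concrete, below is a minimal plug-in sketch in Python. It assumes a tabular, ergodic MDP, two arms (control and treatment) each run under a fixed policy, and a known set `local_states` where the treatment can alter dynamics or rewards. The function names and the simple least-squares routine for the stationary distribution are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def stationary_average_reward(P, r):
    """Long-run average reward of an ergodic Markov chain with transition
    matrix P (rows sum to 1) and per-state mean rewards r."""
    n = P.shape[0]
    # Solve pi @ P = pi together with sum(pi) = 1 in a least-squares sense.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ r)

def local_sharing_ate(trans_counts, reward_sums, visit_counts, local_states):
    """Plug-in estimate of the average treatment effect that pools data
    across arms for every state outside `local_states`, i.e. states the
    treatment cannot affect.

    trans_counts[a] : (n, n) array, observed transitions in arm a
    reward_sums[a]  : (n,) array, summed observed rewards in arm a
    visit_counts[a] : (n,) array, observed visits in arm a
    with a = 0 for control and a = 1 for treatment.
    """
    n = trans_counts[0].shape[0]
    avg = []
    for a in (0, 1):
        P = np.zeros((n, n))
        r = np.zeros(n)
        for s in range(n):
            if s in local_states:
                # Treatment may change this state: use arm-specific data only.
                row, rs, v = trans_counts[a][s], reward_sums[a][s], visit_counts[a][s]
            else:
                # Unaffected state: both arms face identical dynamics here,
                # so pooling their samples cuts the estimation variance.
                row = trans_counts[0][s] + trans_counts[1][s]
                rs = reward_sums[0][s] + reward_sums[1][s]
                v = visit_counts[0][s] + visit_counts[1][s]
            P[s] = row / max(row.sum(), 1)
            r[s] = rs / max(v, 1)
        avg.append(stationary_average_reward(P, r))
    return avg[1] - avg[0]  # treatment minus control

```

The sketch also suggests the intuition behind the linear reduction claimed above: with K test arms sharing a common control, the statistics for states outside `local_states` accumulate across all arms, so that part of each arm's estimation variance shrinks roughly linearly in the number of arms.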