{"title":"基于基权向量的动态学习与决策","authors":"Hao Zhang","doi":"10.1287/opre.2021.2240","DOIUrl":null,"url":null,"abstract":"A New Method for Dynamic Learning and Doing For a large class of learning-and-doing problems, two processes are intertwined in the analysis: a forward process that updates the decision maker’s belief or estimate of the unknown parameter, and a backward process that computes the expected future values. The mainstream literature focuses on the former process. In contrast, in “Dynamic Learning and Decision Making via Basis Weight Vectors,” Hao Zhang proposes a new method based on pure backward induction on the continuation values created by feasible continuation policies. When the unknown parameter is a continuous variable, the method represents each continuation-value function by a vector of weights placed on a set of basis functions. The weight vectors that are potentially useful for the optimal solution can be found backward in time exactly (for very small problems) or approximately (for larger problems). A simulation study demonstrates that an approximation algorithm based on this method outperforms some popular algorithms in the linear contextual bandit literature when the learning horizon is short.","PeriodicalId":19546,"journal":{"name":"Oper. Res.","volume":"119 1","pages":"1835-1853"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Dynamic Learning and Decision Making via Basis Weight Vectors\",\"authors\":\"Hao Zhang\",\"doi\":\"10.1287/opre.2021.2240\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A New Method for Dynamic Learning and Doing For a large class of learning-and-doing problems, two processes are intertwined in the analysis: a forward process that updates the decision maker’s belief or estimate of the unknown parameter, and a backward process that computes the expected future values. The mainstream literature focuses on the former process. In contrast, in “Dynamic Learning and Decision Making via Basis Weight Vectors,” Hao Zhang proposes a new method based on pure backward induction on the continuation values created by feasible continuation policies. When the unknown parameter is a continuous variable, the method represents each continuation-value function by a vector of weights placed on a set of basis functions. The weight vectors that are potentially useful for the optimal solution can be found backward in time exactly (for very small problems) or approximately (for larger problems). A simulation study demonstrates that an approximation algorithm based on this method outperforms some popular algorithms in the linear contextual bandit literature when the learning horizon is short.\",\"PeriodicalId\":19546,\"journal\":{\"name\":\"Oper. Res.\",\"volume\":\"119 1\",\"pages\":\"1835-1853\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Oper. 
Res.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1287/opre.2021.2240\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Oper. Res.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1287/opre.2021.2240","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dynamic Learning and Decision Making via Basis Weight Vectors
A New Method for Dynamic Learning and Doing

For a large class of learning-and-doing problems, two processes are intertwined in the analysis: a forward process that updates the decision maker's belief or estimate of the unknown parameter, and a backward process that computes the expected future values. The mainstream literature focuses on the former process. In contrast, in "Dynamic Learning and Decision Making via Basis Weight Vectors," Hao Zhang proposes a new method based on pure backward induction on the continuation values created by feasible continuation policies. When the unknown parameter is a continuous variable, the method represents each continuation-value function by a vector of weights placed on a set of basis functions. The weight vectors that are potentially useful for the optimal solution can be found backward in time exactly (for very small problems) or approximately (for larger problems). A simulation study demonstrates that an approximation algorithm based on this method outperforms some popular algorithms in the linear contextual bandit literature when the learning horizon is short.
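To make the abstract's central idea concrete, the sketch below illustrates what "backward induction on continuation values represented by basis weight vectors" can look like in the simplest possible setting. It is not the paper's algorithm: the basis (low-order polynomials on [0, 1]), the two hypothetical actions, the assumption that the unknown parameter does not change, and the grid-based pruning rule are all choices made purely for illustration, and the belief-updating and observation structure of a real learning-and-doing problem is omitted.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): each continuation-value
# function over an unknown parameter theta in [0, 1] is stored as a weight
# vector on a polynomial basis, and backward induction builds the set of
# weight vectors that are "potentially useful" (maximal at some theta).

K = 4                               # basis size: 1, theta, theta^2, theta^3
GRID = np.linspace(0.0, 1.0, 201)   # theta grid used only for pruning

def basis(theta):
    """Polynomial basis evaluated at theta; returns shape (K, len(theta))."""
    theta = np.atleast_1d(theta)
    return np.vstack([theta**k for k in range(K)])

# Immediate rewards of two hypothetical actions, written as weight vectors
# r_a so that the reward at theta is r_a @ basis(theta).
REWARD_VECTORS = {
    "a1": np.array([0.2, 1.0, 0.0, 0.0]),   # better for large theta
    "a2": np.array([0.8, -0.5, 0.0, 0.0]),  # better for small theta
}

def prune(weight_vectors):
    """Keep only weight vectors that are maximal at some theta on the grid."""
    values = np.array([w @ basis(GRID) for w in weight_vectors])
    keep = set(values.argmax(axis=0))
    return [weight_vectors[i] for i in sorted(keep)]

def backward_induction(horizon):
    """Build the sets of candidate continuation-value weight vectors, last stage first."""
    V = [np.zeros(K)]                # terminal continuation value is identically 0
    stages = [V]
    for _ in range(horizon):
        # A candidate at stage t pairs an action's reward vector with a
        # continuation-value vector already found for stage t + 1.
        candidates = [r + w for r in REWARD_VECTORS.values() for w in V]
        V = prune(candidates)
        stages.append(V)
    return stages[::-1]              # earliest decision stage first

if __name__ == "__main__":
    for t, V in enumerate(backward_induction(horizon=3), start=1):
        print(f"stage {t}: {len(V)} undominated weight vector(s)")
```

The pruning step is what keeps the procedure tractable: only weight vectors that achieve the maximum somewhere in the parameter range are carried backward, in the same spirit as dominated-vector pruning in value iteration with piecewise-linear value functions. In an exact version this set can still grow quickly, which is why the abstract distinguishes exact computation for very small problems from approximation for larger ones.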