{"title":"平均场控制问题的连续时间 q 学习","authors":"Xiaoli Wei, Xiang Yu","doi":"10.1007/s00245-024-10205-7","DOIUrl":null,"url":null,"abstract":"<div><p>This paper studies the q-learning, recently coined as the continuous time counterpart of Q-learning by Jia and Zhou (J Mach Learn Res 24:1–61, 2023), for continuous time mean-field control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent’s control problem in Jia and Zhou (J Mach Learn Res 24:1–61, 2023), we reveal that two different q-functions naturally arise in mean-field control problems: (i) the integrated q-function (denoted by <i>q</i>) as the first-order approximation of the integrated Q-function introduced in Gu et al. (Oper Res 71(4):1040–1054, 2023), which can be learnt by a weak martingale condition using all test policies; and (ii) the essential q-function (denoted by <span>\\(q_e\\)</span>) that is employed in the policy improvement iterations. We show that two q-functions are related via an integral representation. Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised. In two examples, one in LQ control framework and one beyond LQ control framework, we can obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments.\n</p></div>","PeriodicalId":55566,"journal":{"name":"Applied Mathematics and Optimization","volume":"91 1","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Continuous Time q-Learning for Mean-Field Control Problems\",\"authors\":\"Xiaoli Wei, Xiang Yu\",\"doi\":\"10.1007/s00245-024-10205-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>This paper studies the q-learning, recently coined as the continuous time counterpart of Q-learning by Jia and Zhou (J Mach Learn Res 24:1–61, 2023), for continuous time mean-field control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent’s control problem in Jia and Zhou (J Mach Learn Res 24:1–61, 2023), we reveal that two different q-functions naturally arise in mean-field control problems: (i) the integrated q-function (denoted by <i>q</i>) as the first-order approximation of the integrated Q-function introduced in Gu et al. (Oper Res 71(4):1040–1054, 2023), which can be learnt by a weak martingale condition using all test policies; and (ii) the essential q-function (denoted by <span>\\\\(q_e\\\\)</span>) that is employed in the policy improvement iterations. We show that two q-functions are related via an integral representation. Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised. 
In two examples, one in LQ control framework and one beyond LQ control framework, we can obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments.\\n</p></div>\",\"PeriodicalId\":55566,\"journal\":{\"name\":\"Applied Mathematics and Optimization\",\"volume\":\"91 1\",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2024-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Mathematics and Optimization\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s00245-024-10205-7\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICS, APPLIED\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Mathematics and Optimization","FirstCategoryId":"100","ListUrlMain":"https://link.springer.com/article/10.1007/s00245-024-10205-7","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
Continuous Time q-Learning for Mean-Field Control Problems
This paper studies q-learning, recently coined by Jia and Zhou (J Mach Learn Res 24:1–61, 2023) as the continuous-time counterpart of Q-learning, for continuous-time mean-field control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (J Mach Learn Res 24:1–61, 2023), we reveal that two different q-functions naturally arise in mean-field control problems: (i) the integrated q-function (denoted by q), the first-order approximation of the integrated Q-function introduced in Gu et al. (Oper Res 71(4):1040–1054, 2023), which can be learnt via a weak martingale condition using all test policies; and (ii) the essential q-function (denoted by \(q_e\)), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation. Based on the weak martingale condition and our proposed searching method for test policies, we devise several model-free learning algorithms. In two examples, one within the LQ control framework and one beyond it, we obtain the exact parameterization of the optimal value function and the q-functions, and we illustrate our algorithms with simulation experiments.
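To make the abstract's two central objects concrete, the display below sketches one natural reading of the relation between them. This is a hedged reconstruction in the spirit of Jia and Zhou (2023) and Gu et al. (2023), not a quotation from the paper: the symbols \(h(a\,|\,x,\mu)\) (a test policy), \(\mu\) (the population state distribution), \(\mathcal{A}\) (the action space), and \(\gamma\) (the entropy-regularization temperature) are notational assumptions here, and the precise definitions and statements are given in the paper itself.

\[
q(t,\mu,h) \;=\; \int_{\mathbb{R}^d}\!\int_{\mathcal{A}} q_e(t,x,\mu,a)\, h(a\,|\,x,\mu)\,\mathrm{d}a\,\mu(\mathrm{d}x),
\qquad
\pi^{*}(a\,|\,x,\mu) \;\propto\; \exp\!\Big(\tfrac{1}{\gamma}\, q_e(t,x,\mu,a)\Big).
\]

The first identity expresses the integral representation mentioned in the abstract in its most natural form: the integrated q-function averages the essential q-function over the population distribution and the test policy. The Gibbs form on the right mirrors the policy improvement step of entropy-regularized q-learning in the single-agent setting, which is why \(q_e\), rather than the integrated q, is the object used in the improvement iterations.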
Journal Introduction:
The Applied Mathematics and Optimization Journal covers a broad range of mathematical methods, in particular those that bridge with optimization and have some connection with applications. Core topics include the calculus of variations, partial differential equations, stochastic control, optimization of deterministic or stochastic systems in discrete or continuous time, homogenization, control theory, mean field games, dynamic games, and optimal transport. Algorithmic, data-analytic, machine-learning, and numerical methods that support the modeling and analysis of optimization problems are encouraged. Of particular interest are papers that present a novel idea in theory or modeling together with some connection to potential applications in science and engineering.