Authors: Zhenyuan Yuan, Siyuan Xu, Minghui Zhu
arXiv:2403.13245 · arXiv - CS - Systems and Control · Journal Article · Published 2024-03-20
Federated reinforcement learning for robot motion planning with zero-shot generalization
This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., no data collection or policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning between multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy back to them. Each learner then selects between its local control policy and the one from the Cloud for the next iteration. The proposed framework builds on derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement, and the optimality gap are also provided. Monte Carlo simulations are conducted to evaluate the proposed framework.
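The iterative upload/broadcast/select protocol in the abstract can be illustrated with a minimal toy sketch. This is not the paper's algorithm: policies are abstracted as scalar estimated normalized arrival times (lower is better), and `train_locally` stands in for each learner's local reinforcement-learning update. All function names and numbers here are illustrative assumptions.

```python
import random

def train_locally(arrival_time):
    # Stand-in for a local RL update: the learner's estimated normalized
    # arrival time improves (decreases) by a small random amount.
    return arrival_time - random.uniform(0.0, 0.1)

def federated_round(local_times):
    # Cloud step: compute the global optimum among the learners' uploads
    # and broadcast it. Each learner then keeps whichever policy has the
    # smaller estimated arrival time (its local one vs. the broadcast one).
    best = min(local_times)
    return [min(t, best) for t in local_times]

random.seed(0)
times = [1.0, 0.8, 1.2]  # initial estimates for three learners (illustrative)
for _ in range(5):       # iterations of the framework
    times = [train_locally(t) for t in times]
    times = federated_round(times)

print(times)
```

Because every learner keeps the better of its local policy and the broadcast optimum, after each round no learner is worse off (Pareto improvement) and, in this scalar toy, all learners agree on the same estimate (consensus) — mirroring, in miniature, the guarantees the abstract names.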