{"title":"Continuous-Time Deterministic Policy Gradient-Based Controller for Morphing Aircraft without Exploration*","authors":"Seong-hun Kim, Hanna Lee, Youdan Kim","doi":"10.1109/MED48518.2020.9183206","journal":{"name":"2020 28th Mediterranean Conference on Control and Automation (MED)","volume":"58 1","pages":"0"},"publicationDate":"2020-09-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/MED48518.2020.9183206"}
Continuous-Time Deterministic Policy Gradient-Based Controller for Morphing Aircraft without Exploration*
A controller is optimized using limited flight data from a morphing aircraft, without a dynamic model. Because the morphing aircraft dynamics are nonlinear and nonaffine in the control input, an integral reinforcement learning scheme and a deterministic policy gradient-based learning method are combined to train the parameterized control input. The stability of the learned control input is analyzed for the case in which the corresponding action-value function has converged. Unlike in online algorithms, the parameters of the control input are hard to converge, and therefore a constrained learning strategy is implemented. The performance of the proposed method is demonstrated through numerical simulation of a longitudinal morphing aircraft system.
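To make the integral reinforcement learning idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical scalar linear plant dx/dt = a*x + b*u, a fixed linear policy u = -k*x, and a quadratic running cost q*x^2 + rho*u^2; all of these names and values are illustrative assumptions that do not appear in the paper. The estimator fits the value function V(x) = p*x^2 from recorded (x, u) samples alone, using the integral Bellman relation V(x(t)) - V(x(t+T)) = integral of the running cost over [t, t+T], so the plant parameters a and b are needed only to generate the data.

```python
import numpy as np


def simulate(a, b, k, x0, dt, n_steps):
    """Generate closed-loop data for dx/dt = a*x + b*u with u = -k*x.

    The plant parameters are used here only to produce the "flight data";
    the estimator below never sees them.
    """
    t = np.arange(n_steps + 1) * dt
    x = x0 * np.exp((a - b * k) * t)  # exact solution of the closed loop
    u = -k * x
    return x, u


def irl_value_estimate(x, u, dt, window, q, rho):
    """Least-squares fit of p in V(x) = p*x^2 from sampled data.

    Uses the integral Bellman relation over windows of length T = window*dt:
        p * (x(t)^2 - x(t+T)^2) = integral_t^{t+T} (q*x^2 + rho*u^2) dtau
    """
    cost = q * x**2 + rho * u**2
    lhs, rhs = [], []
    for i in range(0, len(x) - window, window):
        seg = cost[i:i + window + 1]
        # Trapezoidal rule for the running-cost integral over one window.
        integral = dt * (seg.sum() - 0.5 * (seg[0] + seg[-1]))
        lhs.append(x[i] ** 2 - x[i + window] ** 2)
        rhs.append(integral)
    lhs = np.array(lhs)
    rhs = np.array(rhs)
    # Scalar least squares: minimize sum over windows of (p*lhs - rhs)^2.
    return float(lhs @ rhs / (lhs @ lhs))


# Illustrative (assumed) values: open-loop unstable plant, stabilizing gain.
a, b, k, q, rho = 1.0, 1.0, 3.0, 1.0, 0.1
x, u = simulate(a, b, k, x0=1.0, dt=1e-3, n_steps=2000)
p_hat = irl_value_estimate(x, u, dt=1e-3, window=50, q=q, rho=rho)

# Analytic value for this toy problem, used only as a sanity check.
p_true = (q + rho * k**2) / (2.0 * (b * k - a))
print(p_hat, p_true)
```

In this toy case the estimate should closely match the analytic value (q + rho*k^2) / (2*(b*k - a)). The paper's setting differs in important ways: the dynamics are nonlinear and nonaffine in the control, the learned quantity is an action-value function rather than a state-value function, and the policy parameters are then updated with a deterministic policy gradient under constraints, none of which this sketch attempts.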