{"title":"协作机器人的模糊Q学习交互控制器设计","authors":"Kaichen Ying, Chen chin-yin, Longxiang Wang","doi":"10.12688/cobot.17595.1","DOIUrl":null,"url":null,"abstract":"Background: In physical human-robot interaction (pHRI), admittance control is widely used. The most critical thing in admittance control is the configuration of admittance parameters, but a constant admittance value can not meet the needs of interactive indicators smoothness especially. Variable admittance control is a method to overcome this limitation by adjusting the admittance value in real time. This paper proposes a fuzzy Q-learning (FQL) variable admittance control system, which integrates the fuzzy system (FIS) and reinforcement learning method Q-learning. Methods: FIS is used to turn a continuous input state into fuzzy set and Q-learning is used to train the premise strength of fuzzy rules to get the optimal policy of variable admittance value. To verify the performance of this method, an experiment was performed using an AUBO i5 robot. Training trajectory is point-to-point (PTP) trajectory, several interaction variables before and after training by the algorithm are compared to show the validity of algorithm. Results: Experimental results show that the reward converges to a smaller value in about 25 episodes, and the reward of the last five episodes reduces by 68%. The motion trajectory after algorithm training is closer to the ideal min-jerk trajectory and the deviation and mean value of interaction force become smaller. Conclusions: The proposed FQL method can converge in a few episodes and can improve the performance of pHRI by minimizing the jerk based cost function","PeriodicalId":29807,"journal":{"name":"Cobot","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fuzzy Q-Learning interaction controller design for collaborative robot\",\"authors\":\"Kaichen Ying, Chen chin-yin, Longxiang Wang\",\"doi\":\"10.12688/cobot.17595.1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: In physical human-robot interaction (pHRI), admittance control is widely used. The most critical thing in admittance control is the configuration of admittance parameters, but a constant admittance value can not meet the needs of interactive indicators smoothness especially. Variable admittance control is a method to overcome this limitation by adjusting the admittance value in real time. This paper proposes a fuzzy Q-learning (FQL) variable admittance control system, which integrates the fuzzy system (FIS) and reinforcement learning method Q-learning. Methods: FIS is used to turn a continuous input state into fuzzy set and Q-learning is used to train the premise strength of fuzzy rules to get the optimal policy of variable admittance value. To verify the performance of this method, an experiment was performed using an AUBO i5 robot. Training trajectory is point-to-point (PTP) trajectory, several interaction variables before and after training by the algorithm are compared to show the validity of algorithm. Results: Experimental results show that the reward converges to a smaller value in about 25 episodes, and the reward of the last five episodes reduces by 68%. The motion trajectory after algorithm training is closer to the ideal min-jerk trajectory and the deviation and mean value of interaction force become smaller. 
Conclusions: The proposed FQL method can converge in a few episodes and can improve the performance of pHRI by minimizing the jerk based cost function\",\"PeriodicalId\":29807,\"journal\":{\"name\":\"Cobot\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cobot\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.12688/cobot.17595.1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cobot","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.12688/cobot.17595.1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Fuzzy Q-Learning interaction controller design for collaborative robot
Background: In physical human-robot interaction (pHRI), admittance control is widely used. The most critical element of admittance control is the configuration of the admittance parameters, but a constant admittance value cannot satisfy the interaction requirements, especially smoothness. Variable admittance control overcomes this limitation by adjusting the admittance value in real time. This paper proposes a fuzzy Q-learning (FQL) variable admittance control system, which integrates a fuzzy inference system (FIS) with the reinforcement learning method Q-learning. Methods: The FIS converts the continuous input state into fuzzy sets, and Q-learning trains the premise strengths of the fuzzy rules to obtain the optimal variable-admittance policy. To verify the performance of this method, an experiment was performed with an AUBO i5 robot. The training trajectory is a point-to-point (PTP) trajectory, and several interaction variables are compared before and after training to demonstrate the validity of the algorithm. Results: The experimental results show that the reward converges to a smaller value in about 25 episodes, and the reward of the last five episodes is reduced by 68%. After training, the motion trajectory is closer to the ideal minimum-jerk trajectory, and both the deviation and the mean value of the interaction force become smaller. Conclusions: The proposed FQL method converges within a few episodes and improves the performance of pHRI by minimizing a jerk-based cost function.
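To make the control scheme more concrete, the following Python sketch illustrates, under stated assumptions, one way a fuzzy Q-learning loop can modulate the damping of a first-order admittance model (M·v̇ + D·v = F_h) in real time. This is not the authors' implementation: the choice of measured human force as the state variable, the fuzzy-set centers, the candidate damping values, and the learning hyperparameters are all illustrative assumptions, and the jerk-based reward used in the paper is left to the caller.

```python
"""Illustrative sketch of a fuzzy Q-learning variable admittance loop.
All state variables, membership functions, candidate damping values and
hyperparameters below are assumptions made for illustration only."""
import numpy as np

# --- Fuzzy partition of a 1-D state (assumed: measured human force, N) ---
def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return max(0.0, min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)))

FORCE_CENTERS = [-20.0, 0.0, 20.0]          # assumed fuzzy-set centers (N)

def fuzzify(force):
    """Return the normalised firing strength of each fuzzy rule."""
    w = np.array([triangular(force, c - 20, c, c + 20) for c in FORCE_CENTERS])
    s = w.sum()
    return w / s if s > 0 else np.ones_like(w) / len(w)

# --- Fuzzy Q-learning over candidate damping values ----------------------
DAMPING_ACTIONS = np.array([5.0, 15.0, 30.0])   # assumed candidate D values
q = np.zeros((len(FORCE_CENTERS), len(DAMPING_ACTIONS)))  # q per rule/action
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1               # assumed learning parameters

def select_damping(weights):
    """Each rule picks an action (epsilon-greedy); blend by firing strength."""
    idx = [np.random.randint(len(DAMPING_ACTIONS)) if np.random.rand() < EPS
           else int(np.argmax(q[i])) for i in range(len(weights))]
    damping = float(np.dot(weights, DAMPING_ACTIONS[idx]))
    return damping, idx

def update_q(weights, idx, reward, next_weights):
    """Standard FQL update: distribute the TD error over the fired rules."""
    q_next = float(np.dot(next_weights, q.max(axis=1)))   # value of next state
    q_taken = float(np.dot(weights, q[np.arange(len(idx)), idx]))
    td_error = reward + GAMMA * q_next - q_taken
    for i, a in enumerate(idx):
        q[i, a] += ALPHA * weights[i] * td_error

# --- Admittance model:  M * dv/dt + D * v = F_h  (assumed M, no stiffness) -
M, DT = 2.0, 0.01

def admittance_step(v, force, damping):
    """Integrate the admittance dynamics one control cycle forward."""
    acc = (force - damping * v) / M
    return v + acc * DT, acc

# Example of one control cycle; the reward (jerk-based cost against a
# minimum-jerk reference in the paper) must be supplied by the caller:
#   w = fuzzify(force); D, idx = select_damping(w)
#   v, acc = admittance_step(v, force, D)
#   update_q(w, idx, reward, fuzzify(next_force))
```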
About the journal:
Cobot is a rapid multidisciplinary open access publishing platform for research focused on the interdisciplinary field of collaborative robots. The aim of Cobot is to advance knowledge and share the results of the latest innovative technologies with technicians, researchers and experts engaged in collaborative robot research. The platform welcomes submissions in all areas of scientific and technical research related to collaborative robots, and all articles benefit from open peer review.
The scope of Cobot includes, but is not limited to:
● Intelligent robots
● Artificial intelligence
● Human-machine collaboration and integration
● Machine vision
● Intelligent sensing
● Smart materials
● Design, development and testing of collaborative robots
● Software for cobots
● Industrial applications of cobots
● Service applications of cobots
● Medical and health applications of cobots
● Educational applications of cobots
As well as research articles and case studies, Cobot accepts a variety of article types including method articles, study protocols, software tools, systematic reviews, data notes, brief reports, and opinion articles.