Fast peg-in-hole assembly policy for robots based on experience fusion proximal optimization

Cobot. Pub Date: 2023-01-12. DOI: 10.12688/cobot.17579.1

Yu Men, Ligang Jin, Fengming Li, Rui Song


Citations: 0


Fast peg-in-hole assembly policy for robots based on experience fusion proximal optimization

Background: Peg-in-hole assembly is an important part of robot operation, but it suffers from a low degree of automation, a large number of tasks, and low efficiency. Automating assembly remains a major challenge for robots because traditional assembly control policies require complex analysis of the contact model, which is difficult to build. Deep reinforcement learning does not require a complex contact model, but its long training time and low data utilization efficiency make training very costly.

Methods: To obtain the assembly policy accurately and improve the robot's data utilization rate in peg-in-hole assembly, we propose the Experience Fusion Proximal Policy Optimization (EFPPO) algorithm, which is based on the Proximal Policy Optimization (PPO) algorithm. EFPPO improves assembly speed by incorporating a force control policy, and improves the utilization efficiency of training data by adding a memory buffer.

Results: We build a single-axis hole assembly system based on the UR5e robotic arm and a six-dimensional force sensor in the CoppeliaSim simulation environment to effectively realize the prediction of the assembly environment. Compared with the traditional Deep Deterministic Policy Gradient (DDPG) and PPO algorithms, EFPPO reaches a peg-in-hole assembly success rate of 100%, and its data utilization rate is 125% higher than that of PPO.

Conclusions: The EFPPO algorithm has high exploration efficiency. It improves both assembly speed and training speed while achieving smooth assembly and accurate prediction of the assembly environment.
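The abstract names the two ingredients of EFPPO (a PPO base plus a memory buffer) but gives no equations or buffer design. As a rough illustration only, the Python sketch below shows the standard PPO clipped surrogate objective together with a simple fixed-size buffer that lets stored transitions be sampled again. The names `ppo_clip_objective`, `MemoryBuffer`, `capacity`, and `clip_eps` are assumptions made for this example, not the authors' implementation, and the fusion with the force control policy is not modeled because the abstract does not describe it.

```python
import math
import random
from collections import deque


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


class MemoryBuffer:
    """Hypothetical fixed-size store of transitions (oldest dropped first)."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)

    def add(self, transition):
        self._data.append(transition)

    def sample(self, batch_size, rng=random):
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self._data))
        return rng.sample(list(self._data), k)

    def __len__(self):
        return len(self._data)
```

In a sketch like this, each stored transition can contribute to more than one policy update, which is one plausible way a buffer could raise data utilization; whether EFPPO samples, weights, or corrects stored experience this way is not stated in the abstract.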


Source journal

Cobot (collaborative robots)

Self-citation rate

0.00%


About the journal: Cobot is a rapid multidisciplinary open access publishing platform for research focused on the interdisciplinary field of collaborative robots. The aim of Cobot is to enhance knowledge and share the results of the latest innovative technologies for the technicians, researchers and experts engaged in collaborative robot research. The platform will welcome submissions in all areas of scientific and technical research related to collaborative robots, and all articles will benefit from open peer review. The scope of Cobot includes, but is not limited to:

● Intelligent robots
● Artificial intelligence
● Human-machine collaboration and integration
● Machine vision
● Intelligent sensing
● Smart materials
● Design, development and testing of collaborative robots
● Software for cobots
● Industrial applications of cobots
● Service applications of cobots
● Medical and health applications of cobots
● Educational applications of cobots

As well as research articles and case studies, Cobot accepts a variety of article types including method articles, study protocols, software tools, systematic reviews, data notes, brief reports, and opinion articles.