Title: Fast peg-in-hole assembly policy for robots based on experience fusion proximal optimization
Authors: Yu Men, Ligang Jin, Fengming Li, Rui Song
Journal: Cobot
DOI: 10.12688/cobot.17579.1 (https://doi.org/10.12688/cobot.17579.1)
Publication date: 2023-01-12
Publication type: Journal Article
Citations: 0
Abstract
Background: Peg-in-hole assembly is an important part of robot operation, but it suffers from a low degree of automation, a heavy task load, and low efficiency. Completing assembly tasks automatically remains a major challenge for robots because traditional assembly control policies require a complex analysis of the contact model, and that contact model is difficult to build. Deep reinforcement learning methods do not require such complex contact models, but long training times and low data-utilization efficiency make training very costly. Methods: To obtain the assembly policy accurately and improve the robot's data-utilization rate in peg-in-hole assembly, we propose the Experience Fusion Proximal Policy Optimization (EFPPO) algorithm, which builds on the Proximal Policy Optimization (PPO) algorithm. EFPPO improves assembly speed by incorporating a force control policy and improves the utilization efficiency of training data by adding a memory buffer. Results: We built a single-axis peg-in-hole assembly system in the CoppeliaSim simulation environment, based on a UR5e robotic arm and a six-dimensional force sensor, to effectively predict the assembly environment. Compared with the traditional Deep Deterministic Policy Gradient (DDPG) and PPO algorithms, EFPPO reaches a 100% peg-in-hole assembly success rate, and its data-utilization rate is 125% higher than that of PPO. Conclusions: The EFPPO algorithm has high exploration efficiency. While improving assembly speed and training speed, it achieves smooth assembly and accurate prediction of the assembly environment.
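The abstract's key idea is that EFPPO reuses stored experience alongside fresh on-policy rollouts via a memory buffer. The paper's actual buffer design is not given here, so the following is only a minimal illustrative sketch of that general idea: a bounded transition store whose samples can be fused with a fresh rollout before a policy update. All names (ExperienceFusionBuffer, fused_batch, replay_ratio) are hypothetical.

```python
import random
from collections import deque

class ExperienceFusionBuffer:
    """Illustrative memory buffer: retains past transitions so each
    policy update can reuse old experience alongside fresh rollouts.
    (Sketch only -- not the paper's actual EFPPO buffer.)"""

    def __init__(self, capacity=10_000, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest entries evicted first
        self.rng = random.Random(seed)

    def add(self, transition):
        # transition = (state, action, reward, next_state, done)
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random draw of stored transitions for an extra update pass.
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))

def fused_batch(fresh_rollout, buffer, replay_ratio=0.5):
    """Mix fresh on-policy data with replayed transitions.
    replay_ratio controls how much stored experience is reused
    relative to the size of the fresh rollout."""
    n_replay = int(len(fresh_rollout) * replay_ratio)
    return list(fresh_rollout) + buffer.sample(n_replay)
```

In this sketch, raising `replay_ratio` increases data reuse per update, which is one plausible way a memory buffer could yield the higher data-utilization rate the abstract reports; how EFPPO actually weights or corrects off-policy samples is not described in the abstract.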
Journal introduction:
Cobot is a rapid, multidisciplinary open-access publishing platform for research in the interdisciplinary field of collaborative robots. Its aim is to advance knowledge and share the latest innovative technologies with technicians, researchers, and experts engaged in collaborative robot research. The platform welcomes submissions in all areas of scientific and technical research related to collaborative robots, and all articles benefit from open peer review.
The scope of Cobot includes, but is not limited to:
● Intelligent robots
● Artificial intelligence
● Human-machine collaboration and integration
● Machine vision
● Intelligent sensing
● Smart materials
● Design, development and testing of collaborative robots
● Software for cobots
● Industrial applications of cobots
● Service applications of cobots
● Medical and health applications of cobots
● Educational applications of cobots
In addition to research articles and case studies, Cobot accepts a variety of article types, including method articles, study protocols, software tools, systematic reviews, data notes, brief reports, and opinion articles.