An online hyper-volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers
Ali Aflakian, Alireza Rastegarpanah, Jamie Hathaway, Rustam Stolkin
Journal of Field Robotics, 41(6), pp. 1814-1828, published 2024-04-28. DOI: 10.1002/rob.22355. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/rob.22355

Abstract: This paper fuses ideas from reinforcement learning (RL), learning from demonstration (LfD), and ensemble learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) is used to constrain the action space of the agent, enabling faster RL refinement of a control policy by avoiding unnecessary exploratory actions. The domain-specific knowledge of each expert is exploited, yet the resulting policy is robust against the errors of individual experts, since it is refined by an RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to act as experts, particularly in tasks with continuous action spaces. We illustrate the method on a visual servoing (VS) task, in which a 7-DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training: a hypercube with a modified loss function, a convex hull with a modified loss function, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method using expert demonstrators, using one expert demonstrator with the DAgger algorithm, and using no demonstrators. Our experiments show that the convex hull with a modified loss function not only accelerates learning but also yields the best solution among the compared approaches. Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image-based VS, position-based VS, and hybrid-decoupled VS.
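One of the bounding strategies above, projecting an out-of-bounds action onto the convex hull of the experts' actions, amounts to a small quadratic program: find the convex combination of expert actions closest to the agent's action. A minimal sketch, not the paper's implementation; the function name and the use of SciPy's SLSQP solver are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def project_onto_hull(action, expert_actions):
    """Return the point in conv(expert_actions) closest to `action`.

    Solves  min_w ||w @ E - a||^2  s.t.  w >= 0, sum(w) == 1,
    i.e. the nearest convex combination of the k expert actions (k x d).
    """
    k = expert_actions.shape[0]
    w0 = np.full(k, 1.0 / k)  # start at the centroid, always feasible
    res = minimize(
        lambda w: np.sum((w @ expert_actions - action) ** 2),
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return res.x @ expert_actions
```

An action already inside the hull is returned (numerically) unchanged, so this kind of bound only alters exploratory actions that no mixture of the experts would take.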
ASV station keeping under wind disturbances using neural network simulation error minimization model predictive control
Jalil Chavez-Galaviz, Jianwen Li, Ajinkya Chaudhary, Nina Mahmoudian
Journal of Field Robotics, 41(6), pp. 1797-1813, published 2024-04-25. DOI: 10.1002/rob.22346. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/rob.22346

Abstract: Station keeping is an essential maneuver for autonomous surface vehicles (ASVs), particularly in confined spaces, in surveys that require the ASV to hold its position, or in collaboration with other vehicles where the relative position affects the mission. However, this maneuver can be challenging for classical feedback controllers, which need an accurate model of the ASV dynamics and of the environmental disturbances. This work proposes a model predictive controller based on neural network simulation error minimization (NNSEM-MPC) to accurately predict the dynamics of the ASV under wind disturbances. The performance of the proposed scheme under wind disturbances is tested and compared against other controllers in simulation, using the Robot Operating System (ROS) and the multipurpose simulation environment Gazebo. A set of six tests was conducted by combining two wind speeds, modeled with the Harris spectrum, and three wind directions (0°, 90°, and 180°). The simulation results clearly show the advantage of NNSEM-MPC over the following methods: a backstepping controller, a sliding mode controller, simplified-dynamics MPC (SD-MPC), neural ordinary differential equation MPC (NODE-MPC), and knowledge-based NODE-MPC. The proposed NNSEM-MPC approach performs best in five of the six test conditions and is second best in the remaining case, reducing the mean position and heading error by at least 27.08.
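The core idea behind simulation error minimization is to fit the dynamics model on its own open-loop rollouts rather than on one-step predictions, so that errors that compound over an MPC horizon are penalized directly. A minimal sketch of the rollout loss, with a generic `model(x, u)` standing in for the paper's neural network (names and the linear test dynamics are assumptions):

```python
import numpy as np

def simulation_error(model, x0, controls, true_states):
    """Multi-step simulation error: roll `model` forward open-loop from x0
    and accumulate the squared error against the recorded trajectory."""
    x, total, preds = x0, 0.0, []
    for u, x_true in zip(controls, true_states):
        x = model(x, u)  # feed back the *predicted* state, not the true one
        total += float(np.sum((x - x_true) ** 2))
        preds.append(x)
    return total / len(controls), np.array(preds)
```

Minimizing this loss over the model parameters (backpropagating through the whole rollout) is what distinguishes simulation error minimization from ordinary one-step regression; inside the MPC, the same open-loop rollout is then used to score candidate control sequences.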
Learning-based monocular visual-inertial odometry with $SE_{2}(3)$-EKF
Chi Guo, Jianlang Hu, Yarong Luo
Journal of Field Robotics, 41(6), pp. 1780-1796, published 2024-04-24. DOI: 10.1002/rob.22349

Abstract: Learning-based visual odometry (VO) has become popular because it achieves remarkable performance without manually crafted image processing or burdensome calibration. Meanwhile, inertial navigation can provide a localization solution to assist VO when VO produces poor state estimates under challenging visual conditions. Combining learning-based techniques with classical state estimation can therefore further improve pose estimation. In this paper, we propose a learning-based visual-inertial odometry (VIO) algorithm that consists of an end-to-end VO network and an $SE_{2}(3)$ extended Kalman filter (EKF). The VO network combines a convolutional neural network with a recurrent neural network, taking two consecutive monocular images to produce a relative pose estimate with associated uncertainties. The $SE_{2}(3)$-EKF, which has been proven to overcome the inconsistency issues of VIO, propagates inertial measurement unit (IMU) kinematics-based states and fuses the relative measurements and uncertainties from the VO network in its update step. Extensive experimental results on the KITTI and EuRoC data sets demonstrate the superior performance of the proposed method compared with other related methods.
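The filter's update step, where the VO network's relative pose and its predicted covariance enter as the measurement and the measurement noise, follows the standard EKF equations. A generic sketch over a plain vector state, not the paper's $SE_{2}(3)$ manifold formulation; function and variable names are assumptions:

```python
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """Standard EKF update: fuse measurement z (covariance R) into state x.

    h maps the state to the expected measurement; H is its Jacobian at x.
    In a learning-based VIO, z and R would come from the VO network's
    relative-pose output and its predicted uncertainty.
    """
    y = z - h(x)                      # innovation
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

A larger network-reported R automatically down-weights the visual measurement in the gain K, which is how the learned uncertainty modulates the fusion under challenging visual conditions.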