On-policy and Off-policy Q-learning algorithms with policy iteration for two-wheeled inverted pendulum systems
Ba Quoc Anh Nguyen, Ngoc Trung Dang, Thanh Tung Le, Phuong Nam Dao
Robotics and Autonomous Systems, vol. 193, Article 105111. Published 2025-06-30.
DOI: 10.1016/j.robot.2025.105111
https://www.sciencedirect.com/science/article/pii/S0921889025002088
Abstract
This article investigates on-policy and off-policy Q-learning algorithms for controlling two-wheeled inverted pendulum (TWIP) robots when the system dynamics are uncertain. Both algorithms achieve optimal, model-free control by learning from collected data rather than from a model of the system. The on-policy algorithm collects data in real time, continuously gathering new data and iteratively computing a new control policy until the policy converges to the optimum. In contrast, the off-policy algorithm collects data only once and applies the resulting policy to the system after the learning process is complete. To improve computational efficiency and reduce the amount of data required, the TWIP system is divided into two subsystems, each with smaller system matrices, that can be controlled independently. This division reduces the data collection burden and speeds up the algorithms. The off-policy technique proves advantageous for data efficiency and accuracy. The influence of probing noise on the Q-function is considered in both proposed algorithms. By using a single data set and eliminating the influence of the noise, the off-policy technique improves algorithm performance. Finally, simulation results for the TWIP system validate the effectiveness of the two proposed control schemes.
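The off-policy idea in the abstract (collect data once, then run policy iteration on the stored data) can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a discrete-time linear plant with a quadratic stage cost, and the matrices `A`, `B`, `Qc`, `R`, the initial gain `K0`, and the helper names are all hypothetical stand-ins rather than the TWIP model or its subsystems. The Q-function is parameterized as `z' H z` with `z = [x; u]`, and each evaluation step solves the Bellman equation by least squares using the target-policy action at the next state, so the probing noise in the data does not bias the estimate.

```python
import numpy as np

# Hypothetical 2-state, 1-input discrete-time plant (NOT the TWIP model).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Qc = np.eye(2)          # state weight in the stage cost x'Qc x + u'R u
R = np.array([[1.0]])   # input weight
n, m = B.shape

def quad_basis(z):
    """Monomials of z such that quad_basis(z) @ theta == z' H z."""
    i, j = np.triu_indices(len(z))
    return np.outer(z, z)[i, j] * np.where(i == j, 1.0, 2.0)

def theta_to_H(theta, size):
    """Rebuild the symmetric matrix H from its upper-triangular entries."""
    H = np.zeros((size, size))
    H[np.triu_indices(size)] = theta
    return H + H.T - np.diag(np.diag(H))

# 1) Collect data ONCE with a stabilizing behavior gain plus probing noise.
rng = np.random.default_rng(0)
K0 = np.array([[1.0, 2.0]])   # assumed stabilizing initial gain
x = np.array([1.0, -0.5])
data = []
for _ in range(300):
    u = -K0 @ x + rng.normal(scale=1.0, size=m)
    x_next = A @ x + B @ u    # the learner only sees (x, u, x_next)
    data.append((x, u, x_next))
    x = x_next

# 2) Policy iteration that reuses the same stored data at every step.
K = K0.copy()
for _ in range(10):
    Phi, cost = [], []
    for x, u, x_next in data:
        z = np.concatenate([x, u])
        # Next action comes from the current target policy, without noise.
        z_next = np.concatenate([x_next, -K @ x_next])
        Phi.append(quad_basis(z) - quad_basis(z_next))
        cost.append(x @ Qc @ x + u @ R @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)
    H = theta_to_H(theta, n + m)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # u = -K x minimizes z' H z

# 3) Model-based check via Riccati iteration (uses A and B directly).
P = np.eye(n)
for _ in range(5000):
    G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Qc + A.T @ P @ A - A.T @ P @ B @ G
K_star = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

print("learned K:", K)
print("Riccati K:", K_star)
```

An on-policy variant under the same assumptions would instead gather a fresh batch with the current gain `K` inside each iteration of step 2, which is why it needs more data overall.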
About the journal:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory based robot control and learning in the context of autonomous systems.
Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.