{"title":"Safe Reinforcement Learning: Optimal Formation Control With Collision Avoidance of Multiple Satellite Systems","authors":"Hui Yu;Liqian Dou;Xiuyun Zhang;Jinna Li;Qun Zong","doi":"10.1109/TCYB.2024.3491582","DOIUrl":"10.1109/TCYB.2024.3491582","url":null,"abstract":"This article addresses the collision avoidance and formation control problem for multisatellite systems. A novel safe reinforcement learning (RL) algorithm based on an adaptive dynamic programming framework is proposed. The highlights of the algorithm are the adaptive distance-varying learning method to integrate online data with historical data and the usage of the barrier function (BF) to achieve collision avoidance. First, the BF is introduced into the designed cost function such that the multisatellite formation system can achieve obstacle avoidance and guarantee the safety. Next, a safe RL algorithm is developed through the critic network structure. A distance-varying weight is introduced, which combines experience replay samples with extrapolation samples. By minimizing the cost function, the optimal formation control policy can be obtained with an adaptive formation and self-learning ability. Then, the stability and safety of the proposed algorithm are analyzed. Finally, the effectiveness and superiority of the proposed algorithm are verified by numerical simulations.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 1","pages":"447-459"},"PeriodicalIF":9.4,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Observer-Based Human-in-the-Loop Optimal Output Cluster Synchronization Control for Multiagent Systems: A Model-Free Reinforcement Learning Method","authors":"Zongsheng Huang;Tieshan Li;Yue Long;Hongjing Liang","doi":"10.1109/TCYB.2024.3490602","DOIUrl":"10.1109/TCYB.2024.3490602","url":null,"abstract":"This article investigates the observer-based human-in-the-loop (HiTL) optimal output cluster synchronization control problem for nonlinear multiagent systems (MASs). First, the leader is designed to be nonautonomous, with the unknown time-varying input monitored by the human operator directly. To address the problem that leader’s output is not available to each follower, an observer is designed. This observer features practical prescribed-time convergence, and independence of prior knowledge of leader’s input. Then, an augmented system consisting of observer dynamics and follower dynamics is constructed and a cost function is formulated. Accordingly, the HiTL optimal output cluster synchronization control problem is transformed into a solution to the Hamilton-Jacobian–Bellman equation (HJBE). Subsequently, the off-policy reinforcement learning algorithm is utilized to learn the solution to HJBE without complete knowledge of the system dynamics. To alleviate computational burden, the single critic neural network (NN) is employed for the algorithm implementation, with the least square method applied for training the NN weights. 
Finally, the simulation results are presented to verify the validity of the designed control scheme.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 2","pages":"649-660"},"PeriodicalIF":9.4,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hierarchical Surrogate-Assisted Differential Evolution With Core Space Localization","authors":"Laiqi Yu;Zhenyu Meng;Haibin Zhu","doi":"10.1109/TCYB.2024.3489885","DOIUrl":"10.1109/TCYB.2024.3489885","url":null,"abstract":"Surrogate-assisted evolutionary algorithms (SAEAs) are extensively used to tackle expensive optimization problems (EOPs). The integration of surrogate-based global and local search is a prevalent hierarchical SAEA framework, which can effectively balance exploration and exploitation capabilities. However, it still faces challenges when tackling high-dimensional EOPs (HEOPs) owing to the curse of dimensionality. In this article, we propose a hierarchical surrogate-assisted differential evolution with core space localization (HSADE-CS) to solve HEOPs. Its contributions are listed as follows: 1) a top-promising sampling strategy is introduced in the global search to mitigate the challenges posed by the uncertainty in the performance of the surrogate model; 2) a core space localization (CSL) method is proposed to identify a high-potential space within the local promising region, enhancing the effectiveness of local search; and 3) a fitness-independent adaptive parameter control method based on the Minkowski distance is developed within the differential evolution (DE) optimizer to improve the performance of surrogate model-driven local search. The performance of HSADE-CS has been validated on numerous benchmark problems from the commonly used expensive optimization benchmark suite, as well as the CEC2014 and CEC2017 benchmark suites, with problem dimensions up to 500. It has also been tested on a real-world problem, i.e., circular antenna array design optimization. 
Experimental results demonstrate that HSADE-CS is highly competitive compared to the state-of-the-art SAEAs.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 2","pages":"939-952"},"PeriodicalIF":9.4,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Trajectory Planning Method for Autonomous Ground Vehicles Confronting Sudden and Moving Obstacles Based on LSTM-Attention Network","authors":"Zhida Xing;Runqi Chai;Kaiyuan Chen;Yuanqing Xia;Senchun Chai","doi":"10.1109/TCYB.2024.3486004","DOIUrl":"10.1109/TCYB.2024.3486004","url":null,"abstract":"This article presents a novel online obstacle avoidance trajectory planning method for autonomous ground vehicles (AGVs) based on long short-term memory-attention (LSTM-Attention) networks. The proposed method can guide AGVs to perform emergency maneuvers when encountering sudden and moving obstacles, while also ensuring high levels of real-time performance and optimality. It consists of two parts: 1) offline training and 2) online planning. In the offline training phase, an AGV obstacle avoidance trajectory dataset is generated using numerical trajectory optimization methods to train the LSTM-Attention network. This training allows the network to capture the mapping between the relative information of the vehicle and the obstacles and the optimal control actions. The trained network is then used for online trajectory planning to achieve optimal feedback obstacle avoidance control for AGVs facing sudden obstacles. Furthermore, to address situations involving sudden obstacles in different directions and moving obstacles, a rotation coordinate system method is proposed, significantly expanding the application scenarios of the proposed approach. 
The effectiveness and real-time performance of the designed method are comprehensively validated through extensive simulation and physical experiments.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 1","pages":"421-435"},"PeriodicalIF":9.4,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142637321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Driven Event-Triggered Sliding Mode Secure Control for Autonomous Vehicles Under Actuator Attacks","authors":"Hong-Tao Sun;Xinran Chen;Zhengqiang Zhang;Xiaohua Ge;Chen Peng","doi":"10.1109/TCYB.2024.3490656","DOIUrl":"10.1109/TCYB.2024.3490656","url":null,"abstract":"This article investigates a comprehensive data-driven event-triggered secure lateral control of autonomous vehicles under actuator attacks. We consider stabilization issues of autonomous vehicles subject to modeling difficulties, limited communication resources, and actuator attacks. The dynamic model decomposition (DMD) from data is exploited to characterize the inherent lateral dynamics model of autonomous vehicles, the event-triggered transmission scheme is utilized to alleviate communication burden for limited bandwidth network, and the sliding mode control scheme is designed to ensure the security of autonomous vehicles under actuator attacks. The stability analysis and the stabilization method as well as its algorithm are presented. The proposed secure control scheme can actively counteract the malicious effects caused by actuator attacks and integrates the advantages of both data-driven modeling and model-based control design. Finally, several comparative case studies show the effectiveness of the proposed secure control scheme.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 1","pages":"436-446"},"PeriodicalIF":9.4,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142637322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extended Kalman Filtering-Based Nonlinear Model Predictive Control for Underactuated Systems With Multiple Constraints and Obstacle Avoidance","authors":"Meng Zhai;Tong Yang;Qingxiang Wu;Shudong Guo;Ruiping Pang;Ning Sun","doi":"10.1109/TCYB.2024.3488371","DOIUrl":"10.1109/TCYB.2024.3488371","url":null,"abstract":"Underactuated systems are a class of systems in which the number of control inputs is less than the degrees of freedom (DoFs) to be controlled. With the increasing demand for the control performance of underactuated systems, the current research on their optimization of steady-state performance is no longer sufficient. However, owing to limited control inputs, ensuring their transient performance is often difficult. Moreover, some specific composite variables in underactuated systems should be kept within the preset ranges, which poses a significant challenge to collision avoidance safety. In addition, the sensor noises are also an issue that cannot be ignored. To this end, an extended Kalman filtering-based nonlinear model predictive control method for underactuated systems is developed in this article. The key feature of this method is that it simultaneously ensures accurate positioning, multiple constraints, and obstacle avoidance. Specifically, by adding an artificial potential field as an obstacle avoidance penalty term in the cost function and dynamically assigning weight coefficients, efficient collision avoidance control is achieved. Furthermore, it is combined with the extended Kalman filtering and jointly applied to underactuated systems with sensor noises. To the best of our knowledge, it is the first control method that simultaneously considers full-state constraints, specific composite variable constraints, control input and its increment constraints, as well as obstacle avoidance in underactuated systems. 
The satisfactory control performance of the proposed method is validated by implementing it on two typical underactuated systems, that is, four-DoF overhead cranes and five-DoF tower cranes.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 1","pages":"369-382"},"PeriodicalIF":9.4,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervised Learning for Intuitive Control of Prosthetic Hand Movements via Sonomyography","authors":"Xingchen Yang;Zongtian Yin;Yixuan Sheng;Dario Farina;Honghai Liu","doi":"10.1109/TCYB.2024.3489438","DOIUrl":"10.1109/TCYB.2024.3489438","url":null,"abstract":"As a primary effector of humans, the hand plays a crucial role in many aspects of daily life. Recognizing multidegree-of-freedom hand movements from muscle activity helps infer human motion intentions. Solving this problem has direct applications in prosthetic and exoskeleton control. Here, we propose a self-supervised learning algorithm inspired by muscle synergies to achieve simultaneous estimation of wrist rotation (supination/pronation) and hand grasp (open/close) from sonomyography—the muscle deformation detected by a wearable ultrasound array. Unlike conventional methods collecting both muscle activity and hand kinematics for supervised model calibration, this algorithm only uses unlabeled forearm ultrasound signals for self-supervised wrist and hand movement estimation, where movement labels are auto-generated. The performance of the proposed algorithm was experimentally evaluated with ten participants including an amputee. Offline analysis demonstrated that the proposed algorithm can accurately estimate simultaneous wrist rotation and hand grasp movements ($r_{\\textrm{wrist}}$ and $r_{\\textrm{hand}}$ were 0.98 and 0.94 for the able-bodied, and 0.98 and 0.90 for the amputee, respectively). Notably, the performance of the self-supervised learning was superior to the supervised learning for the amputee. Online experiments demonstrated that intended wrist and hand movements can be deciphered in real time, enabling accurate control of a virtual hand. 
This study will open up a new avenue for the sonomyographic human-machine interaction.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 1","pages":"409-420"},"PeriodicalIF":9.4,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Periodic Event-Triggered Optimal Output Consensus of Heterogeneous Multiagent Systems Subject to Communication Delays","authors":"Tianyu Liu;Lu Liu","doi":"10.1109/TCYB.2024.3485230","DOIUrl":"10.1109/TCYB.2024.3485230","url":null,"abstract":"This article investigates periodic event-triggered optimal output consensus of heterogeneous linear multiagent systems where each agent has knowledge of only its own cost function. In contrast to existing results, we consider communication delays and general strongly connected digraphs. A novel periodic event-triggered distributed control scheme is proposed, which allows asynchronous event detection and time-varying communication delays. Sufficient conditions with respect to the maximum allowable communication delays and event detection periods to achieve asymptotic optimal output consensus are established. Moreover, it is proved that the proposed periodic event-triggering mechanism can provide a positive lower bound of interevent times which is independent of the event detection period. A simulation example is provided to illustrate the effectiveness of the proposed control scheme.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 1","pages":"355-368"},"PeriodicalIF":9.4,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncovering Reward Goals in Distributed Drone Swarms Using Physics-Informed Multiagent Inverse Reinforcement Learning","authors":"Adolfo Perrusquía;Weisi Guo","doi":"10.1109/TCYB.2024.3489967","DOIUrl":"10.1109/TCYB.2024.3489967","url":null,"abstract":"The cooperative nature of drone swarms poses risks in the smooth operation of services and the security of national facilities. The control objective of the swarm is, in most cases, occluded due to the complex behaviors observed in each drone. It is paramount to understand which is the control objective of the swarm, whilst understanding better how they communicate with each other to achieve the desired task. To solve these issues, this article proposes a physics-informed multiagent inverse reinforcement learning (PI-MAIRL) that: 1) infers the control objective function or reward function from observational data and 2) uncover the network topology by exploiting a physics-informed model of the dynamics of each drone. The combined contribution enables to understand better the behavior of the swarm, whilst enabling the inference of its objective for experience inference and imitation learning. A physically uncoupled swarm scenario is considered in this study. The incorporation of the physics-informed element allows to obtain an algorithm that is computationally more efficient than model-free IRL algorithms. Convergence of the proposed approach is verified using Lyapunov recursions on a global Riccati equation. 
Simulation studies are carried out to show the benefits and challenges of the approach.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 1","pages":"14-23"},"PeriodicalIF":9.4,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Task-Oriented Tool Manipulation With Robotic Dexterous Hands: A Knowledge Graph Approach From Fingers to Functionality","authors":"Fan Yang;Wenrui Chen;Haoran Lin;Sijie Wu;Xin Li;Zhiyong Li;Yaonan Wang","doi":"10.1109/TCYB.2024.3487845","DOIUrl":"10.1109/TCYB.2024.3487845","url":null,"abstract":"A primary challenge in robotic tool use is achieving precise manipulation with dexterous robotic hands to mimic human actions. It requires understanding human tool use and allocating specific functions to each robotic finger for fine control. Existing work has primarily focused on the overall grasping capabilities of robotic hands, often neglecting the functional allocation among individual fingers during object interaction. In response to this, we introduce a semantic knowledge-driven approach to distribute functions among fingers for tool manipulation. Central to this approach is the finger-to-function (F2F) knowledge graph, which captures human expertise in tool use and establishes relationships between tool attributes, tasks, and manipulation elements, including functional fingers, components, required force, and gestures. We also develop a manipulation element-oriented prediction algorithm using knowledge graph semantic embedding, enhancing the prediction of manipulation elements’ speed and accuracy. Additionally, we propose the functionality-integrated adaptive force feedback manipulation (FAFM) module, which integrates manipulation elements with adaptive force feedback to achieve precise finger-level control. Our framework does not rely on extensive annotated data for supervision but utilizes semantic constraints from F2F to guide tool manipulation. The proposed method demonstrates superior performance and generalizability in real-world scenarios, achieving an 8% higher success rate in grasping and manipulation of representative tool instances compared to the existing state-of-the-art methods. 
The dataset and code are available at https://github.com/yangfan293/F2F.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 1","pages":"395-408"},"PeriodicalIF":9.4,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}