{"title":"Online Reinforcement Learning Algorithm Design for Adaptive Optimal Consensus Control Under Interval Excitation","authors":"Yong Xu;Qi-Yue Che;Meng-Ying Wan;Di Mei;Zheng-Guang Wu","doi":"10.1109/TSMC.2025.3583212","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3583212","url":null,"abstract":"This article proposes online data-based reinforcement learning (RL) algorithm for adaptive output consensus control of heterogeneous multiagent systems (MASs) with unknown dynamics. First, we employ the adaptive control technique to design a distributed observer, which provides an estimation of the leader for partial agents, thereby eliminating the need for the global information. Then, we propose a novel data-based adaptive dynamic programming (ADP) approach, associated with a double-integrator operator, to develop an online data-driven learning algorithm for learning the optimal control policy. However, existing optimal control strategy learning algorithms rely on the persistent excitation conditions (PECs), the full-rank condition, and the offline storage of historical data. To address these issues, our proposed method learns the optimal control policy online by solving a data-driven linear regression equations (LREs) based on an online-verifiable interval excitation (IE) condition, instead of relying on PEC. In addition, the uniqueness of the LRE solution is established by verifying the invertibility of a matrix, instead of satisfying the full-rank condition related to PEC and historical data storage as required in existing algorithms. It is demonstrated that our proposed learning algorithm not only guarantees optimal tracking with unknown dynamics but also relaxes some of the strict conditions of existing learning algorithms. Finally, a numerical example is provided to validate the effectiveness and performance of the proposed algorithms.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7325-7334"},"PeriodicalIF":8.7,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PCDT: Pessimistic Critic Decision Transformer for Offline Reinforcement Learning","authors":"Xuesong Wang;Hengrui Zhang;Jiazhi Zhang;C. L. Philip Chen;Yuhu Cheng","doi":"10.1109/TSMC.2025.3583392","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3583392","url":null,"abstract":"decision transformer (DT), as a conditional sequence modeling (CSM) approach, learns the action distribution for each state using historical information, such as trajectory returns, offering a supervised learning paradigm for offline reinforcement learning (Offline RL). However, due to the fact that DT solely concentrates on an individual trajectory with high returns-to-go, it neglects the potential for constructing optimal trajectories by combining sequences of different actions. In other words, traditional DT lacks the trajectory stitching capability. To address the concern, a novel DT (PCDT) for Offline RL is proposed. Our approach begins by pretraining a standard DT to explicitly capture behavior sequences. Next, we apply the sequence importance sampling to penalize actions that significantly deviate from these behavior sequences, thereby constructing a pessimistic critic. Finally, Q-values are integrated into the policy update process, enabling the learned policy to approximate the behavior policy while favoring actions associated with the highest Q-value. Theoretical analysis shows that the sequence importance sampling in pessimistic critic decision transformer (PCDT) establishes a pessimistic lower bound, while the value optimality ensures that PCDT is capable of learning the optimal policy. Results on the D4RL benchmark tasks and ablation studies show that PCDT inherits the strengths of actor–critic (AC) and CSM methods, achieving the highest normalized scores on challenging sparse-reward and long-horizon tasks. Our code are available at <uri>https://github.com/Henry0132/PCDT</uri>.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7247-7258"},"PeriodicalIF":8.7,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Fixed-Time Event-Triggered Consensus Tracking Control for Robotic Multiagent Systems","authors":"Ben Niu;Xinliang Zhao;Yahui Gao;Shengtao Li;Jihang Sui;Huanqing Wang","doi":"10.1109/TSMC.2025.3582649","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3582649","url":null,"abstract":"In this article, an adaptive fixed-time event-triggered consensus tracking control strategy is proposed for the robotic multiagent systems (MASs). First, this article considers the robotic MASs rather than the single robotic manipulator system, which is of great research significance in practical applications. Then, the adaptive fixed-time control method within the backstepping technique is developed such that each robotic manipulator can track the ideal signal more quickly. Moreover, in the face of complex tasks, the communication resources of the robotic MASs are in short supply. By sampling the data from the original controller, the relative threshold event-triggered control (RTETC) strategy is adopted for each robotic manipulator system, which can ensure that all signals in the closed-loop system are bounded without the Zeno phenomenon. In the end, a simulation example is presented to demonstrate the validity of the proposed control strategy.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7238-7246"},"PeriodicalIF":8.7,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Observer-Based DETM-Switching- H∞ Control for Disturbed Servo Systems Under DoS Attacks","authors":"Qiaofeng Zhang;Meng Li;Yong Chen;Meng Zhang;Haiyu Song","doi":"10.1109/TSMC.2025.3582912","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3582912","url":null,"abstract":"This article investigates the secure control of a class of servo DC motors in the presence of input–output disturbances and DoS attacks. A multichannel observer-based switching <inline-formula> <tex-math>$Hinfty $ </tex-math></inline-formula> control strategy is proposed and a dynamic event triggering mechanism (DETM) is designed to save network resources. First, a mathematical model of servo DC motor containing input–output disturbances is developed and discretized to make it more suitable for computer control. Then, a state observer and a multichannel transmission strategy based on Markov theory are designed in order to obtain the accurate knowledge of disturbed system and transmit it to the remote controller under DoS attack. Third, observer-based state feedback switching <inline-formula> <tex-math>$Hinfty $ </tex-math></inline-formula> control strategy is proposed and the stability is demonstrated. Furthermore, the DETM is presented to reduce the occupation of network resources by introducing dynamic trigger variable. Finally, the performance of the characterized control strategy is verified by a numerical simulation and a semi-physical simulation.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7315-7324"},"PeriodicalIF":8.7,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent Control in Asymmetric Decision-Making: An Event-Triggered RL Approach for Mismatched Uncertainties","authors":"Xiangnan Zhong;Zhen Ni","doi":"10.1109/TSMC.2025.3583066","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3583066","url":null,"abstract":"Artificial intelligence (AI)-based multiplayer systems have attracted increasing attention across diverse fields. While most research focuses on simultaneous-move multiplayer games to achieve Nash equilibrium, there are complex applications that involve hierarchical decision-making, where certain players act before others. This power asymmetry increases the complexity of strategic interactions, especially in the presence of mismatched uncertainties that can compromise data reliability and decision-making. To this end, this article develops a novel event-triggered reinforcement learning (RL) approach for hierarchical multiplayer systems with mismatched uncertainties. Specifically, by establishing an auxiliary augment system and designing appropriate cost functions for the high-level leader and low-level followers, we reformulate the hierarchical robust control problem as an optimization task within the Stackelberg–Nash game framework. Furthermore, an event-triggered scheme is designed to reduce the computational overhead and a neural-RL-based method is developed to automatically learn the event-triggered control policies for hierarchical players. Theoretical analyses are conducted to 1) demonstrate the stability preservation of the designed robust-optimal transformation; 2) verify the achievement of Stackelberg–Nash equilibrium under the developed event-triggered policies; and 3) guarantee the boundedness of the impulsive closed-loop system. Finally, the simulation studies validate the effectiveness of the developed method.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7288-7301"},"PeriodicalIF":8.7,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Time-Varying Q-Learning Algorithm for Affine Nonlinear Systems With Coupled Players","authors":"Huaguang Zhang;Shuhang Yu;Jiayue Sun;Mei Li","doi":"10.1109/TSMC.2025.3580988","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3580988","url":null,"abstract":"To address the finite-horizon coupled two-player mixed <inline-formula> <tex-math>$H_{2}/H_{infty }$ </tex-math></inline-formula> control challenge within a continuous-time affine nonlinear system, this research introduces a distinctive Q-function and presents an innovative adaptive dynamic programming (ADP) method that operates autonomously of system-specific information. Initially, we formulate the time-varying Hamilton-Jacobi–Isaacs (HJI) equations, which pose a significant challenge for resolution due to their time-dependent and nonlinear nature. Subsequently, a novel offline policy iteration (PI) algorithm is introduced, highlighting its convergence and reinforcing the substantive proof of the existence of Nash equilibrium points. Moreover, a novel action-dependent Q-function is established to facilitate entirely model-free learning, representing the initial foray into the mixed <inline-formula> <tex-math>$H_{2}/H_{infty }$ </tex-math></inline-formula> control problem involving coupled players. The Lyapunov direct approach is employed to ensure the stability of the closed-loop uncertain affine nonlinear system under the ADP-based control scheme, guaranteeing uniform ultimate boundedness (UUB). Finally, a numerical simulation is conducted to validate the effectiveness of the aforementioned ADP-based control approach.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7037-7047"},"PeriodicalIF":8.7,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Hidden Transition for Nonstationary Environments With Multistep Tree Search","authors":"Yangqing Fu;Yue Gao","doi":"10.1109/TSMC.2025.3578730","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3578730","url":null,"abstract":"Deep reinforcement learning (DRL) algorithms have shown impressive results in various applications, but nonstationary environments, such as varying operating conditions and external disturbances, remain a significant challenge. To address this challenge, we propose the hidden transition inference (HTI) framework for learning nonstationary transitions in multistep tree search. Different from previous methods that focus on single-step transition changes, the HTI framework improves decision-making by inferring multistep environmental variations. Specifically, this framework constructs a probabilistic graphical model for Monte Carlo tree search (MCTS) in latent space and utilizes the variational lower bound of hidden states for policy improvement. Furthermore, this work theoretically proves the convergence of the HTI framework, ensuring its effectiveness in handling nonstationary environments. The proposed framework is integrated with the state-of-the-art MCTS-based algorithm sampled MuZero and evaluated on multiple control tasks with different nonstationary dynamics transitions. Experimental results show that the HTI framework can improve the inference capability of tree search in nonstationary environments, showcasing its potential for addressing the control challenges in nonstationary environments.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7012-7023"},"PeriodicalIF":8.7,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Resilient Secure Control for Heterogeneous Swarm Systems Under Malicious Cyber-Attacks","authors":"Yishi Liu;Xiwang Dong;Enrico Zio;Ying Cui","doi":"10.1109/TSMC.2025.3580940","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3580940","url":null,"abstract":"This article concentrates on the design of an active resilient formation tracking control strategy for heterogeneous swarm systems (HSS) under malicious cyber-attacks. The attack signals, which are injected into both actuator and sensor randomly, can be detected and estimated by using an observer-based attack estimation scheme. The compromised measured outputs of individuals in the swarm are used to design the distributed secure control protocol and the consensus-based formation condition. For each follower, a compensator is designed to address the secure formation tracking problem for HSS using an approximate model following strategy. Moreover, the event-triggered technique is utilized to achieve the time-varying formation and to track the leader, and saves the communication resource in practice. Finally, a numerous simulation for a heterogeneous swarm system with three different dynamics nodes is presented to verify the effectiveness of the proposed secure approach.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7195-7204"},"PeriodicalIF":8.7,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Perturbation Suppression Control for Multiple Nonholonomic Mobile Robot Clusters Against Composite Motion Windups","authors":"Zhi Zheng;Tao Jiang;Jianchuan Ye;Shaoxin Sun;Xiaojie Su","doi":"10.1109/TSMC.2025.3580449","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3580449","url":null,"abstract":"The existence of compound velocity and acceleration windups in clusters of nonholonomic mobile robots can seriously constrain the smoothness and stability of the overall motion. This article proposes a leader–follower-based distributed formation control framework for smooth and robust clustering of multiple nonholonomic mobile robots under compound windups of velocity and acceleration and unknown perturbations. The decoupled position and orientation kinematics and substrate wheel velocity dynamics are modularly devised via feedback linearization techniques to enable upper-level cooperative error regulation and lower-level wheel velocity trajectory tracking. The auxiliary dynamic system based on the velocity envelope generated by compound motion windups and the WMR kinematic is integrated into the collaborative error, adaptively mitigating the detrimental windup effects. The adaptive saturated extended state observer is utilized to flatly estimate unknown perturbations in the wheel velocity dynamics with enhanced robustness. Finally, the overall stability analyses are done based on Lyapunov’s theorem, and contrastive simulations and plentiful experiments are conducted to attest to the validity and availability.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7156-7168"},"PeriodicalIF":8.7,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Time-Series Dynamic Modeling of Carbon Consumption in Sintering Process","authors":"Jie Hu;Junyong Liu;Min Wu;Witold Pedrycz","doi":"10.1109/TSMC.2025.3583084","DOIUrl":"https://doi.org/10.1109/TSMC.2025.3583084","url":null,"abstract":"It becomes apparent that time-series dynamic prediction for carbon consumption in sintering production process holds immense significance in the steel industry, as it plays a pivotal role in determining the efficiency and environmental impact of the operation. Given the complexities of the sintering process, encompassing multiple operating conditions, numerous parameters, nonlinearities, etc., this article proposes a time-series dynamic modeling method for carbon consumption based on an improved just-in-time learning (JITL) and a gated recurrent unit-based temporal cascade broad learning system (GRU-TCBLS). First, the data correlation analysis method is employed to determine the process parameters affecting carbon consumption. Further, an improved JITL method incorporating moving window and JITL is developed to obtain relevant training data in real-time for model training. Finally, based on these relevant training data, the GRU-TCBLS is formulated to construct a carbon consumption prediction model. Experiments based on actual production data demonstrate the superiority of the proposed method with respect to some state-of-the-art modeling methods.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7369-7378"},"PeriodicalIF":8.7,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145089943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}