{"title":"Optimization Landscape of Policy Gradient Methods for Discrete-Time Static Output Feedback","authors":"Jingliang Duan;Jie Li;Xuyang Chen;Kai Zhao;Shengbo Eben Li;Lin Zhao","doi":"10.1109/TCYB.2023.3323316","DOIUrl":"10.1109/TCYB.2023.3323316","url":null,"abstract":"In recent times, significant advances have been made in understanding the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent, since the underlying state of the system may not be fully observed in many practical settings. This article analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback (SOF) control in discrete-time LTI systems subject to quadratic cost. We begin by establishing crucial properties of the SOF cost, encompassing coercivity, $L$-smoothness, and an $M$-Lipschitz continuous Hessian. Despite the absence of convexity, we leverage these properties to derive novel findings regarding convergence (with a nearly dimension-free rate) to stationary points for three policy gradient methods: the vanilla policy gradient method, the natural policy gradient method, and the Gauss–Newton method. Moreover, we prove that the vanilla policy gradient method exhibits linear convergence toward local minima when initialized near such minima. This article concludes by presenting numerical examples that validate our theoretical findings. These results not only characterize the performance of gradient descent for optimizing the SOF problem but also provide insights into the effectiveness of general policy gradient methods within the realm of reinforcement learning.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"54 6","pages":"3588-3601"},"PeriodicalIF":11.8,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"54228923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Residual Multiexpert Reinforcement Learning for Spatial Scheduling of High-Density Parking Lots","authors":"Jing Hou;Guang Chen;Zhijun Li;Wei He;Shangding Gu;Alois Knoll;Changjun Jiang","doi":"10.1109/TCYB.2023.3312647","DOIUrl":"10.1109/TCYB.2023.3312647","url":null,"abstract":"Industries such as manufacturing are accelerating their embrace of the metaverse to achieve higher productivity, especially in complex industrial scheduling. In view of the growing parking challenges in large cities, high-density vehicle spatial scheduling is one potential solution. Stack-based parking lots use parking robots to park vehicles densely in vertical stacks, similar to container stacking; this greatly reduces the aisle area in the parking lot but requires complex scheduling algorithms to park and retrieve the vehicles. Existing high-density parking (HDP) scheduling algorithms are mainly heuristic methods that contain only simple logic and have difficulty using information effectively. We propose a hybrid residual multiexpert (HIRE) reinforcement learning (RL) approach, a method for interactive learning in the digital industrial metaverse, which efficiently solves the HDP batch space scheduling problem. In our proposed framework, each heuristic scheduling method is treated as an expert. The neural network trained by RL assigns the expert strategy according to the current parking lot state. Furthermore, to avoid being limited by the performance of the heuristic experts, the proposed hierarchical network framework also sets up a residual output channel. Experiments show that our proposed algorithm outperforms various advanced heuristic methods and the end-to-end RL method in terms of the number of vehicle maneuvers, and that it is robust to the parking lot size and to the estimation accuracy of vehicle exit times. We believe that the proposed HIRE RL method can be effectively and conveniently applied in practical scenarios, and it can be regarded as a key step for RL to enter the practical application stage of the industrial metaverse.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"54 5","pages":"2771-2783"},"PeriodicalIF":11.8,"publicationDate":"2023-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49690249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emotion Recognition From Multimodal Physiological Signals via Discriminative Correlation Fusion With a Temporal Alignment Mechanism","authors":"Kechen Hou;Xiaowei Zhang;Yikun Yang;Qiqi Zhao;Wenjie Yuan;Zhongyi Zhou;Sipo Zhang;Chen Li;Jian Shen;Bin Hu","doi":"10.1109/TCYB.2023.3320107","DOIUrl":"10.1109/TCYB.2023.3320107","url":null,"abstract":"Modeling correlations between multimodal physiological signals [e.g., canonical correlation analysis (CCA)] for emotion recognition has attracted much attention. However, existing studies rarely consider the neural nature of emotional responses within physiological signals. Furthermore, during fusion space construction, the CCA method maximizes only the correlations between different modalities and neglects the discriminative information of different emotional states. Most importantly, temporal mismatches between different neural activities are often ignored; therefore, the theoretical assumption that multimodal data should be aligned in time and space before fusion is not fulfilled. To address these issues, we propose a discriminative correlation fusion method coupled with a temporal alignment mechanism for multimodal physiological signals. We first use neural signal analysis techniques to construct neural representations of the central nervous system (CNS) and autonomic nervous system (ANS), respectively. Then, emotion class labels are introduced into CCA to obtain more discriminative fusion representations from multimodal neural responses, and the temporal alignment between the CNS and ANS is jointly optimized with a fusion procedure that applies a Bayesian algorithm. The experimental results demonstrate that our method significantly improves emotion recognition performance. Additionally, we show that this fusion method can model the underlying mechanisms of the human nervous system during emotional responses, and our results are consistent with prior findings. This study may offer a new approach for exploring human cognitive function based on physiological signals at different time scales and promote the development of computational intelligence and harmonious human–computer interactions.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"54 5","pages":"3079-3092"},"PeriodicalIF":11.8,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49676970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Output Formation Containment for Multiagent Systems Under Multipoint Multipattern FDI Attacks: A Resilient Impulsive Compensation Control Approach","authors":"Hongjun Chu;Sergey Gorbachev;Dong Yue;Chunxia Dou","doi":"10.1109/TCYB.2023.3319647","DOIUrl":"10.1109/TCYB.2023.3319647","url":null,"abstract":"The increasing number of devices and the frequent interactions among agents in networked multiagent systems (MASs) exacerbate the risk of potential cyber attacks, especially attacks at different points and attacks of multiple patterns. This article considers the output formation-containment problem for MASs under multipoint multipattern false data injection (FDI) attacks. The term multipoint describes attacks occurring simultaneously on the sensors, actuators, and communication channels; the term multipattern captures the fact that sensor and actuator attack signals are both continuous deterministic variables, whereas communication channel attack signals are intermittent random variables obeying a Bernoulli distribution. For such compromised MASs, a novel hybrid protocol is proposed that integrates a state observer, an attack estimator, an impulsive interactor, and a compensation controller. Specifically, the state observer and the attack estimator are constructed to recover the unmeasured system states and the unknown FDI attack signals, respectively; the impulsive interactor is designed to guarantee that neighbors’ signals are transmitted only at impulsive instants, while the channel attacks are launched randomly; and, using the recovered signals, the compensation controller is devised to alleviate the effect of the attacks. A sufficient condition is identified under which output formation containment is achieved with cooperative uniform ultimate boundedness (UUB). Finally, simulations are carried out to validate the effectiveness and advantages of the proposed approach.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"54 4","pages":"2606-2617"},"PeriodicalIF":11.8,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49676971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Cybernetics","authors":"","doi":"10.1109/TCYB.2023.3322027","DOIUrl":"https://doi.org/10.1109/TCYB.2023.3322027","url":null,"abstract":"","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"53 11","pages":"C4-C4"},"PeriodicalIF":11.8,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6221036/10286986/10287085.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67759137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Cybernetics","authors":"","doi":"10.1109/TCYB.2023.3322025","DOIUrl":"https://doi.org/10.1109/TCYB.2023.3322025","url":null,"abstract":"","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"53 11","pages":"C3-C3"},"PeriodicalIF":11.8,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6221036/10286986/10287082.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67759648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault-Tolerant Control of Stochastic High-Order Fully Actuated Systems","authors":"Xueqing Liu;Maoyin Chen;Donghua Zhou;Li Sheng","doi":"10.1109/TCYB.2023.3320441","DOIUrl":"10.1109/TCYB.2023.3320441","url":null,"abstract":"In recent years, the high-order fully actuated (HOFA) system approach, founded by Prof. GR Duan, has made rapid progress for deterministic systems. However, the control of stochastic fully actuated systems is still an open problem. This study develops a novel stochastic HOFA system model that complements the existing HOFA methodology. Notably, unlike the deterministic model, the proposed model can account for stochastic signals. By adopting a high-order operator, equivalent control and stabilization control laws are realized to guarantee the global asymptotic stability in probability of the closed-loop system. For the system with sensor gain faults, an observer-based fault-tolerant control law is designed. Finally, simulation results validate the effectiveness of the proposed control schemes.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"54 5","pages":"3225-3238"},"PeriodicalIF":11.8,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41234896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synchronization of Coupled Neural Networks With Constant Time-Delay Using Sampled-Data Information","authors":"Xiang Liu;Siqin Liao;Zheng-Guang Wu;Yuanqing Wu","doi":"10.1109/TCYB.2023.3318987","DOIUrl":"10.1109/TCYB.2023.3318987","url":null,"abstract":"In this article, a synchronization control method is studied for coupled neural networks (CNNs) with a constant time delay using sampled-data information. A distributed control protocol relying on the sampled-data information of neighboring nodes is proposed. A Lyapunov functional is constructed to analyze the synchronization of CNNs with constant time delay. Using Park’s integral inequality and an improved free-weight matrix integral inequality, less conservative sufficient conditions are provided for CNNs to achieve synchronization. In addition, the maximum sampling interval is determined by transforming the sufficient conditions into an optimization problem, and an aperiodic sampling control technique is implemented to reduce the communication energy load. Finally, numerical simulations are provided to demonstrate that the proposed method is capable of achieving synchronization.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"54 8","pages":"4702-4711"},"PeriodicalIF":9.4,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41199450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scaled Position Consensus of High-Order Uncertain Multiagent Systems Over Switching Directed Graphs","authors":"Jie Mei;Kaixin Tian;Guangfu Ma","doi":"10.1109/TCYB.2023.3312696","DOIUrl":"10.1109/TCYB.2023.3312696","url":null,"abstract":"We investigate the scaled position consensus of high-order multiagent systems with parametric uncertainties over switching directed graphs, where the agents’ position states reach a consensus value with different scales. The intricacy arises from the asymmetry inherent in information interaction. Achieving scaled position consensus in high-order multiagent systems over directed graphs remains a significant challenge, particularly when confronted with the following complex features: 1) uniformly jointly connected switching directed graphs; 2) complex agent dynamics with unknown inertias, unknown control directions, parametric uncertainties, and external disturbances; 3) interaction with each other via only relative scaled position information (without high-order derivatives of relative position); and 4) a fully distributed design with no shared gains and no global gain dependency. To address these challenges, we propose a distributed adaptive algorithm based on a model reference adaptive control (MRAC) scheme, where a linear high-order reference model is designed for every individual agent, employing relative scaled position information as input. A new transformation is proposed that converts the scaled position consensus of high-order linear reference models into that of first-order ones. A theoretical analysis is presented showing that the agents’ positions achieve scaled consensus over switching directed graphs. Numerical simulations are performed to validate the efficacy of our algorithm, and collective behaviors such as traditional consensus, bipartite consensus, and cluster consensus are shown by precisely choosing the scales of the agents.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"54 5","pages":"3093-3104"},"PeriodicalIF":11.8,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41199449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}