{"title":"A Transfer Learning Framework for Deep Multi-Agent Reinforcement Learning","authors":"Yi Liu;Xiang Wu;Yuming Bo;Jiacun Wang;Lifeng Ma","doi":"10.1109/JAS.2023.124173","DOIUrl":"https://doi.org/10.1109/JAS.2023.124173","url":null,"abstract":"Dear Editor, This letter presents a new transfer learning framework for the deep multi-agent reinforcement learning (DMARL) to reduce the convergence difficulty and training time when applying DMARL to a new scenario [1], [2]. The proposed transfer learning framework includes the design of neural network architecture, curriculum transfer learning (CTL) and strategy distillation. Experimental results demonstrate that our framework enables DMARL models to converge faster while improving the final performance.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2346-2348"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10707687","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction-Based State Estimation and Compensation Control for Networked Systems with Communication Constraints and DoS Attacks","authors":"Zhong-Hua Pang;Qian Cao;Haibin Guo;Zhe Dong","doi":"10.1109/JAS.2024.124605","DOIUrl":"https://doi.org/10.1109/JAS.2024.124605","url":null,"abstract":"Dear Editor, This letter investigates the output tracking control issue of networked control systems (NCSs) with communication constraints and denial-of-service (DoS) attacks in the sensor-to-controller channel, both of which would induce random network delays. A dual-prediction-based compensation control (DPCC) scheme, consisting of a predictive observer and a predictive controller, is proposed to actively compensate for the adverse effect of network delays on NCSs. Compared with existing networked predictive control (NPC) methods, the DPCC scheme only requires the sensor to send a single measurement output to the controller at each sampling instant, and also does not need to know the upper bound of random network delays in advance. The stability condition of the closed-loop system is derived. Finally, numerical simulations are carried out to validate the effectiveness of the proposed scheme.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2352-2354"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10707690","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Enhancement via Associated Perturbation Removal and Texture Reconstruction Learning","authors":"Kui Jiang;Ruoxi Wang;Yi Xiao;Junjun Jiang;Xin Xu;Tao Lu","doi":"10.1109/JAS.2024.124521","DOIUrl":"https://doi.org/10.1109/JAS.2024.124521","url":null,"abstract":"Degradation under challenging conditions such as rain, haze, and low light not only diminishes content visibility, but also results in additional degradation side effects, including detail occlusion and color distortion. However, current technologies have barely explored the correlation between perturbation removal and background restoration, consequently struggling to generate high-naturalness content in challenging scenarios. In this paper, we rethink the image enhancement task from the perspective of joint optimization: perturbation removal and texture reconstruction. To this end, we devise an efficient yet effective image enhancement model, termed the perturbation-guided texture reconstruction network (PerTeRNet). It contains two sub-networks designed for the perturbation elimination and texture reconstruction tasks, respectively. To facilitate texture recovery, we develop a novel perturbation-guided texture enhancement module (PerTEM) to connect these two tasks, where informative background features are extracted from the input with the guidance of predicted perturbation priors. To alleviate the learning burden and computational cost, we suggest performing perturbation removal in a sub-space and exploiting super-resolution to infer high-frequency background details. Our PerTeRNet has demonstrated significant superiority over typical methods in both quantitative and qualitative measures, as evidenced by extensive experimental results on popular image enhancement and joint detection tasks. The source code is available at https://github.com/kuijiang94/PerTeRNet.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2253-2269"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting the LQR Problem of Singular Systems","authors":"Komeil Nosrati;Juri Belikov;Aleksei Tepljakov;Eduard Petlenkov","doi":"10.1109/JAS.2024.124665","DOIUrl":"https://doi.org/10.1109/JAS.2024.124665","url":null,"abstract":"In the development of linear quadratic regulator (LQR) algorithms, the Riccati equation approach offers two important characteristics: it is recursive and readily meets the existence condition. However, these attributes are applicable only to transformed singular systems, and the efficiency of the regulator may be undermined if constraints are violated in nonsingular versions. To address this gap, we introduce a direct approach to the LQR problem for linear singular systems, avoiding the need for any transformations and eliminating the need for regularity assumptions. To achieve this goal, we begin by formulating a quadratic cost function to derive the LQR algorithm through a penalized and weighted regression framework and then connect it to a constrained minimization problem using Bellman's criterion. Then, we employ a dynamic programming strategy in a backward approach within a finite horizon to develop an LQR algorithm for the original system. To accomplish this, we address the stability and convergence analysis under the reachability and observability assumptions of a hypothetical system constructed by the pencil of augmented matrices and connected using the Hamiltonian diagonalization technique.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2236-2252"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic Automata-Based Method for Enhancing Performance of Deep Reinforcement Learning Systems","authors":"Min Yang;Guanjun Liu;Ziyuan Zhou;Jiacun Wang","doi":"10.1109/JAS.2024.124818","DOIUrl":"https://doi.org/10.1109/JAS.2024.124818","url":null,"abstract":"Deep reinforcement learning (DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management. However, due to the model's inherent uncertainty, rigorous validation is requisite for its application in real-world tasks. Specific tests may reveal inadequacies in the performance of pre-trained DRL models, while the “black-box” nature of DRL poses a challenge for testing model behavior. We propose a novel performance improvement framework based on probabilistic automata, which aims to proactively identify and correct critical vulnerabilities of DRL systems, so that the performance of DRL models in real tasks can be improved with minimal model modifications. First, a probabilistic automaton is constructed from the historical trajectory of the DRL system by abstracting the state to generate probabilistic decision-making units (PDMUs), and a reverse breadth-first search (BFS) method is used to identify the key PDMU-action pairs that have the greatest impact on adverse outcomes. This process relies only on the state-action sequence and final result of each trajectory. Then, under the key PDMU, we search for the new action that has the greatest impact on favorable results. Finally, the key PDMU, undesirable action and new action are encapsulated as monitors to guide the DRL system to obtain more favorable results through real-time monitoring and correction mechanisms. Evaluations in two standard reinforcement learning environments and three actual job scheduling scenarios confirmed the effectiveness of the method, providing certain guarantees for the deployment of DRL models in real-world applications.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2327-2339"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Adaptive Weighted Broad Learning System for Multi-Label Learning","authors":"Yuanxin Lin;Zhiwen Yu;Kaixiang Yang;Ziwei Fan;C. L. Philip Chen","doi":"10.1109/JAS.2024.124557","DOIUrl":"https://doi.org/10.1109/JAS.2024.124557","url":null,"abstract":"Multi-label classification is a challenging problem that has attracted significant attention from researchers, particularly in the domain of image and text attribute annotation. However, multi-label datasets are prone to serious intra-class and inter-class imbalance problems, which can significantly degrade the classification performance. To address the above issues, we propose the multi-label weighted broad learning system (MLW-BLS) from the perspective of label imbalance weighting and label correlation mining. Further, we propose the multi-label adaptive weighted broad learning system (MLAW-BLS) to adaptively adjust the specific weights and values of labels of MLW-BLS and construct an efficient imbalanced classifier set. Extensive experiments are conducted on various datasets to evaluate the effectiveness of the proposed model, and the results demonstrate its superiority over other advanced approaches.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2204-2219"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A State-Migration Particle Swarm Optimizer for Adaptive Latent Factor Analysis of High-Dimensional and Incomplete Data","authors":"Jiufang Chen;Kechen Liu;Xin Luo;Ye Yuan;Khaled Sedraoui;Yusuf Al-Turki;MengChu Zhou","doi":"10.1109/JAS.2024.124575","DOIUrl":"https://doi.org/10.1109/JAS.2024.124575","url":null,"abstract":"High-dimensional and incomplete (HDI) matrices are primarily generated in all kinds of big-data-related practical applications. A latent factor analysis (LFA) model is capable of conducting efficient representation learning on an HDI matrix, whose hyper-parameter adaptation can be implemented through a particle swarm optimizer (PSO) to meet scalable requirements. However, conventional PSO is limited by premature convergence, which leads to accuracy loss in the resultant LFA model. To address this thorny issue, this study merges the information of each particle's state migration into its evolution process following the principle of a generalized momentum method for improving its search ability, thereby building a state-migration particle swarm optimizer (SPSO), whose theoretical convergence is rigorously proved in this study. It is then incorporated into an LFA model for implementing efficient hyper-parameter adaptation without accuracy loss. Experiments on six HDI matrices indicate that an SPSO-incorporated LFA model outperforms state-of-the-art LFA models in terms of prediction accuracy for missing data of an HDI matrix with competitive computational efficiency. Hence, SPSO's use ensures efficient and reliable hyper-parameter adaptation in an LFA model, thus ensuring practicality and accurate representation learning for HDI matrices.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2220-2235"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Stage Approach for Targeted Knowledge Transfer in Self-Knowledge Distillation","authors":"Zimo Yin;Jian Pu;Yijie Zhou;Xiangyang Xue","doi":"10.1109/JAS.2024.124629","DOIUrl":"https://doi.org/10.1109/JAS.2024.124629","url":null,"abstract":"Knowledge distillation (KD) enhances student network generalization by transferring dark knowledge from a complex teacher network. To optimize computational expenditure and memory utilization, self-knowledge distillation (SKD) extracts dark knowledge from the model itself rather than an external teacher network. However, previous SKD methods performed distillation indiscriminately on full datasets, overlooking the analysis of representative samples. In this work, we present a novel two-stage approach to providing targeted knowledge on specific samples, named two-stage approach self-knowledge distillation (TOAST). We first soften the hard targets using class medoids generated based on logit vectors per class. Then, we iteratively distill the under-trained data with past predictions of half the batch size. The two-stage knowledge is linearly combined, efficiently enhancing model performance. Extensive experiments conducted on five backbone architectures show our method is model-agnostic and achieves the best generalization performance. Besides, TOAST is strongly compatible with existing augmentation-based regularization methods. Our method also obtains a speedup of up to 2.95x compared with a recent state-of-the-art method.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2270-2283"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-USV Formation Collision Avoidance via Deep Reinforcement Learning and COLREGs","authors":"Cheng-Cheng Wang;Yu-Long Wang;Li Jia","doi":"10.1109/JAS.2023.123846","DOIUrl":"https://doi.org/10.1109/JAS.2023.123846","url":null,"abstract":"Dear Editor, This letter focuses on collision avoidance for a multi-unmanned surface vehicle (multi-USV) system. A novel multi-USV collision avoidance (MUCA) algorithm is proposed. Firstly, in order to get a more reasonable collision avoidance policy, reward functions are constructed according to the International Regulations for Preventing Collisions at Sea (COLREGs) and USV dynamics. Secondly, to reduce data noise and the impact of outliers, an improved normalization method is proposed. States and rewards of USVs are normalized to avoid gradient vanishing and exploding. Thirdly, a novel $\\epsilon$-greedy method is proposed to help the optimal policy converge faster. It is easier for USVs to explore the optimal policy in the learning process. Finally, the proposed MUCA algorithm is tested in a multi-encounter situation including head-on, crossing, and overtaking. The experimental results demonstrate that the newly proposed MUCA algorithm can provide a collision-free marching policy for the USVs in formation.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2349-2351"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10707649","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy Preserving Distributed Bandit Residual Feedback Online Optimization Over Time-Varying Unbalanced Graphs","authors":"Zhongyuan Zhao;Zhiqiang Yang;Luyao Jiang;Ju Yang;Quanbo Ge","doi":"10.1109/JAS.2024.124656","DOIUrl":"https://doi.org/10.1109/JAS.2024.124656","url":null,"abstract":"This paper considers the distributed online optimization (DOO) problem over time-varying unbalanced networks, where gradient information is explicitly unknown. To address this issue, a privacy-preserving distributed online one-point residual feedback (OPRF) optimization algorithm is proposed. This algorithm updates decision variables by leveraging one-point residual feedback to estimate the true gradient information. It can achieve the same performance as the two-point feedback scheme while only requiring a single function value query per iteration. Additionally, it effectively eliminates the effect of time-varying unbalanced graphs by dynamically constructing row stochastic matrices. Furthermore, compared to other distributed optimization algorithms that only consider explicitly unknown cost functions, this paper also addresses the issue of privacy information leakage of nodes. Theoretical analysis demonstrates that the method attains sublinear regret while protecting the privacy information of agents. Finally, numerical experiments on a distributed collaborative localization problem and federated learning confirm the effectiveness of the algorithm.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 11","pages":"2284-2297"},"PeriodicalIF":15.3,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}