{"title":"FAST: Feature Aware Similarity Thresholding for Weak Unlearning in Black-Box Generative Models","authors":"Subhodip Panda;A.P. Prathosh","doi":"10.1109/TAI.2024.3499939","DOIUrl":"https://doi.org/10.1109/TAI.2024.3499939","url":null,"abstract":"The heightened emphasis on the regulation of deep generative models, propelled by escalating concerns about privacy and compliance with regulatory frameworks, underscores the imperative need for precise control mechanisms over these models. This urgency is particularly acute in instances in which generative models produce outputs that encompass objectionable, offensive, or potentially injurious content. In response, <i>machine unlearning</i> has emerged to selectively forget specific knowledge or remove the influence of undesirable data subsets from pretrained models. However, modern <i>machine unlearning</i> approaches typically assume access to model parameters and architectural details during unlearning, which is not always feasible. In a multitude of downstream tasks, these models function as black-box systems, with inaccessible pretrained parameters, architectures, and training data. In such scenarios, filtering undesired outputs becomes a practical alternative. Our proposed method, <i>feature aware similarity thresholding (FAST)</i>, effectively suppresses undesired outputs by systematically encoding the representation of unwanted features in the latent space. We employ user-marked positive and negative samples to guide this process, leveraging the latent space's inherent capacity to capture these undesired representations. During inference, we use this identified representation in the latent space to compute projection similarity metrics with newly sampled latent vectors. Subsequently, we apply a threshold to exclude undesirable samples from the output.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"885-895"},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiobjective Optimization for Traveling Salesman Problem: A Deep Reinforcement Learning Algorithm via Transfer Learning","authors":"Le-yang Gao;Rui Wang;Zhao-hong Jia;Chuang Liu","doi":"10.1109/TAI.2024.3499946","DOIUrl":"https://doi.org/10.1109/TAI.2024.3499946","url":null,"abstract":"A wide range of real applications can be modelled as the multiobjective traveling salesman problem (MOTSP), a typical combinatorial optimization problem. Meta-heuristics can be used to address the MOTSP; however, because they iteratively search a large solution space, they often entail significant computation time. Recently, deep reinforcement learning (DRL) algorithms have been employed to generate approximately optimal solutions to single-objective traveling salesman problems, as well as MOTSPs. This study proposes a multiobjective optimization algorithm based on DRL, called the multiobjective pointer network (MOPN), in which the input structure of the pointer network is redesigned to be applicable to the MOTSP. Furthermore, a training strategy utilizing a representative model and transfer learning is introduced to enhance the performance of MOPN. The proposed MOPN is insensitive to problem scale, meaning that a trained MOPN can address MOTSPs of different scales. Compared to meta-heuristics, MOPN requires much less forward-propagation time to obtain the Pareto front. To verify the performance of our model, extensive experiments are conducted on three different MOTSPs to compare the MOPN with two state-of-the-art DRL models and two multiobjective meta-heuristics. Experimental results demonstrate that the proposed MOPN obtains the best solutions with the least training time among all the compared DRL methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"896-908"},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Robust Adversarial Reinforcement Learning Under Temporally Coupled Perturbations","authors":"Jiacheng Yang;Yuanda Wang;Lu Dong;Lei Xue;Changyin Sun","doi":"10.1109/TAI.2024.3499938","DOIUrl":"https://doi.org/10.1109/TAI.2024.3499938","url":null,"abstract":"Robust reinforcement learning (RL) aims to improve the generalization of agents under model mismatch. As a major branch of robust RL, adversarial approaches formulate the problem as a zero-sum game in which adversaries seek to apply worst-case perturbations to the dynamics. However, the potential constraints on adversarial perturbations are seldom addressed in existing approaches. In this article, we consider temporally coupled settings, where adversarial perturbations change continuously at a bounded rate. Such constraints commonly arise in a variety of real-world situations (e.g., changes in wind speed and ocean currents). We propose a novel robust RL approach, named active robust adversarial RL (ARA-RL), that tackles this problem in an adversarial architecture. First, we introduce a type of RL adversary that generates temporally coupled perturbations on agent actions. Then, we embed a diagnostic module in the RL agent, enabling it to actively detect temporally coupled perturbations in unseen environments. Through adversarial training, the agent seeks to maximize its worst-case performance and thus achieve robustness under perturbations. Finally, extensive experiments demonstrate that our proposed approach provides significant robustness against temporally coupled perturbations and outperforms other baselines on several continuous control tasks.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"874-884"},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LoRaDIP: Low-Rank Adaptation With Deep Image Prior for Generative Low-Light Image Enhancement","authors":"Zunjin Zhao;Daming Shi","doi":"10.1109/TAI.2024.3499950","DOIUrl":"https://doi.org/10.1109/TAI.2024.3499950","url":null,"abstract":"This article presents LoRaDIP, a novel low-light image enhancement (LLIE) model based on deep image priors (DIPs). While DIP-based enhancement models are known for their zero-shot learning, their expensive computational cost remains a challenge. To address this issue, LoRaDIP introduces a low-rank adaptation technique that significantly reduces computational expense without compromising performance. The contributions of this work are threefold. First, we eliminate the need for estimating initial illumination and reflectance, opting instead to directly estimate the illumination map from the observed image in a generative fashion. The illumination is parameterized by a DIP network. Second, considering the overparameterization of DIP networks, we introduce a low-rank adaptation technique to decrease the number of trainable parameters, thereby reducing computational demands. Third, unlike existing DIP-based models, which rely on a preset, fixed number of iterations to halt the optimization of the Retinex decomposition, we propose an automatic stopping criterion based on stable rank, preventing unnecessary iterations. LoRaDIP not only inherits the advantage of requiring only a single input image but also exhibits reduced computational costs while maintaining or even surpassing the performance of state-of-the-art models.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"909-920"},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Neural Network Finite-Time Event Triggered Intelligent Control for Stochastic Nonlinear Systems With Time-Varying Constraints","authors":"Jia Liu;Jiapeng Liu;Qing-Guo Wang;Jinpeng Yu","doi":"10.1109/TAI.2024.3497913","DOIUrl":"https://doi.org/10.1109/TAI.2024.3497913","url":null,"abstract":"Finite-time command-filtered event-triggered control based on an adaptive neural network is presented in this article for a class of output-feedback stochastic nonlinear systems (SNSs) with time-varying output constraints and unmeasured states. The adaptive neural network, combined with backstepping, is utilized to approximate the unknown nonlinear functions of the system. The finite-time command filter is employed to reduce the computational complexity caused by the backstepping technique. An adaptive observer is developed to estimate unmeasured states, and a controller is designed to be triggered only when the event-triggered condition is met. A time-varying barrier Lyapunov function is utilized to enforce the time-varying output constraint. The control method proposed in this article not only guarantees the finite-time stability of the system but also meets the output constraint. The effectiveness of the method is demonstrated on a ship maneuvering system with three degrees of freedom.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 3","pages":"773-779"},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143583201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DDM-Lag: A Diffusion-Based Decision-Making Model for Autonomous Vehicles With Lagrangian Safety Enhancement","authors":"Jiaqi Liu;Peng Hang;Xiaocong Zhao;Jianqiang Wang;Jian Sun","doi":"10.1109/TAI.2024.3497918","DOIUrl":"https://doi.org/10.1109/TAI.2024.3497918","url":null,"abstract":"Decision-making stands as a pivotal component in the realm of autonomous vehicles (AVs), playing a crucial role in navigating the intricacies of autonomous driving. Amid the evolving landscape of data-driven methodologies, enhancing decision-making performance in complex scenarios has emerged as a prominent research focus. Despite considerable advancements, current learning-based decision-making approaches leave room for refinement, particularly in policy articulation and safety assurance. To address these challenges, we introduce DDM-Lag, a diffusion decision model augmented with Lagrangian-based safety enhancements. This work conceptualizes the sequential decision-making challenge inherent in autonomous driving as a generative modeling problem, adopting diffusion models as the medium for capturing patterns of decision-making. We introduce a hybrid policy update strategy for diffusion models that combines the principles of behavior cloning and Q-learning, alongside an actor–critic architecture to facilitate updates. To add a layer of safety to the model's exploration process, we incorporate additional safety constraints, employing a policy optimization technique based on Lagrangian relaxation to comprehensively refine policy learning. Empirical evaluation of our proposed decision-making methodology was conducted across a spectrum of driving tasks with varying degrees of complexity and environmental contexts. Comparative analysis with established baseline methodologies demonstrates our model's superior performance, particularly in terms of safety and overall efficacy.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 3","pages":"780-791"},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143583134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generation With Nuanced Changes: Continuous Image-to-Image Translation With Adversarial Preferences","authors":"Yinghua Yao;Yuangang Pan;Ivor W. Tsang;Xin Yao","doi":"10.1109/TAI.2024.3497915","DOIUrl":"https://doi.org/10.1109/TAI.2024.3497915","url":null,"abstract":"Most previous methods for continuous image-to-image translation resorted to binary attributes with restrictive description ability and thus could not achieve satisfactory performance. Some works proposed using fine-grained semantic information: <i>relative attributes (RAs), preferences over pairs of images regarding the strength of a specified attribute</i>. However, they still failed to reconcile the goals of smooth translation and high-quality generation simultaneously. In this work, we propose a new model, continuous translation via adversarial preferences (CTAP), to coordinate these two goals for high-quality continuous translation based on RAs. In CTAP, we simultaneously train two modules: a generator that translates an input image to the desired image with smooth, nuanced changes w.r.t. the attributes of interest; and a ranker that executes adversarial preferences over the input image and the desired image. In particular, adversarial preferences involve an adversarial ranking process: 1) the ranker perceives no difference between the desired image and the input image in terms of the attributes of interest; and 2) the generator fools the ranker into believing that the attributes of its output image change as expected compared with the input image. RAs over pairs of real images are introduced to guide the ranker to rank image pairs regarding the attributes of interest only. With an effective ranker, the generator would “win” the adversarial game by producing high-quality images that present smooth changes. The experiments on two face datasets and one shoe dataset demonstrate that our CTAP achieves state-of-the-art results in generating high-fidelity images that exhibit smooth changes over the attributes of interest.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"816-828"},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Hierarchical Relationships and Quality of Embedding in Latent Space","authors":"Ankita Chatterjee;Jayanta Mukherjee;Partha Pratim Das","doi":"10.1109/TAI.2024.3497921","DOIUrl":"https://doi.org/10.1109/TAI.2024.3497921","url":null,"abstract":"Existing learning models partition the generated representations using hyperplanes, forming well-defined groups of similar embeddings that are uniquely mapped to a particular class. However, in practical applications, the embedding space does not form distinct boundaries that segregate the class representations. Interactions exist among similar classes that cannot be visually determined in high-dimensional space. Moreover, the structure of the latent space remains obscure. As learned representations are frequently reused to reduce inference time, it is important to analyse how semantically related classes interact in the latent space. Therefore, we propose a boundary estimation algorithm that minimises the inclusion of other classes in the embedding space to form groups of similar representations, and we compare the quality of these class embeddings for various models in an already encoded space. These groups overlap to denote ambiguous embeddings that cannot be mapped to a particular class with high confidence. The algorithm determines which representations to include or discard to form well-defined regions, separating discriminating, ambiguous, and rejected embeddings for a particular class. Later, we construct relation trees to evaluate the hierarchical relationships formed among the classes, and compare them with the <i>WordNet</i> ontology using phylogenetic tree comparison methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"843-858"},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HGFF: A Deep Reinforcement Learning Framework for Lifetime Maximization in Wireless Sensor Networks","authors":"Xiaoxu Han;Xin Mu;Jinghui Zhong","doi":"10.1109/TAI.2024.3497926","DOIUrl":"https://doi.org/10.1109/TAI.2024.3497926","url":null,"abstract":"Planning the movement of the sink to maximize the lifetime of wireless sensor networks (WSNs) is an essential problem. Many existing mobile sink techniques based on mathematical programming or heuristics have demonstrated the feasibility of the task. Nevertheless, huge computational cost or over-reliance on human knowledge can result in relatively low performance. To balance the need for high-quality solutions against inference time, we propose a new framework to construct the movement path of the sink automatically. We cast the lifetime maximization problem as an optimization task within a heterogeneous graph and learn a movement policy for the sink by combining a graph neural network (GNN) with deep reinforcement learning. Our approach comprises three key modules: 1) a heterogeneous GNN that learns representations of sites and sensors by aggregating features of neighbor nodes and extracting hierarchical graph features; 2) a multihead attention mechanism that allows the sites to attend to information from sensor nodes, which greatly improves the expressive capacity of the learning model; and 3) a greedy policy that learns to append the next best site to the solution incrementally. We design twelve types of static and dynamic maps to simulate different WSNs in the real world, and extensive experiments are conducted to evaluate and analyze our approach. The empirical results show that our approach consistently outperforms the existing methods on all types of maps. Notably, our approach significantly extends the simulation lifetime without incurring a large increase in inference time.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"859-873"},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Safe Multiagent Reinforcement Learning With Bilevel Optimization in Autonomous Driving","authors":"Zhi Zheng;Shangding Gu","doi":"10.1109/TAI.2024.3497919","DOIUrl":"https://doi.org/10.1109/TAI.2024.3497919","url":null,"abstract":"Ensuring safety in multiagent reinforcement learning (MARL), particularly when deploying it in real-world applications such as autonomous driving, emerges as a critical challenge. To address this challenge, traditional safe MARL methods extend MARL approaches to incorporate safety considerations, aiming to minimize safety risk values. However, these safe MARL algorithms often fail to model other agents and lack convergence guarantees, particularly in dynamically complex environments. In this study, we propose a safe MARL method grounded in a Stackelberg model with bilevel optimization, for which convergence analysis is provided. Derived from our theoretical analysis, we develop two practical algorithms, namely constrained Stackelberg Q-learning (CSQ) and constrained Stackelberg multiagent deep deterministic policy gradient (CS-MADDPG), designed to facilitate MARL decision-making in simulated autonomous driving applications such as traffic management. To evaluate the effectiveness of our algorithms, we developed a safe MARL autonomous driving benchmark and conducted experiments on challenging autonomous driving scenarios, such as merges, roundabouts, intersections, and racetracks. The experimental results indicate that our algorithms, CSQ and CS-MADDPG, outperform several strong MARL baselines, such as Bi-AC, MACPO, and MAPPO-L, in terms of reward and safety performance.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"829-842"},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}