{"title":"Curious Explorer: A Provable Exploration Strategy in Policy Learning","authors":"Marco Miani;Maurizio Parton;Marco Romito","doi":"10.1109/TPAMI.2024.3460972","DOIUrl":"10.1109/TPAMI.2024.3460972","url":null,"abstract":"A coverage assumption is critical with policy gradient methods, because while the objective function is insensitive to updates in unlikely states, the agent may need improvements in those states to reach a nearly optimal payoff. However, this assumption can be infeasible in certain environments, for instance in online learning, or when restarts are possible only from a fixed initial state. In these cases, classical policy gradient algorithms like REINFORCE can have poor convergence properties and sample efficiency. Curious Explorer is an iterative pure-exploration strategy over the state space that improves the coverage of any restart distribution <inline-formula><tex-math>$\\rho$</tex-math></inline-formula>. Using <inline-formula><tex-math>$\\rho$</tex-math></inline-formula> and intrinsic rewards, Curious Explorer produces a sequence of policies, each more exploratory than the previous one, and outputs a restart distribution whose coverage is based on the state visitation distribution of the exploratory policies. The paper's main results are a theoretical upper bound on how often an optimal policy visits poorly visited states, and a bound on the error of the return obtained by REINFORCE without any coverage assumption. Finally, we conduct ablation studies with <monospace>REINFORCE</monospace> and <monospace>TRPO</monospace> on two hard-exploration tasks, to support the claim that Curious Explorer can improve the performance of very different policy gradient algorithms.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142245649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated Feature Augmentation and Alignment","authors":"Tianfei Zhou;Ye Yuan;Binglu Wang;Ender Konukoglu","doi":"10.1109/TPAMI.2024.3457751","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3457751","url":null,"abstract":"Federated learning is a distributed paradigm that allows multiple parties to collaboratively train deep learning models without direct exchange of raw data. Nevertheless, the inherent non-independent and identically distributed (non-i.i.d.) nature of the data distribution among clients results in significant degradation of the acquired model. The primary goal of this study is to develop a robust federated learning algorithm to address <i>feature shift</i> in clients’ samples, potentially arising from a range of factors such as acquisition discrepancies in medical imaging. To reach this goal, we first propose federated feature augmentation (<small>FedFA</small><inline-formula><tex-math>$^{l}$</tex-math></inline-formula>), a novel feature augmentation technique tailored for federated learning. <small>FedFA</small><inline-formula><tex-math>$^{l}$</tex-math></inline-formula> is based on the crucial insight that each client's data distribution can be characterized by the first-/second-order statistics (<i>i.e.</i>, mean and standard deviation) of latent features, and that it is feasible to manipulate these local statistics <i>globally</i>, i.e., based on information in the entire federation, to let clients have a better sense of the global distribution across clients. Grounded on this insight, we propose to augment each local feature statistic based on a normal distribution, wherein the mean corresponds to the original statistic and the variance defines the augmentation scope. Central to <small>FedFA</small><inline-formula><tex-math>$^{l}$</tex-math></inline-formula> is the determination of a meaningful Gaussian variance, which is accomplished by taking into account not only the biased data of each individual client, but also the underlying feature statistics represented by all participating clients. Beyond the consideration of <i>low-order</i> statistics in <small>FedFA</small><inline-formula><tex-math>$^{l}$</tex-math></inline-formula>, we propose a federated feature alignment component (<small>FedFA</small><inline-formula><tex-math>$^{h}$</tex-math></inline-formula>) that exploits <i>higher-order</i> feature statistics to gain a more detailed understanding of the local feature distribution, and enables explicit alignment of augmented features across clients to promote more consistent feature learning. Combining <small>FedFA</small><inline-formula><tex-math>$^{l}$</tex-math></inline-formula> and <small>FedFA</small><inline-formula><tex-math>$^{h}$</tex-math></inline-formula> yields our full approach <small><b>FedFA<inline-formula><tex-math>$+$</tex-math></inline-formula></b></small>. <small>FedFA<inline-formula><tex-math>$+$</tex-math></inline-formula></small> is non-parametric, incurs negligible additional communication costs, and can be seamlessly incorporated into popul","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mine yOur owN Anatomy: Revisiting Medical Image Segmentation With Extremely Limited Labels","authors":"Chenyu You;Weicheng Dai;Fenglin Liu;Yifei Min;Nicha C. Dvornek;Xiaoxiao Li;David A. Clifton;Lawrence Staib;James S. Duncan","doi":"10.1109/TPAMI.2024.3461321","DOIUrl":"10.1109/TPAMI.2024.3461321","url":null,"abstract":"Recent studies on contrastive learning have achieved remarkable performance in medical image segmentation by leveraging only a few labels. Existing methods mainly focus on instance discrimination and invariant mapping. However, they face three common pitfalls: (1) tailness: medical image data usually follow an implicit long-tail class distribution; blindly leveraging all pixels in training can hence lead to data imbalance and degraded performance; (2) consistency: due to intra-class variations between different anatomical features, it remains unclear whether a segmentation model has learned meaningful yet consistent anatomical features; and (3) diversity: the intra-slice correlations within the entire dataset have received significantly less attention. This motivates us to seek a principled approach for strategically making use of the dataset itself to discover similar yet distinct samples from different anatomical views. In this paper, we introduce a novel semi-supervised medical image segmentation framework termed Mine y<b>O</b>ur ow<b>N</b> Anatomy (<small>MONA</small>), and make three contributions. First, prior work argues that every pixel matters equally to training; we observe empirically that this alone is unlikely to define meaningful anatomical features, mainly due to the lack of a supervision signal. We show two simple solutions towards learning invariances. Second, we construct a set of objectives that encourage the model to decompose medical images into a collection of anatomical features in an unsupervised manner. Lastly, we demonstrate, both empirically and theoretically, the efficacy of our <small>MONA</small> on three benchmark datasets, achieving new state-of-the-art results under different labeled semi-supervised settings.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142231578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tuning Vision-Language Models With Multiple Prototypes Clustering","authors":"Meng-Hao Guo;Yi Zhang;Tai-Jiang Mu;Sharon X. Huang;Shi-Min Hu","doi":"10.1109/TPAMI.2024.3460180","DOIUrl":"10.1109/TPAMI.2024.3460180","url":null,"abstract":"Benefiting from advances in large-scale pre-training, foundation models have demonstrated remarkable capability in fields such as natural language processing and computer vision. However, to achieve expert-level performance in specific applications, such models often need to be fine-tuned with domain-specific knowledge. In this paper, we focus on enabling vision-language models to unleash more potential for visual understanding tasks under few-shot tuning. Specifically, we propose a novel adapter, dubbed lusterAdapter, based on a trainable multiple-prototypes clustering algorithm, for tuning the CLIP model. It not only alleviates the concern of catastrophic forgetting in foundation models by introducing anchors to inherit common knowledge, but also improves the utilization efficiency of the few annotated samples by bringing in clustering and domain priors, thereby improving the performance of few-shot tuning. We have conducted extensive experiments on 11 common classification benchmarks. The results show our method significantly surpasses the original CLIP and achieves state-of-the-art (SOTA) performance across all benchmarks and settings. For example, under the 16-shot setting, our method improves over the original CLIP by 19.6%, and also surpasses TIP-Adapter and GraphAdapter by 2.7% and 2.2%, respectively, in terms of average accuracy across the 11 benchmarks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142231615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-Dimensional Gradient Helps Out-of-Distribution Detection","authors":"Yingwen Wu;Tao Li;Xinwen Cheng;Jie Yang;Xiaolin Huang","doi":"10.1109/TPAMI.2024.3459988","DOIUrl":"10.1109/TPAMI.2024.3459988","url":null,"abstract":"Detecting out-of-distribution (OOD) samples is essential for ensuring the reliability of deep neural networks (DNNs) in real-world scenarios. While previous research has predominantly investigated the disparity between in-distribution (ID) and OOD data through forward information analysis, the discrepancy in parameter gradients during the backward process of DNNs has received insufficient attention. Existing studies on gradient disparities mainly focus on the utilization of gradient norms, neglecting the wealth of information embedded in gradient directions. To bridge this gap, in this paper, we conduct a comprehensive investigation into leveraging the entirety of gradient information for OOD detection. The primary challenge arises from the high dimensionality of gradients due to the large number of network parameters. To solve this problem, we propose performing linear dimension reduction on the gradient using a designated subspace that comprises principal components. This technique enables us to obtain a low-dimensional representation of the gradient with minimal information loss. Subsequently, by integrating the reduced gradient with various existing detection score functions, our approach demonstrates superior performance across a wide range of detection tasks. For instance, on the ImageNet benchmark with the ResNet50 model, our method achieves an average reduction of 11.15% in the false positive rate at 95% recall (FPR95) compared to the current state-of-the-art approach.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ES-GNN: Generalizing Graph Neural Networks Beyond Homophily With Edge Splitting","authors":"Jingwei Guo;Kaizhu Huang;Rui Zhang;Xinping Yi","doi":"10.1109/TPAMI.2024.3459932","DOIUrl":"10.1109/TPAMI.2024.3459932","url":null,"abstract":"While Graph Neural Networks (GNNs) have achieved enormous success in multiple graph analytical tasks, modern variants mostly rely on the strong inductive bias of homophily. However, real-world networks typically exhibit both homophilic and heterophilic linking patterns, wherein adjacent nodes may share dissimilar attributes and distinct labels. Therefore, GNNs that smooth node proximity holistically may aggregate both task-relevant and irrelevant (even harmful) information, limiting their ability to generalize to heterophilic graphs and potentially causing non-robustness. In this work, we propose a novel Edge Splitting GNN (ES-GNN) framework to adaptively distinguish between graph edges that are either relevant or irrelevant to the learning task. This essentially splits the original graph into two subgraphs with the same node set but complementary, dynamically updated edge sets. Information propagation on these subgraphs and edge splitting are then conducted alternately, thus disentangling the task-relevant and irrelevant features. Theoretically, we show that our ES-GNN can be regarded as a solution to a <i>disentangled graph denoising problem</i>, which further illustrates our motivations and interprets the improved generalization beyond homophily. Extensive experiments on 11 benchmark datasets and 1 synthetic dataset not only demonstrate the effective performance of ES-GNN but also highlight its robustness to adversarial graphs and mitigation of the over-smoothing problem.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly-Supervised Depth Estimation and Image Deblurring via Dual-Pixel Sensors","authors":"Liyuan Pan;Richard Hartley;Liu Liu;Zhiwei Xu;Shah Chowdhury;Yan Yang;Hongguang Zhang;Hongdong Li;Miaomiao Liu","doi":"10.1109/TPAMI.2024.3458974","DOIUrl":"10.1109/TPAMI.2024.3458974","url":null,"abstract":"Dual-pixel (DP) imaging sensors are increasingly adopted in modern cameras. A DP camera captures a pair of images in a single snapshot by splitting each pixel in half. Several previous studies show how to recover depth information by treating the DP pair as an approximate stereo pair. However, unlike classic stereo disparity, dual-pixel disparity occurs only in image regions with defocus blur. Heavy defocus blur in DP pairs degrades the performance of matching-based depth estimation approaches. Therefore, we treat blur removal and depth estimation as a joint problem. Rather than blindly removing the blur effect, we investigate the formation of the DP pair, which links the blur and depth information. We propose a mathematical DP model that exploits the blur to improve depth estimation. This exploration motivated our previous work, an end-to-end DDDNet (DP-based Depth and Deblur Network), which jointly estimates depth and restores the image in a supervised fashion. However, collecting ground-truth (GT) depth maps for DP pairs is challenging and limits the depth estimation potential of the DP sensor. Therefore, we propose an extension of the DDDNet, called WDDNet (Weakly-supervised Depth and Deblur Network), which includes an efficient reblur solver that does not require GT depth maps for training. To achieve this, we convert all-in-focus images into supervisory signals for unsupervised depth estimation in our WDDNet. We jointly estimate an all-in-focus image and a disparity map, then use a <i>Reblur</i> and <i>Fstack</i> module to regularize the disparity estimation and image restoration. We conducted extensive experiments on synthetic and real data to demonstrate the competitive performance of our method when compared to state-of-the-art (SOTA) supervised approaches.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Label Deconvolution for Node Representation Learning on Large-Scale Attributed Graphs Against Learning Bias","authors":"Zhihao Shi;Jie Wang;Fanghua Lu;Hanzhu Chen;Defu Lian;Zheng Wang;Jieping Ye;Feng Wu","doi":"10.1109/TPAMI.2024.3459408","DOIUrl":"10.1109/TPAMI.2024.3459408","url":null,"abstract":"Node representation learning on attributed graphs—whose nodes are associated with rich attributes (e.g., texts and protein sequences)—plays a crucial role in many important downstream tasks. To encode the attributes and graph structures simultaneously, recent studies integrate pre-trained models with graph neural networks (GNNs), where pre-trained models serve as node encoders (NEs) to encode the attributes. As jointly training large NEs and GNNs on large-scale graphs suffers from severe scalability issues, many methods propose to train NEs and GNNs separately. Consequently, they do not take the feature convolutions in GNNs into consideration during the training phase of NEs, leading to a significant learning bias relative to joint training. To address this challenge, we propose an efficient label regularization technique, namely <b>L</b>abel <b>D</b>econvolution (LD), to alleviate the learning bias through a novel and highly scalable approximation to the inverse mapping of GNNs. The inverse mapping leads to an objective function that is equivalent to that of joint training, while it can effectively incorporate GNNs into the training phase of NEs against the learning bias. More importantly, we show that LD converges to the optimal objective function value of joint training under mild assumptions. Experiments demonstrate that LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning From Human Attention for Attribute-Assisted Visual Recognition","authors":"Xiao Bai;Pengcheng Zhang;Xiaohan Yu;Jin Zheng;Edwin R. Hancock;Jun Zhou;Lin Gu","doi":"10.1109/TPAMI.2024.3458921","DOIUrl":"10.1109/TPAMI.2024.3458921","url":null,"abstract":"With prior knowledge of seen objects, humans have a remarkable ability to recognize novel objects using shared and distinct local attributes. This is significant for the challenging tasks of zero-shot learning (ZSL) and fine-grained visual classification (FGVC), where the discriminative attributes of objects play an important role. Inspired by human visual attention, neural networks have widely exploited the attention mechanism to learn locally discriminative attributes for challenging tasks. Though they have greatly promoted the development of these fields, existing works mainly focus on learning the region embeddings of different attribute features and neglect the importance of discriminative attribute localization. It is also unclear whether the learned attention truly matches real human attention. To tackle this problem, this paper proposes to employ real human gaze data for visual recognition networks to learn from human attention. Specifically, we design a unified Attribute Attention Network (A<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>Net) that learns from human attention for both ZSL and FGVC tasks. The overall model consists of an attribute attention branch and a baseline classification network. On top of the image feature maps provided by the baseline classification network, the attribute attention branch employs attribute prototypes to produce attribute attention maps and attribute features. The attribute attention maps are converted to gaze-like attentions to be aligned with real human gaze attention. To guarantee the effectiveness of attribute feature learning, we further align the extracted attribute features with attribute-defined class embeddings. To facilitate learning from human gaze attention for visual recognition problems, we design a bird classification game to collect real human gaze data on the CUB dataset via an eye-tracker device. Experiments on ZSL and FGVC tasks with and without real human gaze data validate the benefits and accuracy of our proposed model. This work demonstrates the promise of collecting human gaze datasets, and of automatic gaze estimation algorithms that learn from human attention, for high-level computer vision tasks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142171000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of Safe Reinforcement Learning: Methods, Theories, and Applications","authors":"Shangding Gu;Long Yang;Yali Du;Guang Chen;Florian Walter;Jun Wang;Alois Knoll","doi":"10.1109/TPAMI.2024.3457538","DOIUrl":"10.1109/TPAMI.2024.3457538","url":null,"abstract":"Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making tasks. However, safety concerns arise when deploying RL in real-world applications such as autonomous driving and robotics, leading to a growing demand for safe RL algorithms. While safe control has a long history, the study of safe RL algorithms is still in its early stages. To establish a good foundation for future safe RL research, in this paper we provide a review of safe RL from the perspectives of methods, theories, and applications. First, we review the progress of safe RL along five dimensions and identify five crucial problems for deploying safe RL in real-world applications, coined <i>“2H3W”</i>. Second, we analyze the algorithmic and theoretical progress from the perspective of answering the <i>“2H3W”</i> problems. In particular, the sample complexity of safe RL algorithms is reviewed and discussed, followed by an introduction to the applications and benchmarks of safe RL algorithms. Finally, we open the discussion of challenging problems in safe RL, hoping to inspire future research in this direction. To advance the study of safe RL algorithms, we release an open-sourced repository containing major safe RL algorithms at the link.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142166413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}