{"title":"BEVHeight++: Toward Robust Visual Centric 3D Object Detection","authors":"Lei Yang;Tao Tang;Jun Li;Kun Yuan;Kai Wu;Peng Chen;Li Wang;Yi Huang;Lei Li;Xinyu Zhang;Kaicheng Yu","doi":"10.1109/TPAMI.2025.3549711","DOIUrl":"10.1109/TPAMI.2025.3549711","url":null,"abstract":"While most recent autonomous driving system focuses on developing perception methods on ego-vehicle sensors, people tend to overlook an alternative approach to leverage intelligent roadside cameras to extend the perception ability beyond the visual range. We discover that the state-of-the-art vision-centric detection methods perform poorly on roadside cameras. This is because these methods mainly focus on recovering the depth regarding the camera center, where the depth difference between the car and the ground quickly shrinks while the distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight++, to address this issue. In essence, we regress the height to the ground to achieve a distance-agnostic formulation to ease the optimization process of camera-only perception methods. By incorporating both height and depth encoding techniques, we achieve a more accurate and robust projection from 2D to BEV spaces. On popular 3D detection benchmarks of roadside cameras, our method surpasses all previous vision-centric methods by a significant margin. In terms of the ego-vehicle scenario, BEVHeight++ surpasses depth-only methods with increases of +2.8% NDS and +1.7% mAP on the nuScenes test set, and even higher gains of +9.3% NDS and +8.8% mAP on the nuScenes-C benchmark with object-level distortion. 
Consistent and substantial performance improvements are achieved across the KITTI, KITTI-360, and Waymo datasets as well.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"5094-5111"},"PeriodicalIF":0.0,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143598814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The NeRF Signature: Codebook-Aided Watermarking for Neural Radiance Fields","authors":"Ziyuan Luo;Anderson Rocha;Boxin Shi;Qing Guo;Haoliang Li;Renjie Wan","doi":"10.1109/TPAMI.2025.3550166","DOIUrl":"10.1109/TPAMI.2025.3550166","url":null,"abstract":"Neural Radiance Fields (NeRF) have been gaining attention as a significant form of 3D content representation. With the proliferation of NeRF-based creations, the need for copyright protection has emerged as a critical issue. Although some approaches have been proposed to embed digital watermarks into NeRF, they often neglect essential model-level considerations and incur substantial time overheads, resulting in reduced imperceptibility and robustness, along with user inconvenience. In this paper, we extend the previous criteria for image watermarking to the model level and propose NeRF Signature, a novel watermarking method for NeRF. We employ a Codebook-aided Signature Embedding (CSE) that does not alter the model structure, thereby maintaining imperceptibility and enhancing robustness at the model level. Furthermore, after optimization, any desired signatures can be embedded through the CSE, and no fine-tuning is required when NeRF owners want to use new binary signatures. Then, we introduce a joint pose-patch encryption watermarking strategy to hide signatures into patches rendered from a specific viewpoint for higher robustness. In addition, we explore a Complexity-Aware Key Selection (CAKS) scheme to embed signatures in high visual complexity patches to enhance imperceptibility. 
The experimental results demonstrate that our method outperforms other baseline methods in terms of imperceptibility and robustness.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"4652-4667"},"PeriodicalIF":0.0,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143598800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MB-RACS: Measurement-Bounds-based Rate-Adaptive Image Compressed Sensing Network.","authors":"Yujun Huang, Bin Chen, Naiqi Li, Baoyi An, Shu-Tao Xia, Yaowei Wang","doi":"10.1109/TPAMI.2025.3549986","DOIUrl":"10.1109/TPAMI.2025.3549986","url":null,"abstract":"<p><p>Conventional compressed sensing (CS) algorithms typically apply a uniform sampling rate to different image blocks. A more strategic approach could be to allocate the number of measurements adaptively, based on each image block's complexity. In this paper, we propose a Measurement-Bounds-based Rate-Adaptive Image Compressed Sensing Network (MB-RACS) framework, which aims to adaptively determine the sampling rate for each image block in accordance with traditional measurement bounds theory. Moreover, since in real-world scenarios statistical information about the original image cannot be directly obtained, we suggest a multi-stage rate-adaptive sampling strategy. This strategy sequentially adjusts the sampling ratio allocation based on the information gathered from previous samplings. We formulate the multi-stage rate-adaptive sampling as a convex optimization problem and address it using a combination of Newton's method and binary search techniques. 
Our experiments demonstrate that the proposed MB-RACS method surpasses current leading methods, with experimental evidence also underscoring the effectiveness of each module within our proposed framework.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143598606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Explore Sample Relationships.","authors":"Zhi Hou, Baosheng Yu, Chaoyue Wang, Yibing Zhan, Dacheng Tao","doi":"10.1109/TPAMI.2025.3549300","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3549300","url":null,"abstract":"<p><p>Despite the great success achieved, deep learning technologies usually suffer from data scarcity issues in real-world applications, where existing methods mainly explore sample relationships in a vanilla way from the perspectives of either the input or the loss function. In this paper, we propose a batch transformer module, BatchFormerV1, to equip deep neural networks themselves with the abilities to explore sample relationships in a learnable way. Basically, the proposed method enables data collaboration, e.g., head-class samples will also contribute to the learning of tail classes. Considering that exploring instance-level relationships has very limited impacts on dense prediction, we generalize and refer to the proposed module as BatchFormerV2, which further enables exploring sample relationships for pixel-/patch-level dense representations. In addition, to address the train-test inconsistency where a mini-batch of data samples are neither necessary nor desirable during inference, we also devise a two-stream training pipeline, i.e., a shared model is first jointly optimized with and without BatchFormerV2 which is then removed during testing. The proposed module is plug-and-play without requiring any extra inference cost. Lastly, we evaluate the proposed method on over ten popular datasets, including 1) different data scarcity settings such as long-tailed recognition, zero-shot learning, domain generalization, and contrastive learning; and 2) different visual recognition tasks ranging from image classification to object detection and panoptic segmentation. 
Code is available at https://zhihou7.github.io/BatchFormer.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143598605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correlated Topic Modeling for Short Texts in Spherical Embedding Spaces","authors":"Hafsa Ennajari;Nizar Bouguila;Jamal Bentahar","doi":"10.1109/TPAMI.2025.3550032","DOIUrl":"10.1109/TPAMI.2025.3550032","url":null,"abstract":"With the prevalence of short texts in various forms such as news headlines, tweets, and reviews, short text analysis has gained significant interest in recent times. However, modeling short texts remains a challenging task due to its sparse and noisy nature. In this paper, we propose a new Spherical Correlated Topic Model (SCTM), which takes into account the correlation between topics. Our model integrates word and knowledge graph embeddings to better capture the semantic relationships among short texts. We adopt the von Mises-Fisher distribution to model the high-dimensional word and entity embeddings on a hypersphere, enabling better preservation of the angular relationships between topic vectors. Moreover, knowledge graph embeddings are incorporated to further enrich the semantic meaning of short texts. Experimental results on several datasets demonstrate that our proposed SCTM model outperforms existing models in terms of both topic coherence and document classification. In addition, our model is capable of providing interpretable topics and revealing meaningful correlations among short texts.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"4567-4578"},"PeriodicalIF":0.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143598604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hessian-Aware Zeroth-Order Optimization","authors":"Haishan Ye;Zhichao Huang;Cong Fang;Chris Junchi Li;Tong Zhang","doi":"10.1109/TPAMI.2025.3548810","DOIUrl":"10.1109/TPAMI.2025.3548810","url":null,"abstract":"Zeroth-order optimization algorithms recently emerge as a popular research theme in optimization and machine learning, playing important roles in many deep-learning related tasks such as black-box adversarial attack, deep reinforcement learning, as well as hyper-parameter tuning. Mainstream zeroth-order optimization algorithms, however, concentrate on exploiting zeroth-order-estimated first-order gradient information of the objective landscape. In this paper, we propose a novel meta-algorithm called <italic>Hessian-Aware Zeroth-Order</i> (<monospace>ZOHA</monospace>) optimization algorithm, which utilizes several canonical variants of zeroth-order-estimated second-order Hessian information of the objective: power-method-based, and Gaussian-smoothing-based. We conclude theoretically that <monospace>ZOHA</monospace> enjoys an improved convergence rate compared with existing work without incorporating in zeroth-order optimization second-order Hessian information. 
Empirical studies on logistic regression as well as the black-box adversarial attack are provided to validate the effectiveness and improved success rates with reduced query complexity of the zeroth-order oracle.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"4869-4877"},"PeriodicalIF":0.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143575381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation","authors":"Rui Zhang;Qi Meng;Rongchan Zhu;Yue Wang;Wenlei Shi;Shihua Zhang;Zhi-Ming Ma;Tie-Yan Liu","doi":"10.1109/TPAMI.2025.3548673","DOIUrl":"10.1109/TPAMI.2025.3548673","url":null,"abstract":"In scenarios with limited available data, training the function-to-function neural PDE solver in an unsupervised manner is essential. However, the efficiency and accuracy of existing methods are constrained by the properties of numerical algorithms, such as finite difference and pseudo-spectral methods, integrated during the training stage. These methods necessitate careful spatiotemporal discretization to achieve reasonable accuracy, leading to significant computational challenges and inaccurate simulations, particularly in cases with substantial spatiotemporal variations. To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs’ probabilistic representation, which regards macroscopic phenomena as ensembles of random particles. Compared to other unsupervised methods, MCNP Solver naturally inherits the advantages of the Monte Carlo method, which is robust against spatiotemporal variations and can tolerate coarse step size. In simulating the trajectories of particles, we employ Heun’s method for the convection process and calculate the expectation via the probability density function of neighbouring grid points during the diffusion process. These techniques enhance accuracy and circumvent the computational issues associated with Monte Carlo sampling. 
Our numerical experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency compared to other unsupervised baselines.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"5059-5075"},"PeriodicalIF":0.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143575380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lightweight Deep Exclusion Unfolding Network for Single Image Reflection Removal","authors":"Jun-Jie Huang;Tianrui Liu;Zihan Chen;Xinwang Liu;Meng Wang;Pier Luigi Dragotti","doi":"10.1109/TPAMI.2025.3548148","DOIUrl":"10.1109/TPAMI.2025.3548148","url":null,"abstract":"Single Image Reflection Removal (SIRR) is a canonical blind source separation problem and refers to the issue of separating a reflection-contaminated image into a transmission and a reflection image. The core challenge lies in minimizing the commonalities among different sources. Existing deep learning approaches either neglect the significance of feature interactions or rely on heuristically designed architectures. In this paper, we propose a novel Deep Exclusion unfolding Network (DExNet), a lightweight, interpretable, and effective network architecture for SIRR. DExNet is principally constructed by unfolding and parameterizing a simple iterative Sparse and Auxiliary Feature Update (i-SAFU) algorithm, which is specifically designed to solve a new model-based SIRR optimization formulation incorporating a general exclusion prior. This general exclusion prior enables the unfolded SAFU module to inherently identify and penalize commonalities between the transmission and reflection features, ensuring more accurate separation. The principled design of DExNet not only enhances its interpretability but also significantly improves its performance. 
Comprehensive experiments on four benchmark datasets demonstrate that DExNet achieves state-of-the-art visual and quantitative results while utilizing only approximately 8% of the parameters required by leading methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"4957-4973"},"PeriodicalIF":0.0,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143569502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAIR++: Improving Multi-View Attention Inverse Rendering With Implicit Lighting Representation","authors":"JunYong Choi;SeokYeong Lee;Haesol Park;Seung-Won Jung;Ig-Jae Kim;Junghyun Cho","doi":"10.1109/TPAMI.2025.3548679","DOIUrl":"10.1109/TPAMI.2025.3548679","url":null,"abstract":"In this paper, we propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting. While multi-view images have been widely used for object-level inverse rendering, scene-level inverse rendering has primarily been studied using single-view images due to the lack of a dataset containing high dynamic range multi-view images with ground-truth geometry, material, and spatially-varying lighting. To improve the quality of scene-level inverse rendering, a novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced. MAIR performs scene-level multi-view inverse rendering by expanding the OpenRooms dataset, designing efficient pipelines to handle multi-view images, and splitting spatially-varying lighting. Although MAIR showed impressive results, its lighting representation is fixed to spherical Gaussians, which limits its ability to render images realistically. Consequently, MAIR cannot be directly used in applications such as material editing. Moreover, its multi-view aggregation networks have difficulties extracting rich features because they only focus on the mean and variance between multi-view features. In this paper, we propose its extended version, called MAIR++. MAIR++ addresses the aforementioned limitations by introducing an implicit lighting representation that accurately captures the lighting conditions of an image while facilitating realistic rendering. Furthermore, we design a directional attention-based multi-view aggregation network to infer more intricate relationships between views. 
Experimental results show that MAIR++ not only outperforms MAIR and single-view-based methods but also demonstrates robust performance on unseen real-world scenes.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"5076-5093"},"PeriodicalIF":0.0,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143569505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Residual for Underwater Object Detection","authors":"Jingchun Zhou;Zongxin He;Dehuan Zhang;Siyuan Liu;Xianping Fu;Xuelong Li","doi":"10.1109/TPAMI.2025.3548652","DOIUrl":"10.1109/TPAMI.2025.3548652","url":null,"abstract":"Feature drift is caused by the dynamic coupling of target features and degradation factors, which reduce underwater detector performance. We redefine feature drift as the instability of target features within boundary constraints while solving partial differential equations (PDEs). From this insight, we propose the Spatial Residual (SR) block, which uses SkipCut to establish effective constraints across the network width for solving PDEs and optimizes the solution space. It is implemented as a general-purpose backbone with 5 Spatial Residuals (BSR5) for complex feature scenarios. Specifically, BSR5 extracts discrete channel slices through SkipCut, where each sliced feature is parsed within the appropriate data capacity. In gradient backpropagation, SkipCut functions as a ShortCut, optimizing information flow and gradient allocation to enhance performance and accelerate training. Experiments on the RUOD dataset show that BSR5-integrated DETRs and YOLOs achieve state-of-the-art results for conventional and end-to-end detectors. Specifically, our BSR5-DETR improves 1.3% and 2.7% AP than RT-DETR with ResNet-101, while reducing parameters by 41.6% and 6.6%, respectively. 
Further validation highlights BSR5's strong convergence and robustness, especially in training from scratch scenarios, making it well suited for data-scarce, resource-constrained, and real-time tasks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"4996-5013"},"PeriodicalIF":0.0,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143569506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}