{"title":"Revisiting One-stage Deep Uncalibrated Photometric Stereo via Fourier Embedding.","authors":"Yakun Ju, Boxin Shi, Bihan Wen, Kin-Man Lam, Xudong Jiang, Alex C Kot","doi":"10.1109/TPAMI.2025.3557245","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3557245","url":null,"abstract":"<p><p>This paper introduces a one-stage deep uncalibrated photometric stereo (UPS) network, namely Fourier Uncalibrated Photometric Stereo Network (FUPS-Net), for non-Lambertian objects under unknown light directions. It departs from traditional two-stage methods that first explicitly learn lighting information and then estimate surface normals. Two-stage methods were deployed because the interplay of lighting with shading cues presents challenges for directly estimating surface normals without explicit lighting information. However, these two-stage networks are disjointed and separately trained, so the error in explicit light calibration propagates to the second stage and cannot be eliminated. In contrast, the proposed FUPS-Net utilizes an embedded Fourier transform network to implicitly learn lighting features by decomposing inputs, rather than employing a disjointed light estimation network. Our approach is motivated by observations in the Fourier domain of photometric stereo images: lighting information is mainly encoded in amplitudes, while geometry information is mainly associated with phases. Leveraging this property, our method \"decomposes\" geometry and lighting in the Fourier domain as guidance, via the proposed Fourier Embedding Extraction (FEE) block and Fourier Embedding Aggregation (FEA) block, which generate lighting and geometry features for the FUPS-Net to implicitly resolve the geometry-lighting ambiguity. Furthermore, we propose a Frequency-Spatial Weighted (FSW) block that assigns weights to combine features extracted from the frequency domain and those from the spatial domain for enhancing surface reconstructions. 
FUPS-Net overcomes the limitations of two-stage UPS methods, offering better training stability, a concise end-to-end structure, and avoiding accumulated errors in disjointed networks. Experimental results on synthetic and real datasets demonstrate the superior performance of our approach, and its simpler training setup, potentially paving the way for a new strategy in deep learning-based UPS methods.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143775229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
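The amplitude/phase observation that motivates FUPS-Net is easy to verify with NumPy: recombining the phase of one image with the amplitude of another largely preserves the first image's structure. A minimal sketch of that decomposition (this is only the underlying Fourier property, not the paper's FEE/FEA blocks):

```python
import numpy as np

def swap_amplitude(img_a, img_b):
    """Recombine the phase of img_a with the amplitude of img_b.

    Illustrates the observation behind FUPS-Net: lighting is mainly
    encoded in the Fourier amplitude, geometry mainly in the phase.
    """
    fa, fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    mixed = np.abs(fb) * np.exp(1j * np.angle(fa))  # amplitude of b, phase of a
    return np.real(np.fft.ifft2(mixed))

# sanity check: identical amplitude and phase reconstruct the original
x = np.random.rand(8, 8)
assert np.allclose(swap_amplitude(x, x), x)
```

Run on two photometric stereo images of the same object under different lights, the recombined image would visually follow the phase source's geometry, which is the property the paper leverages as guidance.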
{"title":"Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation.","authors":"Linshan Wu, Zhun Zhong, Jiayi Ma, Yunchao Wei, Hao Chen, Leyuan Fang, Shutao Li","doi":"10.1109/TPAMI.2025.3557047","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3557047","url":null,"abstract":"<p><p>Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models with weak labels, which is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while largely failing to leverage the inherent semantic correlation among different pseudo labels. We observe that pseudo-labeled pixels that are close to each other in the feature space are more likely to share the same class, and those closer to the distribution centers tend to have higher confidence. Motivated by this, we propose to model the underlying label distributions and employ cross-label constraints to generate more accurate pseudo labels. In this paper, we develop a unified WSSS framework named Adaptive Gaussian Mixtures Model, which leverages a GMM to model the label distributions. Specifically, we calculate the feature distribution centers of pseudo-labeled pixels and build the GMM by measuring the distance between the centers and each pseudo-labeled pixel. Then, we introduce an Online Expectation-Maximization (OEM) algorithm and a novel maximization loss to optimize the GMM adaptively, aiming to learn more discriminative decision boundaries between different class-wise Gaussian mixtures. Based on the label distributions, we leverage the GMM to generate high-quality pseudo labels for more reliable supervision. Our framework is capable of handling different forms of weak labels: image-level labels, points, scribbles, blocks, and bounding-boxes. 
Extensive experiments on PASCAL, COCO, Cityscapes, and ADE20K datasets demonstrate that our framework can effectively provide more reliable supervision and outperform the state-of-the-art methods under all settings. Code will be available at https://github.com/Luffy03/AGMM-SASS.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143775191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
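The distance-to-center construction of the GMM can be sketched in a few lines. This is a single E-step with equal priors and a shared isotropic variance, an illustrative simplification of the paper's adaptive OEM optimization (function and parameter names are ours, not the paper's):

```python
import numpy as np

def gmm_responsibilities(features, pseudo_labels, n_classes, sigma=1.0):
    """Soft class posteriors from per-class Gaussians fitted to
    pseudo-labeled pixel features (one E-step, equal priors).

    features: (N, D) pixel embeddings; pseudo_labels: (N,) int labels.
    """
    centers = np.stack([features[pseudo_labels == c].mean(axis=0)
                        for c in range(n_classes)])              # (C, D)
    d2 = ((features[:, None, :] - centers[None]) ** 2).sum(-1)   # (N, C)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)                      # (N, C)
```

Pixels near a class's distribution center get responsibilities near 1, matching the paper's observation that such pixels tend to be the most reliable pseudo labels.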
{"title":"Interpreting Low-level Vision Models with Causal Effect Maps.","authors":"Jinfan Hu, Jinjin Gu, Shiyao Yu, Fanghua Yu, Zheyuan Li, Zhiyuan You, Chaochao Lu, Chao Dong","doi":"10.1109/TPAMI.2025.3557149","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3557149","url":null,"abstract":"<p><p>Deep neural networks have significantly improved the performance of low-level vision tasks but also increased the difficulty of interpretability. A deep understanding of deep models is beneficial for both network design and practical reliability. To take up this challenge, we introduce causality theory to interpret low-level vision models and propose a model-/task-agnostic method called Causal Effect Map (CEM). With CEM, we can visualize and quantify the input-output relationships on either positive or negative effects. After analyzing various low-level vision tasks with CEM, we have reached several interesting insights, such as: (1) Using more information of input images (e.g., larger receptive field) does NOT always yield positive outcomes. (2) Attempting to incorporate mechanisms with a global receptive field (e.g., channel attention) into image denoising may prove futile. (3) Integrating multiple tasks to train a general model could encourage the network to prioritize local information over global context. Based on the causal effect theory, the proposed diagnostic tool can refresh our common knowledge and bring a deeper understanding of low-level vision models. 
Codes are available at https://github.com/J-FHu/CEM.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143775185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
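The signed input-output effects that CEM visualizes can be conveyed, in heavily simplified form, by a brute-force perturbation probe: nudge one input pixel and record the signed change at an output location. The paper's estimator is grounded in causality theory; this occlusion-style sketch only illustrates the idea of positive vs. negative effects:

```python
import numpy as np

def effect_map(model, img, out_pos, delta=0.5):
    """Signed effect of perturbing each input pixel on the output at out_pos.

    `model` maps an image to an image of the same shape. Positive entries
    mean the perturbation increased the output value; negative, decreased it.
    """
    base = model(img)[out_pos]
    eff = np.zeros_like(img)
    for idx in np.ndindex(img.shape):
        pert = img.copy()
        pert[idx] += delta
        eff[idx] = model(pert)[out_pos] - base
    return eff

# toy "restorer": 3-tap horizontal mean, so only neighbors have an effect
def box_model(x):
    return (np.roll(x, 1, 1) + x + np.roll(x, -1, 1)) / 3.0

e = effect_map(box_model, np.zeros((1, 5)), out_pos=(0, 2))
```

On the toy model, only the three columns inside the receptive field of the probed output show nonzero effect, the kind of receptive-field finding CEM quantifies for real low-level vision networks.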
{"title":"Calibration-Free Raw Image Denoising via Fine-Grained Noise Estimation.","authors":"Yunhao Zou, Ying Fu, Yulun Zhang, Tao Zhang, Chenggang Yan, Radu Timofte","doi":"10.1109/TPAMI.2025.3550264","DOIUrl":"10.1109/TPAMI.2025.3550264","url":null,"abstract":"<p><p>Image denoising has progressed significantly due to the development of effective deep denoisers. To improve the performance in real-world scenarios, recent trends prefer to formulate superior noise models to generate realistic training data, or estimate noise levels to steer non-blind denoisers. In this paper, we bridge both strategies by presenting an innovative noise estimation and realistic noise synthesis pipeline. Specifically, we integrate a fine-grained statistical noise model and a contrastive learning strategy, with a unique data augmentation to enhance learning ability. Then, we use this model to estimate noise parameters on the evaluation dataset, which are subsequently used to craft camera-specific noise distributions and synthesize realistic noise. One distinguishing feature of our methodology is its adaptability: our pre-trained model can directly estimate noise parameters for unknown cameras, making it possible to model unfamiliar sensor noise using only testing images, without calibration frames or paired training data. Another highlight is our attempt at estimating parameters for fine-grained noise models, which extends the applicability to even more challenging low-light conditions. 
Through empirical testing, our calibration-free pipeline demonstrates effectiveness in both normal and low-light scenarios, further solidifying its utility in real-world noise synthesis and denoising tasks.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143702482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
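Fine-grained statistical raw noise models of this kind typically combine signal-dependent shot noise with signal-independent read and row components. A hedged sketch of the synthesis step once parameters are known (parameter names and values here are illustrative, not the paper's estimates):

```python
import numpy as np

def synthesize_raw_noise(clean, K=0.5, sigma_read=2.0, sigma_row=1.0, rng=None):
    """Sample a noisy raw image from a clean one (values in DN).

    Shot noise ~ Poisson(clean / K) * K (system gain K), read noise
    ~ N(0, sigma_read) per pixel, and row noise ~ N(0, sigma_row)
    shared along each row, as in common fine-grained sensor models.
    """
    rng = np.random.default_rng(rng)
    shot = rng.poisson(clean / K) * K                    # signal-dependent
    read = rng.normal(0.0, sigma_read, clean.shape)      # per-pixel
    row = rng.normal(0.0, sigma_row, (clean.shape[0], 1))  # per-row banding
    return shot + read + row
```

Calibration-based pipelines fit K and the sigmas from flat/bias frames; the paper's contribution is estimating such parameters directly from testing images instead.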
{"title":"Addressing Information Asymmetry: Deep Temporal Causality Discovery for Mixed Time Series.","authors":"Jiawei Chen, Chunhui Zhao","doi":"10.1109/TPAMI.2025.3553957","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3553957","url":null,"abstract":"<p><p>While existing causal discovery methods mostly focus on continuous time series, causal discovery for mixed time series encompassing both continuous variables (CVs) and discrete variables (DVs) is a fundamental yet underexplored problem. Together with nonlinearity and high dimensionality, mixed time series pose significant challenges for causal discovery. This study addresses the aforementioned challenges based on the following recognitions: (1) DVs may originate from latent continuous variables (LCVs) and undergo discretization processes due to measurement limitations, storage requirements, and other reasons. (2) LCVs contain fine-grained information and interact with CVs. By leveraging these interactions, the intrinsic continuity of DVs can be recovered. Thereupon, we propose a generic deep mixed time series temporal causal discovery framework. Our key idea is to adaptively recover LCVs from DVs with the guidance of CVs and perform causal discovery in a unified continuous-valued space. Technically, a new contextual adaptive Gaussian kernel embedding technique is developed for latent continuity recovery by adaptively aggregating temporal contextual information of DVs. Accordingly, two interdependent model training stages are devised for learning the latent continuity recovery with self-supervision and causal structure learning with sparsity-induced optimization. Experimentally, extensive empirical evaluations and in-depth investigations validate the superior performance of our framework. 
Our code and data are available at https://github.com/chunhuiz/MiTCD.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143702469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
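Once DVs are lifted into a continuous-valued space, "causal structure learning with sparsity-induced optimization" resembles lagged regression with an L1 penalty. A toy Granger-style stand-in via proximal gradient descent (this is not MiTCD's actual objective, just the sparsity-induced idea):

```python
import numpy as np

def lagged_lasso_step(X, lag=1, lam=0.05, lr=0.01, steps=2000):
    """Estimate a sparse lagged coefficient matrix W (d x d) such that
    X[t] ~= W @ X[t - lag], by ISTA on 0.5*MSE + lam * ||W||_1.

    A clearly nonzero W[i, j] suggests a lagged influence j -> i.
    """
    Xt, Xp = X[lag:], X[:-lag]                  # targets and lagged inputs
    n, d = Xt.shape
    W = np.zeros((d, d))
    for _ in range(steps):
        grad = (W @ Xp.T - Xt.T) @ Xp / n       # gradient of 0.5*MSE
        W -= lr * grad
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)  # soft-threshold
    return W
```

On a toy system where x1 depends on x0 one step back, W[1, 0] comes out clearly nonzero while spurious entries shrink toward zero, which is the sparse-structure behavior the framework's optimization stage relies on.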
{"title":"Learning-aided Neighborhood Search for Vehicle Routing Problems.","authors":"Tong Guo, Yi Mei, Mengjie Zhang, Haoran Zhao, Kaiquan Cai, Wenbo Du","doi":"10.1109/TPAMI.2025.3554669","DOIUrl":"10.1109/TPAMI.2025.3554669","url":null,"abstract":"<p><p>The Vehicle Routing Problem (VRP) is a classic optimization problem with diverse real-world applications. Neighborhood search has emerged as an effective approach, yielding high-quality solutions across different VRPs. However, most existing studies exhaustively explore all considered neighborhoods in a fixed order, leading to an inefficient search process. To address this issue, this paper proposes a Learning-aided Neighborhood Search algorithm (LaNS) that employs a cutting-edge multi-agent reinforcement-learning-driven adaptive operator/neighborhood selection mechanism to achieve efficient routing for VRP. Within this framework, two agents serve as high-level instructors, collaboratively guiding the search direction by selecting perturbation/improvement operators from a pool of low-level heuristics. Furthermore, to equip the agents with comprehensive information for learning guidance knowledge, we have developed a new informative state representation. This representation transforms the spatial route structures into an image-like tensor, allowing us to extract spatial features using a convolutional neural network. 
Comprehensive evaluations on diverse VRP benchmarks, including the capacitated VRP (CVRP), multi-depot VRP (MDVRP) and cumulative multi-depot VRP with energy constraints, demonstrate LaNS's superiority over the state-of-the-art neighborhood search methods as well as the existing learning-guided neighborhood search algorithms.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143702483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
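The high-level instructor idea, an agent choosing which neighborhood operator to apply next rather than cycling through all of them, can be caricatured with a bandit-style selector inside a local search loop. Epsilon-greedy below stands in for LaNS's multi-agent reinforcement learning, and the permutation problem is a toy, not a VRP:

```python
import random

def swap_op(perm, rng):
    """Neighborhood operator: swap two random positions."""
    p = list(perm)
    i, j = rng.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def neighborhood_search(cost, init, operators, iters=500, eps=0.2, seed=0):
    """Local search where an epsilon-greedy bandit picks the operator,
    crediting operators by the improvement they deliver (a toy stand-in
    for learned adaptive operator selection)."""
    rng = random.Random(seed)
    best, best_c = init, cost(init)
    value = [0.0] * len(operators)   # running mean improvement per operator
    count = [1] * len(operators)
    for _ in range(iters):
        i = (rng.randrange(len(operators)) if rng.random() < eps
             else max(range(len(operators)), key=lambda k: value[k]))
        cand = operators[i](best, rng)
        gain = best_c - cost(cand)
        value[i] += (gain - value[i]) / count[i]
        count[i] += 1
        if gain > 0:                 # accept only improving moves
            best, best_c = cand, cost(cand)
    return best, best_c
```

With several operators in the pool, the selector gradually concentrates on whichever neighborhoods are currently paying off, which is the inefficiency LaNS targets in fixed-order exhaustive exploration.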
{"title":"Graph Prompt Clustering.","authors":"Man-Sheng Chen, Pei-Yuan Lai, De-Zhang Liao, Chang-Dong Wang, Jian-Huang Lai","doi":"10.1109/TPAMI.2025.3553129","DOIUrl":"10.1109/TPAMI.2025.3553129","url":null,"abstract":"<p><p>Due to the wide existence of unlabeled graph-structured data (e.g. molecular structures), graph-level clustering has recently attracted increasing attention; its goal is to divide the input graphs into several disjoint groups. However, existing methods typically focus on learning graph embeddings with different graph regularizations, and seldom account for the obvious differences in data distributions of distinct graph-level datasets. How to account for the characteristics of multiple graph-level datasets in a single well-designed model without prior knowledge remains challenging. In view of this, we propose a novel Graph Prompt Clustering (GPC) method. Within this model, there are two main modules, i.e., graph model pretraining as well as prompt and finetuning. In the graph model pretraining module, the graph model is pretrained on a selected source graph-level dataset with mutual information maximization and self-supervised clustering regularization. In the prompt and finetuning module, the network parameters of the pretrained graph model are frozen, and a group of learnable prompt vectors assigned to each graph-level representation is trained to adapt to different target graph-level datasets with various data distributions. 
Experimental results across six benchmark datasets demonstrate the impressive generalization capability and effectiveness of GPC compared with the state-of-the-art methods.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143672019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CLRNetV2: A Faster and Stronger Lane Detector.","authors":"Tu Zheng, Yifei Huang, Yang Liu, Binbin Lin, Zheng Yang, Deng Cai, Xiaofei He","doi":"10.1109/TPAMI.2025.3551935","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3551935","url":null,"abstract":"<p><p>Lanes are critical to the vision navigation systems of intelligent vehicles. Naturally, a lane is a traffic sign with high-level semantics, yet it has specific local patterns that require detailed low-level features for accurate localization. Using different feature levels is of great importance for accurate lane detection, but it is still under-explored. On the other hand, current lane detection methods still struggle to detect complex dense lanes, such as Y-shape or fork-shape. In this work, we present the Cross Layer Refinement Network, aiming to fully utilize both high-level and low-level features in lane detection. In particular, it first detects lanes with high-level semantic features and then performs refinement based on low-level features. In this way, we can exploit more contextual information to detect lanes while leveraging local-detailed features to improve localization accuracy. We present Fast-ROIGather to gather global context, which further enhances the representation of lane features. To detect dense lanes accurately, we propose the Correlation Discrimination Module (CDM) to discriminate the correlation of dense lanes, enabling nearly cost-free high-quality dense lane prediction. In addition to our novel network design, we introduce the LineIoU loss, which regresses lanes as a whole unit to improve localization accuracy. 
Experiments demonstrate our approach significantly outperforms the state-of-the-art lane detection methods.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143660091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
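Regressing a lane "as a whole unit" can be sketched following the line-IoU idea published with CLRNet: each lane point sampled at an image row is widened into a horizontal segment of fixed radius, and row-wise overlaps are summed over the whole lane before taking the ratio (a hedged sketch of that formulation, not CLRNetV2's exact loss):

```python
import numpy as np

def line_iou(pred_x, gt_x, radius=1.5):
    """IoU between two lanes given x-coordinates sampled at the same rows.

    Each point spans [x - radius, x + radius]; overlaps may be negative
    for distant lanes, so the ratio also penalizes non-intersecting pairs.
    """
    inter = (np.minimum(pred_x, gt_x) + radius) - (np.maximum(pred_x, gt_x) - radius)
    union = (np.maximum(pred_x, gt_x) + radius) - (np.minimum(pred_x, gt_x) - radius)
    return inter.sum() / union.sum()   # loss would be 1 - line_iou
```

Because the sums run over all rows at once, a single gradient step moves every point of the lane jointly, which is what "regresses lanes as a whole unit" buys over per-point L1 regression.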
{"title":"Systematic Bias of Machine Learning Regression Models and Correction.","authors":"Hwiyoung Lee, Shuo Chen","doi":"10.1109/TPAMI.2025.3552368","DOIUrl":"10.1109/TPAMI.2025.3552368","url":null,"abstract":"<p><p>Machine learning models for continuous outcomes often yield systematically biased predictions, particularly for values that largely deviate from the mean. Specifically, predictions for large-valued outcomes tend to be negatively biased (underestimating actual values), while those for small-valued outcomes are positively biased (overestimating actual values). We refer to this linear central tendency warped bias as the \"systematic bias of machine learning regression\". In this paper, we first demonstrate that this systematic prediction bias persists across various machine learning regression models, and then delve into its theoretical underpinnings. To address this issue, we propose a general constrained optimization approach designed to correct this bias and develop computationally efficient implementation algorithms. Simulation results indicate that our correction method effectively eliminates the bias from the predicted outcomes. We apply the proposed approach to the prediction of brain age using neuroimaging data. 
In comparison to competing machine learning regression models, our method effectively addresses the longstanding issue of \"systematic bias of machine learning regression\" in neuroimaging-based brain age calculation, yielding unbiased predictions of brain age.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143660149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
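The underestimate-large/overestimate-small pattern is reproducible with any shrinking estimator: with imperfect features, least squares predicts the conditional mean, which pulls extreme outcomes toward the center, so the residuals trend downward in the true value. A toy demonstration of the phenomenon (not the paper's constrained correction):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.normal(0.0, 1.0, n)          # true outcome, e.g. chronological age
x = y + rng.normal(0.0, 1.0, n)      # noisy feature

# least-squares fit of y on x; with equal signal/noise variance the
# slope is ~0.5, i.e. predictions are shrunk halfway toward the mean
slope = np.cov(x, y)[0, 1] / x.var()
pred = slope * (x - x.mean()) + y.mean()

# systematic bias: prediction error decreases linearly in the true value
bias_slope = np.polyfit(y, pred - y, 1)[0]   # close to -0.5 here
```

Large y values are under-predicted and small ones over-predicted even though the fit is unbiased on average, which is exactly the "systematic bias of machine learning regression" the paper corrects.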
{"title":"Hulk: A Universal Knowledge Translator for Human-Centric Tasks.","authors":"Yizhou Wang, Yixuan Wu, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang, Shixiang Tang","doi":"10.1109/TPAMI.2025.3552604","DOIUrl":"10.1109/TPAMI.2025.3552604","url":null,"abstract":"<p><p>Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis. There has been a recent surge in developing human-centric foundation models that can benefit a broad range of human-centric perception tasks. While many human-centric foundation models have achieved success, they did not explore 3D and vision-language tasks for human-centric perception and required task-specific finetuning. These limitations restrict their application to more downstream tasks and situations. To tackle these problems, we present Hulk, the first multimodal human-centric generalist model, capable of addressing 2D vision, 3D vision, skeleton-based, and vision-language tasks without task-specific finetuning. The key to achieving this is condensing various task-specific heads into two general heads, one for discrete representations, e.g., languages, and the other for continuous representations, e.g., location coordinates. The outputs of the two heads can be further stacked into four distinct input and output modalities. This uniform representation enables Hulk to treat diverse human-centric tasks as modality translation, integrating knowledge across a wide range of tasks. Comprehensive evaluations of Hulk on 12 benchmarks covering 8 human-centric tasks demonstrate the superiority of our proposed method, achieving state-of-the-art performance in 11 benchmarks. 
The code will be available on https://github.com/OpenGVLab/Hulk.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143660141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
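The two-general-heads idea, one head emitting discrete tokens and one emitting continuous values, can be illustrated with a toy router over a shared feature vector. Dimensions and weights below are illustrative only, not Hulk's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D, V = 16, 32                          # feature width, token vocabulary size
W_disc = rng.normal(0, 0.1, (D, V))    # shared discrete head (e.g. words)
W_cont = rng.normal(0, 0.1, (D, 2))    # shared continuous head (e.g. x, y)

def decode(feat, modality):
    """Route one feature vector through the discrete or continuous head."""
    if modality == "discrete":
        logits = feat @ W_disc
        e = np.exp(logits - logits.max())   # stable softmax
        return e / e.sum()                  # distribution over tokens
    return feat @ W_cont                    # raw coordinates
```

Every task then shares the same two output pathways; what differs per task is only which modality each output slot is routed through, which is what lets such a model treat tasks as modality translation without task-specific heads.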