{"title":"Multi-Channel Equilibrium Graph Neural Network for Multi-View Semi-Supervised Learning","authors":"Shiping Wang;Yueyang Pi;Yang Huang;Fuhai Chen;Le Zhang","doi":"10.1109/TPAMI.2025.3587216","DOIUrl":"10.1109/TPAMI.2025.3587216","url":null,"abstract":"In practical applications, the difficulty of multi-view data annotation poses a challenge for multi-view semi-supervised learning. Although some graph-based approaches have been proposed for this task, they often struggle with capturing long-range information and memory bottlenecks, and usually encounter over-smoothing. To address these issues, this paper proposes an implicit model, named multi-channel Equilibrium Graph Neural Network (MEGNN). Through an equilibrium point iterative process, the proposed MEGNN naturally captures long-range information and effectively reduces the consumption of memory compared with explicit models. Furthermore, the proposed method deals with the issue of over-smoothing in deep graph convolutional networks by residual connection and shrinkage factor. We analyze the effect of the shrinkage factor on the information capturing capability of the model, and demonstrate that the proposed method does not encounter over-smoothing. Comprehensive experimental results demonstrate that the proposed method outperforms the state-of-the-art methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9375-9382"},"PeriodicalIF":18.6,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144678165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pro-NeXt: An All-in-One Unified Model for General Fine-Grained Visual Recognition","authors":"Junde Wu;Jiayuan Zhu;Min Xu;Yueming Jin","doi":"10.1109/TPAMI.2025.3584902","DOIUrl":"10.1109/TPAMI.2025.3584902","url":null,"abstract":"Unlike general visual classification (CLS) tasks, certain CLS problems are significantly more challenging as they involve recognizing professionally categorized or highly specialized images. Fine-Grained Visual Classification (FGVC) has emerged as a broad solution to address this complexity. However, most existing methods have been predominantly evaluated on a limited set of homogeneous benchmarks, such as bird species or vehicle brands. Moreover, these approaches often train separate models for each specific task, which restricts their generalizability. This paper proposes a scalable and explainable foundational model designed to tackle a wide range of FGVC tasks from a unified and generalizable perspective. We introduce a novel architecture named Pro-NeXt and reveal that Pro-NeXt exhibits substantial generalizability across diverse professional fields such as fashion, medicine, and art areas, previously considered disparate. Our basic-sized Pro-NeXt-B surpasses all preceding task-specific models across 12 distinct datasets within 5 diverse domains. Furthermore, we find its good scaling property that scaling up Pro-NeXt in depth and width with increasing GFlops can consistently enhance its accuracy. Beyond scalability and adaptability, the intermediate features of Pro-NeXt achieve reliable object detection and segmentation performance without extra training, highlighting its solid explainability. We will release the code to promote further research in this area.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9187-9200"},"PeriodicalIF":18.6,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144645756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Task Grouping Through Sample-Wise Optimisation Landscape Analysis","authors":"Anshul Thakur;Yichen Huang;Soheila Molaei;Yujiang Wang;David A. Clifton","doi":"10.1109/TPAMI.2025.3588685","DOIUrl":"10.1109/TPAMI.2025.3588685","url":null,"abstract":"Shared training approaches, such as multi-task learning (MTL) and gradient-based meta-learning, are widely used in various machine learning applications, but they often suffer from negative transfer, leading to performance degradation in specific tasks. While several optimisation techniques have been developed to mitigate this issue for pre-selected task cohorts, identifying optimal task combinations for joint learning—known as task grouping—remains underexplored and computationally challenging due to the exponential growth in task combinations and the need for extensive training and evaluation cycles. This paper introduces an efficient task grouping framework designed to reduce these overwhelming computational demands of the existing methods. The proposed framework infers pairwise task similarities through a sample-wise optimisation landscape analysis, eliminating the need for the shared model training required to infer task similarities in existing methods. With task similarities acquired, a graph-based clustering algorithm is employed to pinpoint near-optimal task groups, providing an approximate yet efficient and effective solution to the originally NP-hard problem. Empirical assessments conducted on 9 different datasets highlight the effectiveness of the proposed framework, revealing a five-fold speed enhancement compared to previous state-of-the-art methods. Moreover, the framework consistently demonstrates comparable performance, confirming its remarkable efficiency and effectiveness in task grouping.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9266-9279"},"PeriodicalIF":18.6,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11078907","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144630035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event-Based Stereo Depth Estimation: A Survey","authors":"Suman Ghosh;Guillermo Gallego","doi":"10.1109/TPAMI.2025.3586559","DOIUrl":"10.1109/TPAMI.2025.3586559","url":null,"abstract":"Stereopsis has widespread appeal in computer vision and robotics as it is the predominant way by which we perceive depth to navigate our 3D world. Event cameras are novel bio-inspired sensors that detect per-pixel brightness changes asynchronously, with very high temporal resolution and high dynamic range, enabling machine perception in high-speed motion and broad illumination conditions. The high temporal precision also benefits stereo matching, making disparity (depth) estimation a popular research area for event cameras ever since their inception. Over the last 30 years, the field has evolved rapidly, from low-latency, low-power circuit design to current deep learning (DL) approaches driven by the computer vision community. The bibliography is vast and difficult to navigate for non-experts due its highly interdisciplinary nature. Past surveys have addressed distinct aspects of this topic, in the context of applications, or focusing only on a specific class of techniques, but have overlooked stereo datasets. This survey provides a comprehensive overview, covering both instantaneous stereo and long-term methods suitable for simultaneous localization and mapping (SLAM), along with theoretical and empirical comparisons. It is the first to extensively review DL methods as well as stereo datasets, even providing practical suggestions for creating new benchmarks to advance the field. The main advantages and challenges faced by event-based stereo depth estimation are also discussed. Despite significant progress, challenges remain in achieving optimal performance in not only accuracy but also efficiency, a cornerstone of event-based computing. We identify several gaps and propose future research directions. We hope this survey inspires future research in depth estimation with event cameras and related topics, by serving as an accessible entry point for newcomers, as well as a practical guide for seasoned researchers in the community.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9130-9149"},"PeriodicalIF":18.6,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11078760","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144611161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Rain Location Prior for Nighttime Deraining and Beyond","authors":"Fan Zhang;Shaodi You;Yu Li;Ying Fu","doi":"10.1109/TPAMI.2025.3586361","DOIUrl":"10.1109/TPAMI.2025.3586361","url":null,"abstract":"Most deraining methods work on day scenes while leaving nighttime deraining underexplored, where darkness and non-uniform illuminations pose additional challenges. Consequently, night rain has a quite different appearance varying by location and cannot be effectively handled. To accommodate this issue, we propose a Rain Location Prior (RLP) by implicitly learning it from rainy images to reflect rain location information and boost the performance of deraining models by prior injection. Then, we introduce a Rain Prior Injection Module (RPIM) with a multi-scale scheme to modulate it by attention and emphasize the features of rain streak areas for better injection efficiency. Finally, to alleviate the data scarcity issue and facilitate the research on nighttime deraining, we propose the GTAV-NightRain dataset by considering the interaction between rain streaks and non-uniform illuminations, and provide detailed instructions on data collection pipeline which is highly replicable and flexible to integrate challenging factors of rainy night in the future. Our method outperforms state-of-the-art backbone by 1.3 dB in PSNR and generalizes better on real data such as heavy rain and the presence of glow and glaring lights. Ablation studies are conducted to validate the effectiveness of each component and we visualize RLP to show good interpretability. Moreover, we apply our method to daytime deraining and desnow to show good generalizability on other location-dependent degradations. Our method is a step forward in nighttime deraining and the GTAV-NightRain dataset may become a good complement to previous datasets.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9169-9186"},"PeriodicalIF":18.6,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144594377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EasyOutPainter: One Step Image Outpainting With Both Continuous Multiple and Resolution","authors":"Shaofeng Zhang;Qiang Zhou;Zhibin Wang;Hao Li;Junchi Yan","doi":"10.1109/TPAMI.2025.3586824","DOIUrl":"10.1109/TPAMI.2025.3586824","url":null,"abstract":"Image outpainting aims to generate the content of an input sub-image outside its boundaries, which remains open for existing generative models. This paper explores image outpainting in three directions that have not been achieved in literature to our knowledge: outpainting 1) with continuous multiples (in contrast to the discrete ones by existing methods); 2) with arbitrary resolutions; and 3) in a single step (for any multiples and resolutions). The arbitrary multiple outpainting is achieved by utilizing randomly cropped views from the same image during training to capture arbitrary relative positional information. Specifically, by feeding one view and relative positional embeddings as queries, we can reconstruct another view. At inference, we generate images with arbitrary expansion multiples by inputting an anchor image and its corresponding positional embeddings. The continuous-resolution outpainting is achieved by introducing the multi-scale training strategy into generative models. Specifically, by disentangling the image resolution and the number of patches, it can generate images with arbitrary resolutions without post-processing. Meanwhile, we propose a query-based contrastive objective to make our method not rely on a pre-trained backbone network which is otherwise often required in peer methods. The comprehensive experimental results on public benchmarks show its superior performance over state-of-the-art approaches.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9217-9231"},"PeriodicalIF":18.6,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144578805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gradient Projection for Continual Parameter-Efficient Tuning","authors":"Jingyang Qiao;Zhizhong Zhang;Xin Tan;Yanyun Qu;Wensheng Zhang;Zhi Han;Yuan Xie","doi":"10.1109/TPAMI.2025.3587032","DOIUrl":"10.1109/TPAMI.2025.3587032","url":null,"abstract":"Parameter-efficient tunings (PETs) have demonstrated impressive performance and promising perspectives in training large models, while they are still confronted with a common problem: the trade-off between learning new content and protecting old knowledge, leading to zero-shot generalization collapse, and cross-modal hallucination. In this paper, we reformulate Adapter, LoRA, Prefix-tuning, and Prompt-tuning from the perspective of gradient projection, and first propose a unified framework called <underline><b>P</b></u>arameter <underline><b>E</b></u>fficient <underline><b>G</b></u>radient <underline><b>P</b></u>rojection (PEGP). We introduce orthogonal gradient projection into different PET paradigms and theoretically demonstrate that the orthogonal condition for the gradient can effectively resist forgetting even for large-scale models. It therefore modifies the gradient towards the direction that has less impact on the old feature space, with less extra memory space and training time. We extensively evaluate our method with different backbones, including ViT and CLIP, on diverse datasets, and experiments comprehensively demonstrate its efficiency in reducing forgetting in class, online class, domain, task, and multi-modality continual settings.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9316-9329"},"PeriodicalIF":18.6,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144578806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning With Self-Calibrator for Fast and Robust Low-Light Image Enhancement","authors":"Long Ma;Tengyu Ma;Chengpei Xu;Jinyuan Liu;Xin Fan;Zhongxuan Luo;Risheng Liu","doi":"10.1109/TPAMI.2025.3586712","DOIUrl":"10.1109/TPAMI.2025.3586712","url":null,"abstract":"Convolutional Neural Networks (CNNs) have shown significant success in the low-light image enhancement task. However, most of existing works encounter challenges in balancing quality and efficiency simultaneously. This limitation hinders practical applicability in real-world scenarios and downstream vision tasks. To overcome these obstacles, we propose a Self-Calibrated Illumination (SCI) learning scheme, introducing a new perspective to boost the model’s capability. Based on a weight-sharing illumination estimation process, we construct an embedded self-calibrator to accelerate stage-level convergence, yielding gains that utilize only a single basic block for inference, which drastically diminishes computation cost. Additionally, by introducing the additivity condition on the basic block, we acquire a reinforced version dubbed SCI++, which disentangles the relationship between the self-calibrator and illumination estimator, providing a more interpretable and effective learning paradigm with faster convergence and better stability. We assess the proposed enhancers on standard benchmarks and in-the-wild datasets, confirming that they can restore clean images from diverse scenes with higher quality and efficiency. The verification on different levels of low-light vision tasks shows our applicability against other methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9095-9112"},"PeriodicalIF":18.6,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144577955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event-Based Photometric Bundle Adjustment","authors":"Shuang Guo;Guillermo Gallego","doi":"10.1109/TPAMI.2025.3586497","DOIUrl":"10.1109/TPAMI.2025.3586497","url":null,"abstract":"We tackle the problem of bundle adjustment (i.e., simultaneous refinement of camera poses and scene map) for a purely rotating event camera. Starting from first principles, we formulate the problem as a classical non-linear least squares optimization. The photometric error is defined using the event generation model directly in the camera rotations and the semi-dense scene brightness that triggers the events. We leverage the sparsity of event data to design a tractable Levenberg-Marquardt solver that handles the very large number of variables involved. To the best of our knowledge, our method, which we call Event-based Photometric Bundle Adjustment (EPBA), is the first event-only photometric bundle adjustment method that works on the brightness map directly and exploits the space-time characteristics of event data, without having to convert events into image-like representations. Comprehensive experiments on both synthetic and real-world datasets demonstrate EPBA’s effectiveness in decreasing the photometric error (by up to 90%), yielding results of unparalleled quality. The refined maps reveal details that were hidden using prior state-of-the-art rotation-only estimation methods. The experiments on modern high-resolution event cameras show the applicability of EPBA to panoramic imaging in various scenarios (without map initialization, at multiple resolutions, and in combination with other methods, such as IMU dead reckoning or previous event-based rotation estimation methods). We make the source code publicly available.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9280-9297"},"PeriodicalIF":18.6,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11072301","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144577916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Distortion-Minimized Layerwise Pruning","authors":"Kaixin Xu;Zhe Wang;Runtao Huang;Xue Geng;Jie Lin;Xulei Yang;Min Wu;Xiaoli Li;Weisi Lin","doi":"10.1109/TPAMI.2025.3586418","DOIUrl":"10.1109/TPAMI.2025.3586418","url":null,"abstract":"In this paper, we propose a post-training pruning framework that jointly optimizes layerwise pruning to minimize model output distortion. Through theoretical and empirical analysis, we discover an important additivity property of output distortion from pruning weights/channels in DNNs. Leveraging this property, we reformulate pruning optimization as a combinatorial problem and solve it with dynamic programming, achieving linear time complexity and making the algorithm very fast on CPUs. Furthermore, we optimize additivity-derived distortions using Hessian-based Taylor approximation to enhance pruning efficiency, accompanied by fine-grained complexity reduction techniques. Our method is evaluated on various DNN architectures, including CNNs, ViTs, and object detectors, and on vision tasks such as image classification on CIFAR-10 and ImageNet, and 3D object detection and various datasets. We achieve SoTA with significant FLOPs reductions without accuracy loss. Specifically, on CIFAR-10, we achieve up to <inline-formula><tex-math>$27.9times$</tex-math></inline-formula>, <inline-formula><tex-math>$29.2times$</tex-math></inline-formula>, and <inline-formula><tex-math>$14.9times$</tex-math></inline-formula> FLOPs reductions on ResNet-32, VGG-16, and DenseNet-121, respectively. On ImageNet, we observe no accuracy loss with <inline-formula><tex-math>$1.69times$</tex-math></inline-formula> and <inline-formula><tex-math>$2times$</tex-math></inline-formula> FLOPs reductions on ResNet-50 and DeiT-Base, respectively. For 3D object detection, we achieve <inline-formula><tex-math>$mathbf {3.89}times, mathbf {3.72}times$</tex-math></inline-formula> FLOPs reductions on CenterPoint and PVRCNN models. These results demonstrate the effectiveness and practicality of our approach for improving model performance through layer-adaptive weight pruning.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"9298-9315"},"PeriodicalIF":18.6,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144577915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}