{"title":"3DCOMPAT<sup>++</sup>: An Improved Large-scale 3D Vision Dataset for Compositional Recognition.","authors":"Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny","doi":"10.1109/TPAMI.2025.3597476","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3597476","url":null,"abstract":"<p><p>In this work, we present 3DCOMPAT<sup>++</sup>, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the partinstance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCOMPAT ++ covers 42 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at the CVPR conference, showcasing the winning method's utilization of a modified PointNet<sup>++</sup> model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D Vision. The dataset and code have been made publicly available at https://3dcompat-dataset.org/v2/. 3D vision, dataset, 3D modeling, multimodal learning, compositional learning.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144823489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLPose: Bridging the Domain Gap in Pose Estimation With Language-Vision Tuning","authors":"Jingyao Li;Pengguang Chen;Xuan Ju;Shu Liu;Hong Xu;Jiaya Jia","doi":"10.1109/TPAMI.2025.3594097","DOIUrl":"10.1109/TPAMI.2025.3594097","url":null,"abstract":"Thanks to advances in deep learning techniques, Human Pose Estimation (HPE) has achieved significant progress in natural scenarios. However, these models perform poorly in artificial scenarios such as painting and sculpture due to the domain gap, constraining the development of virtual reality and augmented reality. With the growth of model size, retraining the whole model on both natural and artificial data is computationally expensive and inefficient. Our research aims to bridge the domain gap between natural and artificial scenarios with efficient tuning strategies. Leveraging the potential of language models, we enhance the adaptability of traditional pose estimation models across diverse scenarios with a novel framework called VLPose. VLPose leverages the synergy between language and vision to extend the generalization and robustness of pose estimation models beyond the traditional domains. Our approach has demonstrated improvements of 2.26% and 3.74% on HumanArt and MSCOCO, respectively, compared to state-of-the-art tuning strategies.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 11","pages":"10836-10847"},"PeriodicalIF":18.6,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144823508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MBA-SLAM: Motion Blur Aware Dense Visual SLAM With Radiance Fields Representation.","authors":"Peng Wang, Lingzhe Zhao, Yin Zhang, Shiyu Zhao, Peidong Liu","doi":"10.1109/TPAMI.2025.3596976","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3596976","url":null,"abstract":"<p><p>Emerging 3D scene representations, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated their effectiveness in Simultaneous Localization and Mapping (SLAM) for photo-realistic rendering, particularly when using high-quality video sequences as input. However, existing methods struggle with motion-blurred frames, which are common in real-world scenarios like low-light or long-exposure conditions. This often results in a significant reduction in both camera localization accuracy and map reconstruction quality. To address this challenge, we propose a dense visual SLAM pipeline (i.e. MBA-SLAM) to handle severe motion-blurred inputs. Our approach integrates an efficient motion blur-aware tracker with either neural radiance fields or Gaussian Splatting based mapper. By accurately modeling the physical image formation process of motion-blurred images, our method simultaneously learns 3D scene representation and estimates the cameras' local trajectory during exposure time, enabling proactive compensation for motion blur caused by camera movement. In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showcasing superior performance across a range of datasets, including synthetic and real datasets featuring sharp images as well as those affected by motion blur, highlighting the versatility and robustness of our approach.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144805498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PC-SRGAN: Physically Consistent Super-Resolution Generative Adversarial Network for General Transient Simulations.","authors":"Md Rakibul Hasan, Pouria Behnoudfar, Dan MacKinlay, Thomas Poulet","doi":"10.1109/TPAMI.2025.3596647","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3596647","url":null,"abstract":"<p><p>Machine Learning, particularly Generative Adversarial Networks (GANs), has revolutionised Super-Resolution (SR). However, generated images often lack physical meaningfulness, which is essential for scientific applications. Our approach, PC-SRGAN, enhances image resolution while ensuring physical consistency for interpretable simulations. PC-SRGAN significantly improves both the Peak Signal-to-Noise Ratio and the Structural Similarity Index Measure compared to conventional SR methods, even with limited training data (e.g., only 13% of training data is required to achieve performance similar to SRGAN). Beyond SR, PC-SRGAN augments physically meaningful machine learning, incorporating numerically justified time integrators and advanced quality metrics. These advancements promise reliable and causal machine-learning models in scientific domains. A significant advantage of PC-SRGAN over conventional SR techniques is its physical consistency, which makes it a viable surrogate model for time-dependent problems. PC-SRGAN advances scientific machine learning by improving accuracy and efficiency, enhancing process understanding, and broadening applications to scientific research. We publicly release the complete source code of PC-SRGAN and all experiments at https://github.com/hasan-rakibul/PC-SRGAN.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144805499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time Evidence Fusion Network: Multi-Source View in Long-Term Time Series Forecasting.","authors":"Tianxiang Zhan, Yuanpeng He, Yong Deng, Zhen Li, Wenjie Du, Qingsong Wen","doi":"10.1109/TPAMI.2025.3596905","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3596905","url":null,"abstract":"<p><p>In practical scenarios, time series forecasting necessitates not only accuracy but also efficiency. Consequently, the exploration of model architectures remains a perennially trending topic in research. To address these challenges, we propose a novel backbone architecture named Time Evidence Fusion Network (TEFN) from the perspective of information fusion. Specifically, we introduce the Basic Probability Assignment (BPA) Module based on evidence theory to capture the uncertainty of multivariate time series data from both channel and time dimensions. Additionally, we develop a novel multi-source information fusion method to effectively integrate the two distinct dimensions from BPA output, leading to improved forecasting accuracy. Lastly, we conduct extensive experiments to demonstrate that TEFN achieves performance comparable to state-of-the-art methods while maintaining significantly lower complexity and reduced training time. Also, our experiments show that TEFN exhibits high robustness, with minimal error fluctuations during hyperparameter selection. Furthermore, due to the fact that BPA is derived from fuzzy theory, TEFN offers a high degree of interpretability. Therefore, the proposed TEFN balances accuracy, efficiency, stability, and interpretability, making it a desirable solution for time series forecasting.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144805500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EMOv2: Pushing 5M Vision Model Frontier","authors":"Jiangning Zhang;Teng Hu;Haoyang He;Zhucun Xue;Yabiao Wang;Chengjie Wang;Yong Liu;Xiangtai Li;Dacheng Tao","doi":"10.1109/TPAMI.2025.3596776","DOIUrl":"10.1109/TPAMI.2025.3596776","url":null,"abstract":"This work focuses on developing parameter-efficient and lightweight models for dense predictions while trading off parameters, FLOPs, and performance. Our goal is to set up the new frontier of the 5 M magnitude lightweight model on various downstream tasks. Inverted Residual Block (IRB) serves as the infrastructure for lightweight CNNs, but no counterparts have been recognized by attention-based design. Our work rethinks the lightweight infrastructure of efficient IRB and practical components in Transformer from a unified perspective, extending CNN-based IRB to attention-based models and abstracting a one-residual Meta Mobile Block (MMBlock) for lightweight model design. Following neat but effective design criterion, we deduce a modern <b>I</b>mproved <b>I</b>nverted <b>R</b>esidual <b>M</b>obile <b>B</b>lock (<b>i<inline-formula><tex-math>$^{2}$</tex-math><alternatives><mml:math><mml:msup><mml:mrow/><mml:mn>2</mml:mn></mml:msup></mml:math><inline-graphic></alternatives></inline-formula>RMB</b>) and improve a hierarchical Efficient MOdel (<b>EMOv2</b>) with no elaborate complex structures. Considering the imperceptible latency for mobile users when downloading models under 4 G/5 G bandwidth and ensuring model performance, we investigate the performance upper limit of lightweight models with a magnitude of 5 M. Extensive experiments on various vision recognition, dense prediction, and image generation tasks demonstrate the superiority of our EMOv2 over state-of-the-art methods, e.g., EMOv2-1 M/2M/5 M achieve 72.3, 75.8, and 79.4 Top-1 that surpass equal-order CNN-/Attention-based models significantly. At the same time, EMOv2-5 M equipped RetinaNet achieves 41.5 mAP for object detection tasks that surpasses the previous EMO-5 M by +2.6<inline-formula><tex-math>$uparrow$</tex-math></inline-formula> . When employing the more robust training recipe, our EMOv2-5M eventually achieves 82.9 Top-1 accuracy, which elevates the performance of 5M magnitude models to a new level.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 11","pages":"10560-10576"},"PeriodicalIF":18.6,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144801325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Demoireing using Focused-Defocused Dual-Camera System.","authors":"Xuan Dong, Xiangyuan Sun, Xia Wang, Jian Song, Ya Li, Weixin Li","doi":"10.1109/TPAMI.2025.3596700","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3596700","url":null,"abstract":"<p><p>Moire patterns, unwanted color artifacts in images and videos, arise from the interference between spatially high-frequency scene contents and the spatial discrete sampling of digital cameras. Existing demoireing methods primarily rely on single-camera image/video processing, which faces two critical challenges: 1) distinguishing moire patterns from visually similar real textures, and 2) preserving tonal consistency and temporal coherence while removing moire artifacts. To address these issues, we propose a dual-camera framework that captures synchronized videos of the same scene: one in focus (retaining high-quality textures but may exhibit moire patterns) and one defocused (with significantly reduced moire patterns but blurred textures). We use the defocused video to help distinguish moire patterns from real texture, so as to guide the demoireing of the focused video. We propose a frame-wise demoireing pipeline, which begins with an optical flow based alignment step to address any discrepancies in displacement and occlusion between the focused and defocused frames. Then, we leverage the aligned defocused frame to guide the demoireing of the focused frame using a multi-scale CNN and a multi-dimensional training loss. To maintain tonal and temporal consistency, our final step involves a joint bilateral filter to leverage the demoireing result from the CNN as the guide to filter the input focused frame to obtain the final output. Experimental results demonstrate that our proposed framework largely outperforms state-of-the-art image and video demoireing methods.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144801326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Spectral Analysis of Bivariate Graph Signals.","authors":"Kyusoon Kim, Hee-Seok Oh","doi":"10.1109/TPAMI.2025.3596918","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3596918","url":null,"abstract":"<p><p>With the advancements in technology and monitoring tools, we often encounter multivariate graph signals, which can be seen as the realizations of multivariate graph processes, and revealing the relationship between their constituent quantities is one of the important problems. To address this issue, we propose a cross-spectral analysis tool for bivariate graph signals. The main goal of this study is to extend the scope of spectral analysis of graph signals to bivariate graph signals. In this study, we define joint weak stationarity graph processes and introduce graph cross-spectral density and coherence for bivariate graph processes. We propose several estimators for the cross-spectral density and investigate the theoretical properties of the proposed estimators. Furthermore, we demonstrate the effectiveness of the proposed estimators through numerical experiments, including simulation studies and a real data application. Finally, as an interesting extension, we discuss robust spectral analysis of graph signals in the presence of outliers.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144801324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DREAM: A Dual Variational Framework for Unsupervised Graph Domain Adaptation","authors":"Nan Yin;Li Shen;Mengzhu Wang;Xinwang Liu;Chong Chen;Xian-Sheng Hua","doi":"10.1109/TPAMI.2025.3596054","DOIUrl":"10.1109/TPAMI.2025.3596054","url":null,"abstract":"Graph classification has been a prominent problem in graph machine learning fields. This problem has been investigated by leveraging message passing neural networks (MPNNs) to learn powerful graph representations. However, MPNNs extract topological semantics implicitly under label supervision, which could suffer from domain shift and label scarcity in unsupervised domain adaptation settings. In this paper, we propose an effective solution named <underline>D</u>ual Va<underline>r</u>iational S<underline>e</u>mantics Gr<underline>a</u>ph <underline>M</u>ining (DREAM) for unsupervised graph domain adaptation by combining graph structural semantics from complementary perspectives. Besides a message passing branch to learn implicit semantics, our DREAM trains a path aggregation branch, which can provide explicit high-order structural semantics as a supplement. To train these two branches conjointly, we employ an expectation-maximization (EM) style variational framework for the maximization of likelihood. In the E-step, we fix the message passing branch and construct a graph-of-graph to indicate the geometric correlation between source and target domains, which would be adopted for the optimization of the other branch. In the M-step, we train the message passing branch and update the graph neural networks on the graph-of-graph with the other branch fixed. The alternative optimization improves the collaboration of knowledge from two branches. Extensive experiments on several benchmark datasets validate the superiority of the proposed DREAM compared with various baselines.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 11","pages":"10787-10800"},"PeriodicalIF":18.6,"publicationDate":"2025-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144787055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Instance-Dependent Transition Matrix Estimation by Exploiting Self-Supervised Learning","authors":"Yexiong Lin;Yu Yao;Zhaoqing Wang;Xu Shen;Jun Yu;Bo Han;Tongliang Liu","doi":"10.1109/TPAMI.2025.3595613","DOIUrl":"10.1109/TPAMI.2025.3595613","url":null,"abstract":"The <italic>transition matrix</i> reveals the transition relationship between clean labels and noisy labels. It plays an important role in building statistically consistent classifiers for learning with noisy labels. However, in real-world applications, the transition matrix is usually unknown and has to be estimated. It is a challenging task to accurately estimate the transition matrix which usually depends on the instance. With both instances and noisy labels at hand, the major difficulty of estimating the transition matrix comes from the absence of clean label information. Recent work suggests that self-supervised learning methods can effectively infer clean label information. These methods could even achieve comparable performance with supervised learning on many benchmark datasets but without requiring any labels. Motivated by this, our paper presents a practical approach that harnesses self-supervised learning to extract clean label information, which reduces the estimation error of the instance-dependent transition matrix. By exploiting the estimated transition matrix, the performance of classifiers is improved. Empirical results on different datasets illustrate that our proposed methodology outperforms existing state-of-the-art methods in terms of both classification accuracy and transition matrix estimation.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 11","pages":"10848-10861"},"PeriodicalIF":18.6,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144778167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}