IEEE transactions on pattern analysis and machine intelligence最新文献_第5页

Dynamic Scene Understanding Through Object-Centric Voxelization and Neural Rendering 通过以对象为中心的体素化和神经渲染来理解动态场景

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-06 DOI: 10.1109/TPAMI.2025.3539866

Yanpeng Zhao;Yiwei Hao;Siyu Gao;Yunbo Wang;Xiaokang Yang

{"title":"Dynamic Scene Understanding Through Object-Centric Voxelization and Neural Rendering","authors":"Yanpeng Zhao;Yiwei Hao;Siyu Gao;Yunbo Wang;Xiaokang Yang","doi":"10.1109/TPAMI.2025.3539866","DOIUrl":"10.1109/TPAMI.2025.3539866","url":null,"abstract":"Learning object-centric representations from unsupervised videos is challenging. Unlike most previous approaches that focus on decomposing 2D images, we present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning within a differentiable volume rendering framework. The key idea is to perform object-centric voxelization to capture the 3D nature of the scene, which infers per-object occupancy probabilities at individual spatial locations. These voxel features evolve through a canonical-space deformation function and are optimized in an inverse rendering pipeline with a compositional NeRF. Additionally, our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids. DynaVol-S significantly outperforms existing models in both novel view synthesis and unsupervised decomposition tasks for dynamic scenes. By jointly considering geometric structures and semantic features, it effectively addresses challenging real-world scenarios involving complex object interactions. Furthermore, once trained, the explicitly meaningful voxel features enable additional capabilities that 2D scene decomposition methods cannot achieve, such as novel scene generation through editing geometric shapes or manipulating the motion trajectories of objects.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"4215-4231"},"PeriodicalIF":0.0,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10877772","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143258496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Causality-Aware Paradigm for Evaluating Creativity of Multimodal Large Language Models 多模态大型语言模型创造性评价的因果意识范式

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-06 DOI: 10.1109/TPAMI.2025.3539433

Zhongzhan Huang;Shanshan Zhong;Pan Zhou;Shanghua Gao;Marinka Zitnik;Liang Lin

{"title":"A Causality-Aware Paradigm for Evaluating Creativity of Multimodal Large Language Models","authors":"Zhongzhan Huang;Shanshan Zhong;Pan Zhou;Shanghua Gao;Marinka Zitnik;Liang Lin","doi":"10.1109/TPAMI.2025.3539433","DOIUrl":"10.1109/TPAMI.2025.3539433","url":null,"abstract":"Recently, numerous benchmarks have been developed to evaluate the logical reasoning abilities of large language models (LLMs). However, assessing the equally important creative capabilities of LLMs is challenging due to the subjective, diverse, and data-scarce nature of creativity, especially in multimodal scenarios. In this paper, we consider the comprehensive pipeline for evaluating the creativity of multimodal LLMs, with a focus on suitable evaluation platforms and methodologies. First, we find the Oogiri game—a creativity-driven task requiring humor, associative thinking, and the ability to produce unexpected responses to text, images, or both. This game aligns well with the input-output structure of modern multimodal LLMs and benefits from a rich repository of high-quality, human-annotated creative responses, making it an ideal platform for studying LLM creativity. Next, beyond using the Oogiri game for standard evaluations like ranking and selection, we propose LoTbench, an interactive, causality-aware evaluation framework, to further address some intrinsic risks in standard evaluations, such as information leakage and limited interpretability. The proposed LoTbench not only quantifies LLM creativity more effectively but also visualizes the underlying creative thought processes. Our results show that while most LLMs exhibit constrained creativity, the performance gap between LLMs and humans is not insurmountable. Furthermore, we observe a strong correlation between results from the multimodal cognition benchmark MMMU and LoTbench, but only a weak connection with traditional creativity metrics. This suggests that LoTbench better aligns with human cognitive theories, highlighting cognition as a critical foundation in the early stages of creativity and enabling the bridging of diverse concepts. Project Page.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"3830-3846"},"PeriodicalIF":0.0,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143258462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Absorption-Based, Passive Range Imaging From Hyperspectral Thermal Measurements 高光谱热测量中基于吸收的被动距离成像

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-06 DOI: 10.1109/TPAMI.2025.3538711

Unay Dorken Gallastegi;Hoover Rueda-Chacón;Martin J. Stevens;Vivek K Goyal

{"title":"Absorption-Based, Passive Range Imaging From Hyperspectral Thermal Measurements","authors":"Unay Dorken Gallastegi;Hoover Rueda-Chacón;Martin J. Stevens;Vivek K Goyal","doi":"10.1109/TPAMI.2025.3538711","DOIUrl":"10.1109/TPAMI.2025.3538711","url":null,"abstract":"Passive hyperspectral longwave infrared measurements are remarkably informative about the surroundings. Remote object material and temperature determine the spectrum of thermal radiance, and range, air temperature, and gas concentrations determine how this spectrum is modified by propagation to the sensor. We introduce a passive range imaging method based on computationally separating these phenomena. Previous methods assume hot and highly emitting objects; ranging is more challenging when objects’ temperatures do not deviate greatly from air temperature. Our method jointly estimates range and intrinsic object properties, with explicit consideration of air emission, though reflected light is assumed negligible. Inversion being underdetermined is mitigated by using a parametric model of atmospheric absorption and regularizing for smooth emissivity estimates. To assess where our estimate is likely accurate, we introduce a technique to detect which scene pixels are significantly influenced by reflected downwelling. Monte Carlo simulations demonstrate the importance of regularization, temperature differentials, and availability of many spectral bands. We apply our method to longwave infrared (8–13 <inline-formula><tex-math>$mathrm{mu }mathrm{m}$</tex-math></inline-formula>) hyperspectral image data acquired from natural scenes with no active illumination. Range features from 15 m to 150 m are recovered, with good qualitative match to lidar data for pixels classified as having negligible reflected downwelling.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"4044-4060"},"PeriodicalIF":0.0,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143258463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Invertible Diffusion Models for Compressed Sensing 压缩感知的可逆扩散模型

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-05 DOI: 10.1109/TPAMI.2025.3538896

Bin Chen;Zhenyu Zhang;Weiqi Li;Chen Zhao;Jiwen Yu;Shijie Zhao;Jie Chen;Jian Zhang

{"title":"Invertible Diffusion Models for Compressed Sensing","authors":"Bin Chen;Zhenyu Zhang;Weiqi Li;Chen Zhao;Jiwen Yu;Shijie Zhao;Jie Chen;Jian Zhang","doi":"10.1109/TPAMI.2025.3538896","DOIUrl":"10.1109/TPAMI.2025.3538896","url":null,"abstract":"While deep neural networks (NNs) significantly advance image compressed sensing (CS) by improving reconstruction quality, the necessity of training current CS NNs from scratch constrains their effectiveness and hampers rapid deployment. Although recent methods utilize pre-trained diffusion models for image reconstruction, they struggle with slow inference and restricted adaptability to CS. To tackle these challenges, this paper proposes <bold>Invertible <bold>Diffusion <bold>Models (<bold>IDM), a novel efficient, end-to-end diffusion-based CS method. IDM repurposes a large-scale diffusion sampling process as a reconstruction model, and fine-tunes it end-to-end to recover original images directly from CS measurements, moving beyond the traditional paradigm of one-step noise estimation learning. To enable such memory-intensive end-to-end fine-tuning, we propose a novel two-level invertible design to transform both 1) multi-step sampling process and 2) noise estimation U-Net in each step into invertible networks. As a result, most intermediate features are cleared during training to reduce up to 93.8% GPU memory. In addition, we develop a set of lightweight modules to inject measurements into noise estimator to further facilitate reconstruction. Experiments demonstrate that IDM outperforms existing state-of-the-art CS networks by up to 2.64 dB in PSNR. Compared to the recent diffusion-based approach DDNM, our IDM achieves up to 10.09 dB PSNR gain and 14.54 times faster inference.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"3992-4006"},"PeriodicalIF":0.0,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143191824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Revisiting Flatness-Aware Optimization in Continual Learning With Orthogonal Gradient Projection 正交梯度投影持续学习中的平面感知优化问题重述

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-05 DOI: 10.1109/TPAMI.2025.3539019

Enneng Yang;Li Shen;Zhenyi Wang;Shiwei Liu;Guibing Guo;Xingwei Wang;Dacheng Tao

{"title":"Revisiting Flatness-Aware Optimization in Continual Learning With Orthogonal Gradient Projection","authors":"Enneng Yang;Li Shen;Zhenyi Wang;Shiwei Liu;Guibing Guo;Xingwei Wang;Dacheng Tao","doi":"10.1109/TPAMI.2025.3539019","DOIUrl":"10.1109/TPAMI.2025.3539019","url":null,"abstract":"The goal of continual learning (CL) is to learn from a series of continuously arriving new tasks without forgetting previously learned old tasks. To avoid catastrophic forgetting of old tasks, orthogonal gradient projection (OGP) based CL methods constrain the gradients of new tasks to be orthogonal to the space spanned by old tasks. This strict gradient constraint will limit the learning ability of new tasks, resulting in lower performance on new tasks. In this paper, we first establish a unified framework for OGP-based CL methods. We then revisit OGP-based CL methods from a new perspective on the loss landscape, where we find that when relaxing projection constraints to improve performance on new tasks, the unflatness of the loss landscape can lead to catastrophic forgetting of old tasks. Based on our findings, we propose a new Dual Flatness-aware OGD framework that optimizes the flatness of the loss landscape from both data and weight levels. Our framework consists of three modules: data and weight perturbation, flatness-aware optimization, and gradient projection. Specifically, we first perform perturbations on the task's data and current model weights to make the task's loss reach the worst-case. Next, we optimize the loss and loss landscape on the original data and the worst-case perturbed data to obtain a flatness-aware gradient. Finally, the flatness-aware gradient will update the network in directions orthogonal to the space spanned by the old tasks. Extensive experiments on four benchmark datasets show that the framework improves the flatness of the loss landscape and performance on new tasks, and achieves state-of-the-art (SOTA) performance on average accuracy across all tasks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"3895-3907"},"PeriodicalIF":0.0,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143191826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Fully-Connected Transformer for Multi-Source Image Fusion 用于多源图像融合的全连接变压器

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-05 DOI: 10.1109/TPAMI.2024.3523364

Xiao Wu;Zi-Han Cao;Ting-Zhu Huang;Liang-Jian Deng;Jocelyn Chanussot;Gemine Vivone

{"title":"Fully-Connected Transformer for Multi-Source Image Fusion","authors":"Xiao Wu;Zi-Han Cao;Ting-Zhu Huang;Liang-Jian Deng;Jocelyn Chanussot;Gemine Vivone","doi":"10.1109/TPAMI.2024.3523364","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3523364","url":null,"abstract":"Multi-source image fusion combines the information coming from multiple images into one data, thus improving imaging quality. This topic has aroused great interest in the community. How to integrate information from different sources is still a big challenge, although the existing self-attention based transformer methods can capture spatial and channel similarities. In this paper, we first discuss the mathematical concepts behind the proposed generalized self-attention mechanism, where the existing self-attentions are considered basic forms. The proposed mechanism employs multilinear algebra to drive the development of a novel fully-connected self-attention (FCSA) method to fully exploit local and non-local domain-specific correlations among multi-source images. Moreover, we propose a multi-source image representation embedding it into the FCSA framework as a non-local prior within an optimization problem. Some different fusion problems are unfolded into the proposed fully-connected transformer fusion network (FC-Former). More specifically, the concept of generalized self-attention can promote the potential development of self-attention. Hence, the FC-Former can be viewed as a network model unifying different fusion tasks. Compared with state-of-the-art methods, the proposed FC-Former method exhibits robust and superior performance, showing its capability of faithfully preserving information.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 3","pages":"2071-2088"},"PeriodicalIF":0.0,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Fair Representation Learning for Continuous Sensitive Attributes Using Expectation of Integral Probability Metrics 基于积分概率度量期望的连续敏感属性公平表示学习

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-05 DOI: 10.1109/TPAMI.2025.3538915

Insung Kong;Kunwoong Kim;Yongdai Kim

{"title":"Fair Representation Learning for Continuous Sensitive Attributes Using Expectation of Integral Probability Metrics","authors":"Insung Kong;Kunwoong Kim;Yongdai Kim","doi":"10.1109/TPAMI.2025.3538915","DOIUrl":"10.1109/TPAMI.2025.3538915","url":null,"abstract":"AI fairness, also known as algorithmic fairness, aims to ensure that algorithms operate without bias or discrimination towards any individual or group. Among various AI algorithms, the Fair Representation Learning (FRL) approach has gained significant interest in recent years. However, existing FRL algorithms have a limitation: they are primarily designed for categorical sensitive attributes and thus cannot be applied to continuous sensitive attributes, such as age or income. In this paper, we propose an FRL algorithm for continuous sensitive attributes. First, we introduce a measure called the Expectation of Integral Probability Metrics (EIPM) to assess the fairness level of representation space for continuous sensitive attributes. We demonstrate that if the distribution of the representation has a low EIPM value, then any prediction head constructed on the top of the representation become fair, regardless of the selection of the prediction head. Furthermore, EIPM possesses a distinguished advantage in that it can be accurately estimated using our proposed estimator with finite samples. Based on these properties, we propose a new FRL algorithm called Fair Representation using EIPM with MMD (FREM). Experimental evidences show that FREM outperforms other baseline methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"3784-3795"},"PeriodicalIF":0.0,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143191825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Temporally-Consistent Surface Reconstruction Using Metrically-Consistent Atlases 使用度量一致地图集的时间一致表面重建

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-04 DOI: 10.1109/TPAMI.2025.3538776

Jan Bednarik;Noam Aigerman;Vladimir G. Kim;Siddhartha Chaudhuri;Shaifali Parashar;Mathieu Salzmann;Pascal Fua

引用次数: 0

GaussNav: Gaussian Splatting for Visual Navigation GaussNav：用于视觉导航的高斯飞溅

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-04 DOI: 10.1109/TPAMI.2025.3538496

Xiaohan Lei;Min Wang;Wengang Zhou;Houqiang Li

{"title":"GaussNav: Gaussian Splatting for Visual Navigation","authors":"Xiaohan Lei;Min Wang;Wengang Zhou;Houqiang Li","doi":"10.1109/TPAMI.2025.3538496","DOIUrl":"10.1109/TPAMI.2025.3538496","url":null,"abstract":"In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary challenge of IIN arises from the need to recognize the target object across varying viewpoints while ignoring potential distractors. Existing map-based navigation methods typically use Bird’s Eye View (BEV) maps, which lack detailed texture representation of a scene. Consequently, while BEV maps are effective for semantic-level visual navigation, they are struggling for instance-level tasks. To this end, we propose a new framework for IIN, Gaussian Splatting for Visual Navigation (GaussNav), which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The GaussNav framework enables the agent to memorize both the geometry and semantic information of the scene, as well as retain the textural features of objects. By matching renderings of similar objects with the target, the agent can accurately identify, ground, and navigate to the specified object. Our GaussNav framework demonstrates a significant performance improvement, with Success weighted by Path Length (SPL) increasing from 0.347 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"4108-4121"},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143125124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Quasi-Metric Learning for Bilateral Person-Job Fit 双边人职契合度的准度量学习

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-04 DOI: 10.1109/TPAMI.2025.3538774

Yingpeng Du;Hongzhi Liu;Hengshu Zhu;Yang Song;Zhi Zheng;Zhonghai Wu

{"title":"Quasi-Metric Learning for Bilateral Person-Job Fit","authors":"Yingpeng Du;Hongzhi Liu;Hengshu Zhu;Yang Song;Zhi Zheng;Zhonghai Wu","doi":"10.1109/TPAMI.2025.3538774","DOIUrl":"10.1109/TPAMI.2025.3538774","url":null,"abstract":"Matching suitable jobs with qualified candidates is crucial for online recruitment. Typically, users (i.e., candidates and employers) have specific expectations in the recruitment market, making them prefer similar jobs or candidates. Metric learning technologies provide a promising way to capture the similarity propagation between candidates and jobs. However, they rely on symmetric distance measures, failing to model users' asymmetric relationships in two-way selection. Additionally, users' behaviors (e.g., candidates) are highly affected by the feedback from their counterparts (e.g., employers), which can hardly be captured by the existing person-job fit methods that primarily explore homogeneous and undirected graphs. To address these problems, we propose a quasi-metric learning framework to capture the similarity propagation between candidates and jobs while modeling their asymmetric relations for bilateral person-job fit. Specifically, we propose a quasi-metric space that not only satisfies the triangle inequality to capture the fine-grained similarity between candidates and jobs, but also incorporates a tailored asymmetric measure to model users. two-way selection process in online recruitment. More importantly, the proposed quasi-metric learning framework can theoretically model recruitment rules from <italic>similarity and <italic>competitiveness perspectives, making it seamlessly align with bilateral person-job fit scenarios. To explore the mutual effects of two-sided users, we first organize candidates, employers, and their different-typed interactions into a heterogeneous relation graph, and then propose a relation-aware graph convolution network to capture users. mutual effects through their bilateral behaviors. Extensive experiments on several real-world datasets demonstrate the effectiveness of the proposed methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"3947-3960"},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143125027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0