{"title":"Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Videos.","authors":"Yiqun Zhao, Chenming Wu, Binbin Huang, Yihao Zhi, Chen Zhao, Jingdong Wang, Shenghua Gao","doi":"10.1109/TPAMI.2025.3599415","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3599415","url":null,"abstract":"<p><p>Efficient and accurate reconstruction of a relightable, dynamic clothed human avatar from a monocular video is crucial for the entertainment industry. This paper presents SGIA (Surfel-based Gaussian Inverse Avatar), which introduces efficient training and rendering for relightable dynamic human reconstruction. SGIA advances previous Gaussian Avatar methods by comprehensively modeling Physically-Based Rendering (PBR) properties for clothed human avatars, allowing for the manipulation of avatars into novel poses under diverse lighting conditions. Specifically, our approach integrates pre-integration and image-based lighting for fast light calculations that surpass the performance of existing implicit-based techniques. To address challenges related to material lighting disentanglement and accurate geometry reconstruction, we propose an innovative occlusion approximation strategy and a progressive training approach. Extensive experiments demonstrate that SGIA not only achieves highly accurate physical properties but also significantly enhances the realistic relighting of dynamic human avatars, providing a substantial speed advantage. We exhibit more results in our project page: https://GS-IA.github.io.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scaling up Multimodal Pre-Training for Sign Language Understanding.","authors":"Wengang Zhou, Weichao Zhao, Hezhen Hu, Zecheng Li, Houqiang Li","doi":"10.1109/TPAMI.2025.3599313","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3599313","url":null,"abstract":"<p><p>Sign language pre-training (SLP) has significantly improved the performance of diverse sign language understanding (SLU) tasks. However, many existing methods employ pre-training techniques that are tailored to a specific task with small data scale, resulting in limited model generalization. Some others focus solely on exploring visual cues, neglecting semantically textual cues embedded in sign translation texts. These limitations inherently diminish the representative capacity of pre-trained models. To this end, we present a multimodal SLP framework to leverage rich visual contextual information and vision-language semantic consistency with massively available data to enhance the representative capability of sign language video. Specifically, we first curate a large-scale text-labeled sign pose dataset ($sim$ 1.5M), namely SL-1.5M, from various sources to alleviate the scarcity of pre-training data. Subsequently, we propose a pre-training framework, which integrates sign-text contrastive learning with masked pose modeling as the pretext task. In this way, our framework is empowered to effectively capture contextual cues within sign pose sequences and learn visual representation by aligning semantical text-rich features in a latent space. Moreover, in order to grasp the comprehensive meaning of sign language videos, we concurrently model manual and non-manual information to ensure the holistic integrity of visual content. To validate the generalization and superiority of our proposed pre-trained framework, we conduct extensive experiments without intricate design on diverse SLU tasks, achieving new state-of-the-art performance on multiple benchmarks.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dark Noise Diffusion: Noise Synthesis for Low-Light Image Denoising.","authors":"Liying Lu, Raphael Achddou, Sabine Susstrunk","doi":"10.1109/TPAMI.2025.3598330","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3598330","url":null,"abstract":"<p><p>Low-light photography produces images with low signal-to-noise ratios due to limited photons. In such conditions, common approximations like the Gaussian noise model fall short, and many denoising techniques fail to remove noise effectively. Although deep-learning methods perform well, they require large datasets of paired images that are impractical to acquire. As a remedy, synthesizing realistic low-light noise has gained significant attention. In this paper, we investigate the ability of diffusion models to capture the complex distribution of low-light noise. We show that a naive application of conventional diffusion models is inadequate for this task and propose three key adaptations that enable high-precision noise generation: a two-branch architecture to better model signal-dependent and signal-independent noise, the incorporation of positional information to capture fixed-pattern noise, and a tailored diffusion noise schedule. Consequently, our model enables the generation of large datasets for training low-light denoising networks, leading to state-of-the-art performance. Through comprehensive analysis, including statistical evaluation and noise decomposition, we provide deeper insights into the characteristics of the generated data.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Diffusion Posterior Sampling for Seeing Beyond Dynamic Scattering Layers.","authors":"Taesung Kwon, Gookho Song, Yoosun Kim, Jeongsol Kim, Jong Chul Ye, Mooseok Jang","doi":"10.1109/TPAMI.2025.3598457","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3598457","url":null,"abstract":"<p><p>Imaging through scattering is challenging, as even a thin layer can randomly perturb light propagation and obscure hidden objects. Accurate closed-form modeling of forward scattering remains difficult, particularly for dynamically varying or thick layers. Here, we introduce a plug-and-play inverse solver based on video diffusion models with a physically grounded forward model tailored to dynamic scattering layers. Our method extends Diffusion Posterior Sampling (DPS) to the spatio-temporal domain, thereby capturing statistical correlations between video frames and scattered signals more effectively. Leveraging these temporal correlations, our approach recovers high-resolution spatial details that spatial-only methods typically fail to reconstruct. We also propose an inference-time optimization with a lightweight mapping network, enabling joint estimation of low-dimensional forward-model parameters without additional training. This joint optimization significantly enhances adaptability to unknown, time-varying degradations, making our method suitable for blind inverse scattering problems. We validate across diverse conditions, including different scene types, layer thicknesses, and scene-layer distances. And real-world experiments using multiple datasets confirm the robustness and effectiveness of our approach, even under real noise and forward-model approximation mismatches. Finally, we validate our method as a general video-restoration framework across dehazing, deblurring, inpainting, and blind restoration under complex optical aberrations. Our implementation is available at: https://github.com/star-kwon/VDPS.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DreamCraft3D++: Efficient Hierarchical 3D Generation with Multi-Plane Reconstruction Model.","authors":"Jingxiang Sun, Cheng Peng, Ruizhi Shao, Yuan-Chen Guo, Xiaochen Zhao, Yangguang Li, YanPei Cao, Bo Zhang, Yebin Liu","doi":"10.1109/TPAMI.2025.3598772","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3598772","url":null,"abstract":"<p><p>We introduce DreamCraft3D++, an extension of DreamCraft3D that enables efficient high-quality generation of complex 3D assets. DreamCraft3D++ inherits the multi-stage generation process of DreamCraft3D, but replaces the time-consuming geometry sculpting optimization with a feed-forward multi-plane based reconstruction model, speeding up the process by 1000x. For texture refinement, we propose a training-free IP-Adapter module that is conditioned on the enhanced multi-view images to enhance texture and geometry consistency, providing a 4x faster alternative to DreamCraft3D's DreamBooth fine-tuning. Experiments on diverse datasets demonstrate DreamCraft3D++'s ability to generate creative 3D assets with intricate geometry and realistic 360° textures, outperforming state-of-the-art image-to-3D methods in quality and speed. The full implementation will be open-sourced to enable new possibilities in 3D content creation.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HHAvatar: Gaussian Head Avatar with Dynamic Hairs.","authors":"Zhanfeng Liao, Yuelang Xu, Zhe Li, Qijing Li, Boyao Zhou, Ruifeng Bai, Di Xu, Hongwen Zhang, Yebin Liu","doi":"10.1109/TPAMI.2025.3597940","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3597940","url":null,"abstract":"<p><p>Creating high-fidelity 3D head avatars has always been a research hotspot, but it remains a great challenge under lightweight sparse view setups. In this paper, we propose HHAvatar represented by controllable 3D Gaussians for high-fidelity head avatar with dynamic hair modeling. We first use 3D Gaussians to represent the appearance of the head, and then jointly optimize neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, thereby our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a well-designed geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. To address the problem of dynamic hair modeling, we introduce a hybrid head model into our avatar representation based Gaussian Head Avatar and a training method that considers timing information and an occlusion perception module to model the non-rigid motion of hair. Experiments show that our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions and driving hairs reasonably with the motion of the head. Project page: https://liaozhanfeng.github.io/HHAvatar.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Semantic Component for Robust Molecular Property Prediction.","authors":"Zijian Li, Zunhong Xu, Ruichu Cai, Zhenhui Yang, Yuguang Yan, Zhifeng Hao, Guangyi Chen, Kun Zhang","doi":"10.1109/TPAMI.2025.3598461","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3598461","url":null,"abstract":"<p><p>Although graph neural networks have achieved great success in the task of molecular property prediction in recent years, their generalization ability under out-of-distribution (OOD) settings is still under-explored. Most of the existing methods rely on learning discriminative representations for prediction, often assuming that the underlying semantic components are correctly identified. However, this assumption does not always hold, leading to potential misidentifications that affect model robustness. Different from these discriminative-based methods, we propose a generative model to ensure the Semantic-Components Identifiability, named SCI. We demonstrate that the latent variables in this generative model can be explicitly identified into semantic-relevant (SR) and semantic-irrelevant (SI) components, which contributes to better OOD generalization by involving minimal change properties of causal mechanisms. Specifically, we first formulate the data generation process from the atom level to the molecular level, where the latent space is split into SI substructures, SR substructures, and SR atom variables. Sequentially, to reduce misidentification, we restrict the minimal changes of the SR atom variables and add a semantic latent substructure regularization to mitigate the variance of the SR substructure under augmented domain changes. Under mild assumptions, we prove the block-wise identifiability of the SR substructure and the comment-wise identifiability of SR atom variables. Experimental studies achieve state-of-the-art performance and show general improvement on 21 datasets in 3 mainstream benchmarks. Moreover, the visualization results of the proposed SCI method provide insightful case studies and explanations for the prediction results.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-View Hand Reconstruction With a Point-Embedded Transformer","authors":"Lixin Yang;Licheng Zhong;Pengxiang Zhu;Xinyu Zhan;Junxiao Kong;Jian Xu;Cewu Lu","doi":"10.1109/TPAMI.2025.3598089","DOIUrl":"10.1109/TPAMI.2025.3598089","url":null,"abstract":"This work introduces a novel and generalizable multi-view Hand Mesh Reconstruction (HMR) model, named POEM, designed for practical use in real-world hand motion capture scenarios. The advances of the POEM model consist of two main aspects. First, concerning the modeling of the problem, we propose embedding a static basis point within the multi-view stereo space. A point represents a natural form of 3D information and serves as an ideal medium for fusing features across different views, given its varied projections across these views. Consequently, our method harnesses a simple yet effective idea: a complex 3D hand mesh can be represented by a set of 3D basis points that 1) are embedded in the multi-view stereo, 2) carry features from the multi-view images, and 3) encompass the hand in it. The second advance lies in the training strategy. We utilize a combination of five large-scale multi-view datasets and employ randomization in the number, order, and poses of the cameras. By processing such a vast amount of data and a diverse array of camera configurations, our model demonstrates notable generalizability in the real-world applications. As a result, POEM presents a highly practical, plug-and-play solution that enables user-friendly, cost-effective multi-view motion capture for both left and right hands.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 11","pages":"10680-10695"},"PeriodicalIF":18.6,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Depth Dynamics via One-Bit Frequency Probing in Embedded Direct Time-of-Flight Sensing.","authors":"Seth Lindgren, Benjamin R Johnson, Lucas J Koerner","doi":"10.1109/TPAMI.2025.3598593","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3598593","url":null,"abstract":"<p><p>Time-of-flight (ToF) sensors with single-photon avalanche diodes (SPADs) estimate depth by accumulating a histogram of photon return times, which discards the timing information required to measure depth dynamics, such as vibrations or transient motions. We introduce a method that transforms a direct ToF sensor into a depth frequency analyzer capable of measuring high-frequency motion and transient events using only lightweight, on-sensor computations. By replacing conventional discrete Fourier transforms (DFTs) with one-bit probing sinusoids generated via oversampled sigma-delta modulation, we enable in-pixel frequency analysis without multipliers or floating-point operations. We extend the lightweight analysis of depth dynamics to Haar wavelets for time-localized detection of brief, non-repetitive depth changes. We validate our approach through simulation and hardware experiments, showing that it achieves noise performance approaching that of full-resolution DFTs, detects sub-millimeter motions above 6 kHz, and localizes millisecond-scale transients. Using a laboratory ToF setup, we demonstrate applications in oscillatory motion analysis and depth edge detection. This work has the potential to enable a new class of compact, motion-aware ToF sensors for embedded deployment in industrial predictive maintenance, structural health monitoring, robotic perception, and dynamic scene understanding.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPARE: Symmetrized Point-to-Plane Distance for Robust Non-Rigid 3D Registration.","authors":"Yuxin Yao, Bailin Deng, Junhui Hou, Juyong Zhang","doi":"10.1109/TPAMI.2025.3598630","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3598630","url":null,"abstract":"<p><p>Existing optimization-based methods for non-rigid registration typically minimize an alignment error metric based on the point-to-point or point-to-plane distance between corresponding point pairs on the source surface and target surface. However, these metrics can result in slow convergence or a loss of detail. In this paper, we propose SPARE, a novel formulation that utilizes a symmetrized point-to-plane distance for robust non-rigid registration. The symmetrized point-to-plane distance relies on both the positions and normals of the corresponding points, resulting in a more accurate approximation of the underlying geometry and can achieve higher accuracy than existing methods. To solve this optimization problem efficiently, we introduce an as-rigid-as-possible regulation term to estimate the deformed normals and propose an alternating minimization solver using a majorization-minimization strategy. Moreover, for effective initialization of the solver, we incorporate a deformation graph-based coarse alignment that improves registration quality and efficiency. Extensive experiments show that the proposed method greatly improves the accuracy of non-rigid registration problems and maintains relatively high solution efficiency. The code is publicly available at https://github.com/yaoyx689/spare.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}