IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Publications

StyleShot: a Snapshot on any Style
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-16 · DOI: 10.1109/tpami.2025.3610614
Junyao Gao, Yanan Sun, Yanchen Liu, Yinhao Tang, Yanhong Zeng, Ding Qi, Kai Chen, Cairong Zhao
{"title":"StyleShot: a Snapshot on any Style","authors":"Junyao Gao, Yanan Sun, Yanchen Liu, Yinhao Tang, Yanhong Zeng, Ding Qi, Kai Chen, Cairong Zhao","doi":"10.1109/tpami.2025.3610614","DOIUrl":"https://doi.org/10.1109/tpami.2025.3610614","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"49 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145072880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient Nearest Neighbor Search Using Dynamic Programming
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-16 · DOI: 10.1109/tpami.2025.3610211
Pengfei Wang, Jiantao Song, Shiqing Xin, Shuangmin Chen, Changhe Tu, Wenping Wang, Jiaye Wang
{"title":"Efficient Nearest Neighbor Search Using Dynamic Programming","authors":"Pengfei Wang, Jiantao Song, Shiqing Xin, Shuangmin Chen, Changhe Tu, Wenping Wang, Jiaye Wang","doi":"10.1109/tpami.2025.3610211","DOIUrl":"https://doi.org/10.1109/tpami.2025.3610211","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"37 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145072881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MSFA Image Denoising Using Physics-based Noise Model and Noise-decoupled Network.
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-16 · DOI: 10.1109/tpami.2025.3610243
Yuqi Jiang, Ying Fu, Qiankun Liu, Jun Zhang
{"title":"MSFA Image Denoising Using Physics-based Noise Model and Noise-decoupled Network.","authors":"Yuqi Jiang,Ying Fu,Qiankun Liu,Jun Zhang","doi":"10.1109/tpami.2025.3610243","DOIUrl":"https://doi.org/10.1109/tpami.2025.3610243","url":null,"abstract":"Multispectral filter array (MSFA) camera is increasingly used due to its compact size and fast capturing speed. However, because of its narrow-band property, it often suffers from the light-deficient problem, and images captured are easily overwhelmed by noise. As a type of commonly used denoising method, neural networks have shown their power to achieve satisfactory denoising results. However, their performance highly depends on high-quality noisy-clean image pairs. For the task of MSFA image denoising, there is currently neither a paired real dataset nor an accurate noise model capable of generating realistic noisy images. To this end, we present a physics-based noise model that is capable to match the real noise distribution and synthesize realistic noisy images. In our noise model, those different types of noise can be divided into SimpleDist component and ComplexDist component. The former contains all the types of noise that can be described using a simple probability distribution like Gaussian or Poisson distribution, and the latter contains the complicated color bias noise that cannot be modeled using a simple probability distribution. Besides, we design a noise-decoupled network consisting of a SimpleDist noise removal network (SNRNet) and a ComplexDist noise removal network (CNRNet) to sequentially remove each component. Moreover, according to the non-uniformity of color bias noise in our noise model, we introduce a learnable position embedding in CNRNet to indicate the position information. To verify the effectiveness of our physics-based noise model and noise-decoupled network, we collect a real MSFA denoising dataset with paired long-exposure clean images and short-exposure noisy images. Experiments are conducted to prove that the network trained using synthetic data generated by our noise model performs as well as trained using paired real data, and our noise-decoupled network outperforms other state-of-the-art denoising methods. The project page is avaliable at https://github.com/ying-fu/msfa denoising.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"17 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
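The SimpleDist component described in the abstract covers noise that follows simple distributions, such as Poisson shot noise and Gaussian read noise. As a rough illustration of how such a component might synthesize noisy training data, here is a minimal Poisson-Gaussian sketch; the gain and read-noise values are placeholders, not calibrated parameters from the paper:

```python
import numpy as np

def synthesize_simpledist_noise(clean, gain=2.0, read_sigma=1.5, rng=None):
    """Illustrative SimpleDist-style noise synthesis: signal-dependent shot
    noise modeled as Poisson, plus additive Gaussian read noise. The gain
    and read_sigma values are assumptions for this sketch."""
    rng = np.random.default_rng() if rng is None else rng
    shot = rng.poisson(clean / gain) * gain                 # Poisson shot noise
    read = rng.normal(0.0, read_sigma, size=clean.shape)    # Gaussian read noise
    return np.clip(shot + read, 0.0, None)

# Example: corrupt a flat 100-photon patch.
noisy = synthesize_simpledist_noise(np.full((8, 8), 100.0))
```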
SNNTracker: Online High-speed Multi-Object Tracking with Spike Camera.
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-16 · DOI: 10.1109/tpami.2025.3610696
Yajing Zheng, Chengen Li, Jiyuan Zhang, Zhaofei Yu, Tiejun Huang
{"title":"SNNTracker: Online High-speed Multi-Object Tracking with Spike Camera.","authors":"Yajing Zheng,Chengen Li,Jiyuan Zhang,Zhaofei Yu,Tiejun Huang","doi":"10.1109/tpami.2025.3610696","DOIUrl":"https://doi.org/10.1109/tpami.2025.3610696","url":null,"abstract":"Multi-object tracking (MOT) is crucial for applications such as autonomous driving and robotics, yet traditional image-based methods struggle in high-speed scenarios due to motion blur and temporal gaps caused by low frame rates. Spike cameras, with their ability to continuously record spatiotemporal signals, overcome these limitations. However, existing spike-based methods often rely on intermediate image reconstruction or discrete clustering, which limits their real-time performance and temporal continuity. To address this, we propose SNNTracker, the first fully spiking neural network (SNN)-based MOT algorithm tailored for spike cameras. SNNTracker integrates a dynamic neural field (DNF)-based attention mechanism for target detection and a winner-take-all (WTA)-based tracking module with online spike-timing-dependent plasticity (STDP) for adaptive learning of object trajectories. By directly processing spike streams without reconstruction, SNNTracker reduces latency, computational overhead, and dependency on image quality, making it ideal for ultra-high-speed environments. It maintains robust, continuous tracking even under occlusions, severe lighting variations, or temporary object disappearance, by leveraging SNN-estimated motion predictions and long-term online clustering. We construct three types of spike-camera MOT datasets covering dense and sparse annotations across diverse real-world scenarios, including camera ego-motion, deformable and ultra-fast motion (up to 2600 RPM), occlusion, indoor/outdoor lighting changes, and low-visibility tracking. Extensive experiments demonstrate that SNNTracker consistently outperforms state-of-the-art MOT methods-both ANN- and SNN-based-achieving MOTA scores above 96% and up to 100% in many sequences. Our results highlight the advantages of spike-driven SNNs for low-latency, high-speed, and label-free multi-object tracking, advancing neuromorphic vision for real-time perception.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"171 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
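SNNTracker's tracking module adapts online via spike-timing-dependent plasticity (STDP). The sketch below shows a generic pair-based STDP weight update with exponential time windows; the constants and the exact rule are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def stdp_update(w, pre_t, post_t, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Generic pair-based STDP: potentiate when the presynaptic spike
    precedes the postsynaptic spike, depress otherwise. All constants
    are illustrative assumptions."""
    dt = post_t - pre_t                      # spike-time difference (ms)
    if dt >= 0:
        w += a_plus * np.exp(-dt / tau)      # pre before post: strengthen
    else:
        w -= a_minus * np.exp(dt / tau)      # post before pre: weaken
    return float(np.clip(w, 0.0, 1.0))       # keep weight bounded
```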
PlaneRecTR++: Unified Query Learning for Joint 3D Planar Reconstruction and Pose Estimation.
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-16 · DOI: 10.1109/tpami.2025.3610500
Jingjia Shi, Shuaifeng Zhi, Kai Xu
{"title":"PlaneRecTR++: Unified Query Learning for Joint 3D Planar Reconstruction and Pose Estimation.","authors":"Jingjia Shi,Shuaifeng Zhi,Kai Xu","doi":"10.1109/tpami.2025.3610500","DOIUrl":"https://doi.org/10.1109/tpami.2025.3610500","url":null,"abstract":"3D plane reconstruction from images can usually be divided into several sub-tasks of plane detection, segmentation, parameters regression and possibly depth prediction for per-frame, along with plane correspondence and relative camera pose estimation between frames. Previous works tend to divide and conquer these sub-tasks with distinct network modules, overall formulated by a two-stage paradigm. With an initial camera pose and per-frame plane predictions provided from the first stage, exclusively designed modules, potentially relying on extra plane correspondence labelling, are applied to merge multi-view plane entities and produce 6DoF camera pose. As none of existing works manage to integrate above closely related sub-tasks into a unified framework but treat them separately and sequentially, we suspect it potentially as a main source of performance limitation for existing approaches. Motivated by this finding and the success of query-based learning in enriching reasoning among semantic entities, in this paper, we propose PlaneRecTR++, a Transformer-based architecture, which for the first time unifies all sub-tasks related to multi-view reconstruction and pose estimation with a compact single-stage model, refraining from initial pose estimation and plane correspondence supervision. Extensive quantitative and qualitative experiments demonstrate that our proposed unified learning achieves mutual benefits across sub-tasks, obtaining a new state-of-the-art performance on public ScanNetv1, ScanNetv2, NYUv2-Plane, and MatterPort3D datasets.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"9 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
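The abstract describes a single-stage, query-based Transformer in which each query reasons jointly about a plane's presence, parameters, and mask. A minimal sketch of what such a query head could look like follows; all dimensions, head counts, and output parameterizations are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PlaneQueryHead(nn.Module):
    """Each learned query jointly predicts a plane/no-plane score, plane
    parameters (normal + offset), and a mask embedding to be dotted with
    per-pixel features. Dimensions and head counts are assumptions."""
    def __init__(self, num_queries=20, dim=256):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.cls_head = nn.Linear(dim, 2)      # plane vs. no-plane
        self.param_head = nn.Linear(dim, 4)    # normal (3) + offset (1)
        self.mask_embed = nn.Linear(dim, dim)  # dot with pixel features -> mask

    def forward(self, feats):                  # feats: (B, HW, dim)
        q = self.queries.weight.unsqueeze(0).expand(feats.size(0), -1, -1)
        q = self.decoder(q, feats)             # queries attend to image features
        return self.cls_head(q), self.param_head(q), self.mask_embed(q)
```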
Adaptive Batch Size Time Evolving Stochastic Gradient Descent for Federated Learning.
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-15 · DOI: 10.1109/tpami.2025.3610169
Xuming An, Li Shen, Yong Luo, Han Hu, Dacheng Tao
{"title":"Adaptive Batch Size Time Evolving Stochastic Gradient Descent for Federated Learning.","authors":"Xuming An,Li Shen,Yong Luo,Han Hu,Dacheng Tao","doi":"10.1109/tpami.2025.3610169","DOIUrl":"https://doi.org/10.1109/tpami.2025.3610169","url":null,"abstract":"Variance reduction has been shown to improve the performance of Stochastic Gradient Descent (SGD) in centralized machine learning. However, when it is extended to federated learning systems, many issues may arise, including (i) mega-batch size settings; (ii) additional noise introduced by the gradient difference between the current iteration and the snapshot point; and (iii) gradient (statistical) heterogeneity. In this paper, we propose a lightweight algorithm termed federated adaptive batch size time evolving variance reduction (FedATEVR) to tackle these issues, consisting of an adaptive batch size setting scheme and a time-evolving variance reduction gradient estimator. In particular, we use the historical gradient information to set an appropriate mega-batch size for each client, which can steadily accelerate the local SGD process and reduce the computation cost. The historical information involves both global and local gradient, which mitigates unstable varying in mega-batch size introduced by gradient heterogeneity among the clients. For each client, the gradient difference between the current iteration and the snapshot point is used to tune the time-evolving weight of the variance reduction term in the gradient estimator. This can avoid meaningless variance reduction caused by the out-of-date snapshot point gradient. We theoretically prove that our algorithm can achieve a linear speedup of of $mathcal {O}(frac{1}{sqrt{SKT}})$ for non-convex objective functions under partial client participation. Extensive experiments demonstrate that our proposed method can achieve higher test accuracy than the baselines and decrease communication rounds greatly.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"65 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145068472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
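The time-evolving variance reduction described above builds on SVRG-style gradient estimators. A minimal sketch, assuming the classical SVRG form with a tunable weight standing in for the paper's time-evolving term:

```python
def variance_reduced_grad(grad_fn, w, w_snap, full_grad_snap, batch, alpha=1.0):
    """SVRG-style estimator: g = g_B(w) - alpha * (g_B(w_snap) - full_grad(w_snap)).
    alpha = 1 recovers classical SVRG; letting alpha decay as the snapshot
    gradient goes stale stands in for the paper's time-evolving weight.
    grad_fn(params, batch) is a caller-supplied mini-batch gradient."""
    g_now = grad_fn(w, batch)          # stochastic gradient at the current point
    g_snap = grad_fn(w_snap, batch)    # same mini-batch at the snapshot point
    return g_now - alpha * (g_snap - full_grad_snap)
```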
DreamReward-X: Boosting High-Quality 3D Generation with Human Preference Alignment.
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-15 · DOI: 10.1109/tpami.2025.3609680
Fangfu Liu, Junliang Ye, Yikai Wang, Hanyang Wang, Zhengyi Wang, Jun Zhu, Yueqi Duan
{"title":"DreamReward-X: Boosting High-Quality 3D Generation with Human Preference Alignment.","authors":"Fangfu Liu,Junliang Ye,Yikai Wang,Hanyang Wang,Zhengyi Wang,Jun Zhu,Yueqi Duan","doi":"10.1109/tpami.2025.3609680","DOIUrl":"https://doi.org/10.1109/tpami.2025.3609680","url":null,"abstract":"Recent advancements in 3D content generation have shown remarkable success by leveraging pretrained large-scale diffusion models. However, existing 3D generation results are far from perfect as one primary challenge lies in aligning 3D content with human preference, especially in text-driven 3D generation. In this paper, we propose a novel 3D generation framework, coined DreamReward, to learn and improve text-driven 3D generation models from human preference feedback. First, we collect 25K+ expert comparisons based on a systematic annotation pipeline including filtering, rating, and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model to encode human preferences effectively. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL) algorithm to guide the noisy pretrained distribution toward the actual user-prompt distributions in optimization. With the rapid development and growing popularity of 4D and image-driven 3D generation, we further extend our DreamReward into 4D generation (DreamReward-4D) and image-to-3D generation (DreamReward-img) in a low-cost but effective manner. Despite the impressive results created by DreamReward, the diversity in text-driven 3D generation is limited due to inherent maximum likelihood-seeking issues. To address this, we explore the gap between Denoising Diffusion Implicit Models (DDIM) and SDS-based DreamFL in the generation process and propose DreamReward++, where we introduce a reward-aware noise sampling strategy to unleash text-driven diversity during the generation process while ensuring human preference alignment. Grounded by theoretical proof and extensive experiment comparisons, our method successfully generates high-fidelity and diverse 3D results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve 3D generation.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"73 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145068470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
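The reward-aware noise sampling strategy in DreamReward++ steers generation toward human-preferred outputs while preserving diversity. One plausible reading is a best-of-N selection over initial noise; the sketch below assumes that reading, and `reward_model` and `render_fn` are hypothetical stand-ins, not the paper's API:

```python
import torch

def reward_aware_sample(reward_model, render_fn, n=8, shape=(4, 64, 64)):
    """Best-of-n over initial noise: draw several noise candidates, score a
    cheap preview of each with the reward model, keep the best. reward_model
    and render_fn are hypothetical stand-ins; each score must be a scalar."""
    candidates = torch.randn(n, *shape)
    with torch.no_grad():
        scores = torch.stack([reward_model(render_fn(z)) for z in candidates])
    return candidates[scores.argmax()]   # highest-reward starting noise
```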
Object Detection Data Synthesis via Box-to-Image Generation based on Diffusion Models.
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-15 · DOI: 10.1109/tpami.2025.3609962
Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan
{"title":"Object Detection Data Synthesis via Box-to-Image Generation based on Diffusion Models.","authors":"Jingyuan Zhu,Huimin Ma,Jiansheng Chen,Jian Yuan","doi":"10.1109/tpami.2025.3609962","DOIUrl":"https://doi.org/10.1109/tpami.2025.3609962","url":null,"abstract":"Modern diffusion-based image generative models have made significant progress and become promising to enrich training data for the object detection task. However, the generation quality and the controllability for complex scenes containing multi-class objects and dense objects with occlusions remain limited. This paper presents ODGEN, a novel method to generate high-quality images conditioned on bounding boxes, thereby facilitating data synthesis for object detection. Given a domain-specific object detection dataset, we first fine-tune a pre-trained diffusion model on both cropped foreground objects and entire images to fit target distributions. Then we propose to control the diffusion model using synthesized visual prompts with spatial constraints and object-wise textual descriptions. ODGEN exhibits robustness in handling complex scenes and specific domains. Further, we design a dataset synthesis pipeline to evaluate ODGEN on 7 domain-specific benchmarks to demonstrate its effectiveness. Adding training data generated by ODGEN improves up to 25.3% mAP@.50:.95 with object detectors like YOLOv5 and YOLOv7, outperforming prior controllable generative methods. We also design an evaluation protocol based on COCO-2014 to validate the synthetic data of ODGEN in general domains and observe an advantage up to 5.6% in mAP@.50:.95 against existing methods. In addition, we employ a series of large-scale object detection datasets to train a general model named Stable Box Diffusion, which covers thousands of object categories in most common scenes.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"24 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145068471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
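ODGEN conditions the diffusion model on "synthesized visual prompts with spatial constraints". One simple way to encode box constraints spatially is a per-class binary map, sketched below; this layout is an assumption for illustration, not ODGEN's actual prompt format:

```python
import numpy as np

def boxes_to_condition_map(boxes, labels, num_classes, hw=(512, 512)):
    """Encode normalized [0, 1] boxes as a per-class binary conditioning
    tensor: channel c is 1 inside every class-c box. An illustrative
    layout, not ODGEN's actual prompt format."""
    h, w = hw
    cond = np.zeros((num_classes, h, w), dtype=np.float32)
    for (x1, y1, x2, y2), c in zip(boxes, labels):
        cond[c, int(y1 * h):int(y2 * h), int(x1 * w):int(x2 * w)] = 1.0
    return cond

# Example: one class-0 box occupying the image center.
cond = boxes_to_condition_map([(0.3, 0.3, 0.7, 0.7)], [0], num_classes=3)
```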
REST: Holistic Learning for End-to-End Semantic Segmentation of Whole-Scene Remote Sensing Imagery.
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-12 · DOI: 10.1109/tpami.2025.3609767
Wei Chen, Lorenzo Bruzzone, Bo Dang, Yuan Gao, Youming Deng, Jin-Gang Yu, Liangqi Yuan, Yansheng Li
{"title":"REST: Holistic Learning for End-to-End Semantic Segmentation of Whole-Scene Remote Sensing Imagery.","authors":"Wei Chen,Lorenzo Bruzzone,Bo Dang,Yuan Gao,Youming Deng,Jin-Gang Yu,Liangqi Yuan,Yansheng Li","doi":"10.1109/tpami.2025.3609767","DOIUrl":"https://doi.org/10.1109/tpami.2025.3609767","url":null,"abstract":"Semantic segmentation of remote sensing imagery (RSI) is a fundamental task that aims at assigning a category label to each pixel. To pursue precise segmentation with one or more fine-grained categories, semantic segmentation often requires holistic segmentation of whole-scene RSI (WRI), which is normally characterized by a large size. However, conventional deep learning methods struggle to handle holistic segmentation of WRI due to the memory limitations of the graphics processing unit (GPU), thus requiring to adopt suboptimal strategies such as cropping or fusion, which result in performance degradation. Here, we introduce the Robust End-to-end semantic Segmentation architecture for whole-scene remoTe sensing imagery (REST). REST is the first intrinsically end-to-end framework for truly holistic segmentation of WRI, supporting a wide range of encoders and decoders in a plug-and-play fashion. It enables seamless integration with mainstream semantic segmentation methods, and even more advanced foundation models. Specifically, we propose a novel spatial parallel interaction mechanism (SPIM) within REST to overcome GPU memory constraints and achieve global context awareness. Unlike traditional parallel methods, SPIM enables REST to process a WRI effectively and efficiently by combining parallel computation with a divide-and-conquer strategy. Both theoretical analysis and experiments demonstrate that REST attains near-linear throughput scalability as additional GPUs are employed. Extensive experiments demonstrate that REST consistently outperforms existing cropping-based and fusion-based methods across a variety of scenarios, ranging from single-class to multi-class segmentation, from multispectral to hyperspectral imagery, and from satellite to drone platforms. The robustness and versatility of REST are expected to offer a promising solution for the holistic segmentation of WRI, with the potential for further extension to large-size medical imagery segmentation. The source code will be released at https://weichenrs.github.io/REST.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"73 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145043479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
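REST's SPIM processes a whole-scene image with a divide-and-conquer strategy across GPUs. The generic tile-with-halo pattern below illustrates the divide step of such schemes; it is a sketch of the general idea, not REST's actual SPIM implementation:

```python
import torch

def split_with_halo(image, tile=1024, halo=64):
    """Divide step of a generic tile-with-halo scheme: each tile is padded
    with an overlapping halo so every worker sees surrounding context;
    after inference the halos are cropped away and tiles are stitched."""
    _, h, w = image.shape                    # image: (C, H, W)
    tiles = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            y0, x0 = max(0, y - halo), max(0, x - halo)
            y1, x1 = min(h, y + tile + halo), min(w, x + tile + halo)
            tiles.append(((y, x), image[:, y0:y1, x0:x1]))
    return tiles                             # [(top-left coord, padded tile)]
```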
Single Voter Spreading for Efficient Correspondence Grouping and 3D Registration.
IF 23.6 · CAS Tier 1 · Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-09-12 · DOI: 10.1109/tpami.2025.3609474
Siwen Quan, Zhao Zeng, Xiyu Zhang, Jiaqi Yang
{"title":"Single Voter Spreading for Efficient Correspondence Grouping and 3D Registration.","authors":"Siwen Quan,Zhao Zeng,Xiyu Zhang,Jiaqi Yang","doi":"10.1109/tpami.2025.3609474","DOIUrl":"https://doi.org/10.1109/tpami.2025.3609474","url":null,"abstract":"Obtaining highly consistent correspondences between point clouds is crucial for computer vision tasks such as 3D registration and recognition. Due to nuisances such as limited overlap and noise, initial correspondences often contain a large number of outliers, imposing a great challenge to downstream tasks. In this paper, we present a novel single voter spreading (SVOS) method for efficient 3D correspondence grouping and 3D registration. Our core insight is to leverage low-order graph constraints only in a single voter spreading voting scheme to achieve comparable constrain-ability as complex constraints without searching them. First, a simple first-order graph is constructed for the initial correspondence set. Second, a two-stage voting method is proposed, including single voter voting and spread voters voting. Each voting stage involves both local and global voting via edge constraints only. This promises good selectivity while making the voting process time- and storage-efficient. Finally, top-scored correspondences are opted for robust transformation estimation. Experiments on U3M, 3DMatch/3DLoMatch, ETH, and KITTI-LC datasets verify that SVOS achieves new state-of-the-art correspondence grouping and registration performance, while being light-weight and robust to graph construction parameters. The code will be available at https://github.com/ZhaoZeng-pro/SVOS.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"61 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145043481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
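SVOS votes with first-order (edge-only) graph constraints: under a rigid transformation, the distance between two source points must match the distance between their corresponding target points. A minimal brute-force version of such edge-compatibility voting, with an assumed threshold and scoring rather than SVOS's exact scheme, looks like this:

```python
import numpy as np

def first_order_voting(src_pts, dst_pts, tau=0.05):
    """Edge-compatibility voting: a rigid transform preserves pairwise
    distances, so correspondences i and j vote for each other when their
    source-side and target-side distances agree within tau. Threshold and
    scoring are assumptions, not SVOS's exact scheme."""
    n = len(src_pts)
    votes = np.zeros(n)
    for i in range(n):
        for j in range(i + 1, n):
            d_src = np.linalg.norm(src_pts[i] - src_pts[j])
            d_dst = np.linalg.norm(dst_pts[i] - dst_pts[j])
            if abs(d_src - d_dst) < tau:     # edge lengths consistent
                votes[i] += 1
                votes[j] += 1
    return votes    # keep top-scored correspondences as inliers
```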