IEEE Transactions on Circuits and Systems for Video Technology: Latest Publications

StreetSurfGS: Scalable Urban Street Surface Reconstruction With Planar-Based Gaussian Splatting
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-17 DOI: 10.1109/TCSVT.2025.3551719
Xiao Cui;Weicai Ye;Yifan Wang;Guofeng Zhang;Wengang Zhou;Tong He;Houqiang Li
{"title":"StreetSurfGS: Scalable Urban Street Surface Reconstruction With Planar-Based Gaussian Splatting","authors":"Xiao Cui;Weicai Ye;Yifan Wang;Guofeng Zhang;Wengang Zhou;Tong He;Houqiang Li","doi":"10.1109/TCSVT.2025.3551719","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551719","url":null,"abstract":"Reconstructing urban street scenes is crucial due to its vital role in applications such as autonomous driving and urban planning. These scenes are characterized by long, narrow camera trajectories, occlusion, complex object relationships, and sparse data across multiple scales. Despite recent advancements, existing surface reconstruction methods, which are primarily designed for object-centric scenarios, struggle to adapt effectively to the unique characteristics of street scenes. To address this challenge, we introduce StreetSurfGS, the first method to employ Gaussian Splatting specifically tailored for scalable urban street scene surface reconstruction. StreetSurfGS utilizes a planar-based octree representation and segmented training to reduce memory costs, accommodate unique camera characteristics, and improve scalability. Additionally, to mitigate depth inaccuracies caused by object overlap, we propose a guided smoothing strategy within regularization to eliminate inaccurate boundary points and outliers. Furthermore, to address sparse views and multi-scale challenges, we use a dual-step matching strategy that leverages adjacent and long-term information. Extensive experiments validate the efficacy of StreetSurfGS in both novel view synthesis and surface reconstruction.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8780-8793"},"PeriodicalIF":11.1,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
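The segmented-training idea mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it only shows one plausible way to split a long, narrow street trajectory into overlapping chunks so that each chunk fits in memory and adjacent chunks share views for later fusion. The segment length and overlap values are illustrative assumptions.

```python
from typing import List, Tuple

def split_trajectory(num_frames: int, segment_len: int = 60,
                     overlap: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges covering the trajectory, with a small
    overlap so neighboring segments observe common views."""
    segments, start = [], 0
    while start < num_frames:
        end = min(start + segment_len, num_frames)
        segments.append((start, end))
        if end == num_frames:
            break
        start = end - overlap  # step back to create the shared region
    return segments

# A 300-frame street capture split into ~60-frame chunks with 10-frame overlap.
print(split_trajectory(300))
```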
Multi-Task Guided No-Reference Omnidirectional Image Quality Assessment With Feature Interaction
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-17 DOI: 10.1109/TCSVT.2025.3551723
Yun Liu;Sifan Li;Huiyu Duan;Yu Zhou;Daoxin Fan;Guangtao Zhai
{"title":"Multi-Task Guided No-Reference Omnidirectional Image Quality Assessment With Feature Interaction","authors":"Yun Liu;Sifan Li;Huiyu Duan;Yu Zhou;Daoxin Fan;Guangtao Zhai","doi":"10.1109/TCSVT.2025.3551723","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551723","url":null,"abstract":"Omnidirectional image quality assessment (OIQA) has become an increasingly vital problem in recent years. Most previous no-reference OIQA methods only extract local features from the distorted viewports, or extract global features from the entire distorted image, lacking the interaction and fusion between local and global features. Moreover, the lack of reference information also limits their performance. Thus, we propose a no-reference OIQA model which consists of three novel modules, including a bidirectional pseudo-reference module, a Mamba-based global feature extraction module, and a multi-scale local-global feature aggregation module. Specifically, by considering the image distortion degradation process, a bidirectional pseudo-reference module capturing the error maps on viewports is first constructed to refine the multi-scale local visual features, which can supply rich quality degradation reference information without the reference image. To well complement the local features, the VMamba module is adopted to extract the representative multi-scale global visual features. Inspired by human hierarchical visual perception characteristics, a novel multi-scale aggregation module is built to strengthen the feature interaction and effective fusion which can extract deep semantic information. Finally, motivated by the multi-task managing mechanism of human brain, a multi-task learning module is introduced to assist the main quality assessment task by digging the hidden information in compression type and distortion degree. Extensive experimental results demonstrate that our proposed method achieves the state-of-the-art performance on the no-reference OIQA task compared to other models.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8794-8806"},"PeriodicalIF":11.1,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
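As a rough illustration of the multi-task module described above, the sketch below attaches a main quality-regression head and two auxiliary heads (compression type and distortion degree) to a shared feature and combines their losses. The feature dimension, class counts, and loss weight are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHead(nn.Module):
    def __init__(self, feat_dim=256, n_comp_types=3, n_degrees=5):
        super().__init__()
        self.quality = nn.Linear(feat_dim, 1)          # main task: quality score
        self.comp = nn.Linear(feat_dim, n_comp_types)  # aux: compression type
        self.degree = nn.Linear(feat_dim, n_degrees)   # aux: distortion degree

    def forward(self, feat):
        return self.quality(feat).squeeze(-1), self.comp(feat), self.degree(feat)

def multitask_loss(pred, target, w_aux=0.1):
    q, c, d = pred
    q_gt, c_gt, d_gt = target
    return (F.mse_loss(q, q_gt)
            + w_aux * F.cross_entropy(c, c_gt)
            + w_aux * F.cross_entropy(d, d_gt))

# Toy usage with random features and labels.
feat = torch.randn(4, 256)
head = MultiTaskHead()
loss = multitask_loss(head(feat), (torch.rand(4),
                                   torch.randint(0, 3, (4,)),
                                   torch.randint(0, 5, (4,))))
print(float(loss))
```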
LOCAT: Localization-Driven Text Watermarking via Large Language Models
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-16 DOI: 10.1109/TCSVT.2025.3570858
Liang Ding;Xi Yang;Yang Yang;Weiming Zhang
{"title":"LOCAT: Localization-Driven Text Watermarking via Large Language Models","authors":"Liang Ding;Xi Yang;Yang Yang;Weiming Zhang","doi":"10.1109/TCSVT.2025.3570858","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3570858","url":null,"abstract":"The rapid advancement of large language models (LLMs) has raised concerns regarding potential misuse and underscores the importance of verifying text authenticity. Text watermarking, which embeds covert identifiers into generated content, offers a viable means for such verification. Such watermarking can be implemented either by modifying the generation process of an LLM or via post-processing techniques like lexical substitution, with the latter being particularly valuable when access to model parameters is restricted. However, existing lexical substitution-based methods often face a trade-off between maintaining text quality and ensuring robust watermarking. Addressing this limitation, our work focuses on enhancing both the robustness and imperceptibility of text watermarks within the lexical substitution paradigm. We propose a localization-based watermarking method that enhances robustness while maintaining text naturalness. First, a precise localization module identifies optimal substitution targets. Then, we leverage LLMs to generate contextually appropriate synonyms, and the watermark is embedded through binary-encoded substitutions. To address different usage scenarios, we focus on the trade-off between watermark robustness and text quality. Compared to existing methods, our approach significantly enhances watermark robustness while maintaining comparable text quality and achieves similar robustness levels while improving text quality. Even under severe semantic distortions, including word deletion, synonym substitution, polishing, and re-translation, the watermark remains detectable.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 8","pages":"8406-8420"},"PeriodicalIF":11.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144781943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
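A minimal sketch of the binary-encoded substitution step mentioned in the abstract, under the assumption that each candidate word is mapped to a bit by a keyed hash and the candidate matching the next watermark bit is chosen. The localization module and the LLM-generated synonyms are replaced here by toy inputs, so this is only an illustration of the encoding idea, not the paper's pipeline.

```python
import hashlib

def word_bit(word: str, key: str) -> int:
    """Map a word to a pseudo-random bit using a keyed hash."""
    digest = hashlib.sha256((key + word.lower()).encode()).digest()
    return digest[0] & 1

def embed_bit(original: str, candidates: list, bit: int, key: str) -> str:
    """Pick the first candidate (or the original word) whose bit matches."""
    for cand in [original] + candidates:
        if word_bit(cand, key) == bit:
            return cand
    return original  # no candidate encodes this bit; skip the position

key = "secret"
print(word_bit("quick", key), embed_bit("quick", ["fast", "rapid"], 1, key))
```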
Errata to “Local-Global Temporal Difference Learning for Satellite Video Super-Resolution”
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-16 DOI: 10.1109/TCSVT.2025.3570842
Yi Xiao;Qiangqiang Yuan
{"title":"Errata to “Local-Global Temporal Difference Learning for Satellite Video Super-Resolution”","authors":"Yi Xiao;Qiangqiang Yuan","doi":"10.1109/TCSVT.2025.3570842","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3570842","url":null,"abstract":"In the above article [1], there is a citation error related to the core technical foundation of the proposed method. Reference [2] was incorrectly cited. The correct citation is [3].","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 10","pages":"10612-10612"},"PeriodicalIF":11.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11006141","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CamStegNet: A Robust Image Steganography Method Based on Camouflage Model
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-16 DOI: 10.1109/TCSVT.2025.3570725
Le Mao;Yun Tan;Jiaohua Qin;Xuyu Xiang
{"title":"CamStegNet: A Robust Image Steganography Method Based on Camouflage Model","authors":"Le Mao;Yun Tan;Jiaohua Qin;Xuyu Xiang","doi":"10.1109/TCSVT.2025.3570725","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3570725","url":null,"abstract":"Deep learning models are increasingly being employed in steganographic schemes for the embedding and extraction of secret information. However, steganographic models themselves are also at risk of detection and attacks. Although there are approaches proposed to hide deep learning models, making these models difficult to detect while achieving high-quality image steganography performance remains a challenging task. In this work, a robust image steganography method based on a camouflage model CamStegNet is proposed. The steganographic model is camouflaged as a routine deep learning model to significantly enhance its concealment. A sparse weight-filling paradigm is designed to enable the model to be flexibly switched among three modes by utilizing different keys: routine machine learning task, secret embedding task and secret recovery task. Furthermore, a residual state-space module and a neighborhood attention mechanism are constructed to improve the performance of image steganography. Experiments conducted on the DIV2K, ImageNet and COCO datasets demonstrate that the stego images generated by CamStegNet are superior to existing methods in terms of visual quality. They also exhibit enhanced resistance to steganalysis and maintain over 95% robustness against noise and scale attacks. Additionally, the model demonstrates high robustness which can achieve excellent performance in machine learning tasks and maintain stability across various weight initialization methods.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 10","pages":"10599-10611"},"PeriodicalIF":11.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
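The "sparse weight-filling" paradigm is only described at a high level above. The sketch below shows one assumption-laden way a key could deterministically reserve a sparse subset of weights for the hidden task while the remaining weights serve the routine task; the paper's actual mechanism may differ substantially.

```python
import hashlib
import numpy as np

def sparse_mask_from_key(shape, key: str, density: float = 0.05) -> np.ndarray:
    """Derive a reproducible sparse 0/1 mask from a secret key."""
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    return (rng.random(shape) < density).astype(np.float32)

weights = np.random.randn(64, 64).astype(np.float32)
mask = sparse_mask_from_key(weights.shape, key="stego-key")
routine_part = weights * (1.0 - mask)  # weights used by the routine task
hidden_slots = weights * mask          # sparse slots reserved for hiding
print(f"fraction of weights reserved: {mask.mean():.3f}")
```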
Open-Set Mixed Domain Adaptation via Visual-Linguistic Focal Evolving
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-14 DOI: 10.1109/TCSVT.2025.3551234
Bangzhen Liu;Yangyang Xu;Cheng Xu;Xuemiao Xu;Shengfeng He
{"title":"Open-Set Mixed Domain Adaptation via Visual-Linguistic Focal Evolving","authors":"Bangzhen Liu;Yangyang Xu;Cheng Xu;Xuemiao Xu;Shengfeng He","doi":"10.1109/TCSVT.2025.3551234","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551234","url":null,"abstract":"We introduce a new task, Open-set Mixed Domain Adaptation (OSMDA), which considers the potential mixture of multiple distributions in the target domains, thereby better simulating real-world scenarios. To tackle the semantic ambiguity arising from multiple domains, our key idea is that the linguistic representation can serve as a universal descriptor for samples of the same category across various domains. We thus propose a more practical framework for cross-domain recognition via visual-linguistic guidance. On the other hand, the presence of multiple domains also poses a new challenge in classifying both known and unknown categories. To combat this issue, we further introduce a visual-linguistic focal evolving approach to gradually enhance the classification ability of a known/unknown binary classifier from two aspects. Specifically, we start with identifying highly confident focal samples to expand the pool of known samples by incorporating those from different domains. Then, we amplify the feature discrepancy between known and unknown samples through dynamic entropy evolving via an adaptive entropies min/max game, enabling us to accurately identify possible unknown samples in a gradual manner. Extensive experiments demonstrate our method’s superiority against the state-of-the-arts in both open-set and open-set mixed domain adaptation.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8495-8507"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
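The entropy min/max game is summarized only briefly above. The sketch below shows the generic form of such an objective: prediction entropy is minimized on confidently known (focal) samples and maximized on the remaining, likely unknown ones. The selection threshold and weight are illustrative assumptions, not the paper's adaptive scheme.

```python
import torch
import torch.nn.functional as F

def entropy(logits: torch.Tensor) -> torch.Tensor:
    """Per-sample entropy of the softmax prediction."""
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1)

def entropy_minmax_loss(logits, known_mask, w=1.0):
    h = entropy(logits)
    loss_known = h[known_mask].mean() if known_mask.any() else logits.new_zeros(())
    loss_unknown = -h[~known_mask].mean() if (~known_mask).any() else logits.new_zeros(())
    return loss_known + w * loss_unknown  # sharpen known, flatten unknown

logits = torch.randn(8, 10)
known_mask = entropy(logits) < 1.5  # toy focal-sample selection by threshold
print(float(entropy_minmax_loss(logits, known_mask)))
```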
Rate-Distortion-Optimized Deep Preprocessing for JPEG Compression
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-14 DOI: 10.1109/TCSVT.2025.3550872
Fan Ye;Bojun Liu;Li Li;Dong Liu
{"title":"Rate-Distortion-Optimized Deep Preprocessing for JPEG Compression","authors":"Fan Ye;Bojun Liu;Li Li;Dong Liu","doi":"10.1109/TCSVT.2025.3550872","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3550872","url":null,"abstract":"JPEG is daily used for compressing natural images, while the compressed images often contain visually annoying artifacts especially at low rates. To reduce the compression artifacts, it has been proposed to preprocess an image before the JPEG compression with the help of deep learning, which maintains the standard compliance. However, the existing methods were not fully justified from the rate-distortion optimization perspective. We address this limitation and propose a truly rate-distortion-optimized deep preprocessing method for JPEG compression. We decompose a rate-distortion cost into three parts: rate, distortion, and Lagrangian multiplier. First, we design a rate estimation network and propose to train the network to estimate the JPEG compression rate. Second, we propose to estimate the actual end-to-end distortion (between original and reconstructed images) with a differentiable JPEG simulator, where we specifically design an adaptive discrete cosine transform (DCT) domain masking algorithm. Third, we propose to estimate the actual content-dependent Lagrangian multipliers to combine rate and distortion into a joint loss function that drives the training of the preprocessing network. Our method makes no change to the JPEG encoder and decoder and supports any differentiable distortion measure (e.g. MSE, MS-SSIM, LPIPS). On the Kodak dataset, our method achieves on average 7.59% BD-rate reduction compared to the JPEG baseline when using MSE. With per-image optimization for LPIPS, our method achieves as high as 38.65% BD-rate reduction, and produces high-quality reconstructed images with much less artifacts.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 8","pages":"8330-8343"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144781986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
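To make the rate-distortion objective concrete, the sketch below measures the cost J = D + lambda * R for a real JPEG encode using Pillow (MSE for D, bits per pixel for R). The paper instead uses a learned rate estimator and a differentiable JPEG simulator so that this cost can be backpropagated to a preprocessing network; the quality setting and lambda value here are arbitrary.

```python
import io
import numpy as np
from PIL import Image

def jpeg_rd_cost(img: np.ndarray, quality: int, lam: float) -> float:
    """img: HxWx3 uint8 array. Returns MSE distortion + lam * rate (bpp)."""
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    rate_bpp = 8.0 * len(buf.getvalue()) / (img.shape[0] * img.shape[1])
    buf.seek(0)
    recon = np.asarray(Image.open(buf).convert("RGB"), dtype=np.float32)
    mse = float(np.mean((recon - img.astype(np.float32)) ** 2))
    return mse + lam * rate_bpp

img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
print(jpeg_rd_cost(img, quality=50, lam=100.0))
```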
Class-Aware Prompting for Federated Few-Shot Class-Incremental Learning
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-14 DOI: 10.1109/TCSVT.2025.3551612
Fang-Yi Liang;Yu-Wei Zhan;Jiale Liu;Chong-Yu Zhang;Zhen-Duo Chen;Xin Luo;Xin-Shun Xu
{"title":"Class-Aware Prompting for Federated Few-Shot Class-Incremental Learning","authors":"Fang-Yi Liang;Yu-Wei Zhan;Jiale Liu;Chong-Yu Zhang;Zhen-Duo Chen;Xin Luo;Xin-Shun Xu","doi":"10.1109/TCSVT.2025.3551612","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551612","url":null,"abstract":"Few-Shot Class-Incremental Learning (FSCIL) aims to continuously learn new classes from limited samples while preventing catastrophic forgetting. With the increasing distribution of learning data across different clients and privacy concerns, FSCIL faces a more realistic scenario where few learning samples are distributed across different clients, thereby necessitating a Federated Few-Shot Class-Incremental Learning (FedFSCIL) scenario. However, this integration faces challenges from non-IID problem, which affects model generalization and training efficiency. The communication overhead in federated settings also presents a significant challenge. To address these issues, we propose Class-Aware Prompting for Federated Few-Shot Class-Incremental Learning (FedCAP). Our framework leverages pre-trained models enhanced by a class-wise prompt pool, where shared class-wise keys enable clients to utilize global class information during training. This unifies the understanding of base class features across clients and enhances model consistency. We further incorporate a class-level information fusion module to improve class representation and model generalization. Our approach requires very few parameter transmission during model aggregation, ensuring communication efficiency. To our knowledge, this is the first study to explore the scenario of FedFSCIL. Consequently, we designed comprehensive experimental setups and made the code publicly available.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8520-8532"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
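The class-wise prompt pool is described only at a high level. Below is a generic prompt-pool lookup in the style of prompt-based continual learning: an image feature is matched against shared keys by cosine similarity and the top-k prompts are returned for the frozen backbone. Pool size, prompt length, feature dimension, and k are assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

pool_size, prompt_len, dim, k = 20, 5, 768, 3
keys = torch.randn(pool_size, dim)                 # shared class-wise keys
prompts = torch.randn(pool_size, prompt_len, dim)  # learnable prompt tokens

def select_prompts(query: torch.Tensor) -> torch.Tensor:
    """query: (B, dim) image feature from a frozen pre-trained encoder."""
    sim = F.cosine_similarity(query.unsqueeze(1), keys.unsqueeze(0), dim=-1)
    topk = sim.topk(k, dim=1).indices              # (B, k) best-matching keys
    return prompts[topk].flatten(1, 2)             # (B, k * prompt_len, dim)

selected = select_prompts(torch.randn(4, dim))
print(selected.shape)  # torch.Size([4, 15, 768])
```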
Event-Based Motion Deblurring With Blur-Aware Reconstruction Filter
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-14 DOI: 10.1109/TCSVT.2025.3551516
Nuo Chen;Chushu Zhang;Wei An;Longguang Wang;Miao Li;Qiang Ling
{"title":"Event-Based Motion Deblurring With Blur-Aware Reconstruction Filter","authors":"Nuo Chen;Chushu Zhang;Wei An;Longguang Wang;Miao Li;Qiang Ling","doi":"10.1109/TCSVT.2025.3551516","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551516","url":null,"abstract":"Event-based motion deblurring aims at reconstructing a sharp image from a single blurry image and its corresponding events triggered during the exposure time. Existing methods learn the spatial distribution of blur from blurred images, then treat events as temporal residuals and learn blurred temporal features from them, and finally restore clear images through spatio-temporal interaction of the two features. However, due to the high coupling of detailed features such as the texture and contour of the scene with blur features, it is difficult to directly learn effective blur spatial distribution from the original blurred image. In this paper, we provide a novel perspective, i.e., employing the blur indication provided by events, to instruct the network in spatially differentiated image reconstruction. Due to the consistency between event spatial distribution and image blur, event spatial indication can learn blur spatial features more simply and directly, and serve as a complement to temporal residual guidance to improve deblurring performance. Based on the above insight, we propose an event-based motion deblurring network consisting of a Multi-Scale Event-based Double Integral (MS-EDI) module designed from temporal residual guidance, and a Blur-Aware Filter Prediction (BAFP) module to conduct filter processing directed by spatial blur indication. The network, after incorporating spatial residual guidance, has significantly enhanced its generalization ability, surpassing the best-performing image-based and event-based methods on both synthetic, semi-synthetic, and real-world datasets. In addition, our method can be extended to blurry image super-resolution and achieves impressive performance. Our code is available at: <uri>https://github.com/ChenYichen9527/MBNet</uri> now.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8508-8519"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
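The MS-EDI module's name references the event-based double integral (EDI) relation between a blurry frame, the events within its exposure, and a latent sharp frame. The sketch below is a plain, single-scale discretization of that relation for intuition only; the paper's multi-scale design and learned components are not reproduced, and the contrast threshold c is an assumed constant.

```python
import numpy as np

def edi_sharp(blurry: np.ndarray, event_frames: np.ndarray, c: float = 0.2):
    """blurry: (H, W) intensity averaged over the exposure.
    event_frames: (T, H, W) signed event counts accumulated from the reference
    time to each of T sample times inside the exposure window."""
    # L_t = L_ref * exp(c * E_{ref->t})  =>  B = L_ref * mean_t exp(c * E)
    gain = np.exp(c * event_frames).mean(axis=0)
    return blurry / np.maximum(gain, 1e-6)  # recover the reference latent frame

# Round-trip check on synthetic data.
T, H, W = 9, 32, 32
events = np.random.randint(-2, 3, size=(T, H, W)).astype(np.float32)
sharp = np.random.rand(H, W).astype(np.float32)
blurry = sharp * np.exp(0.2 * events).mean(axis=0)
print(np.allclose(edi_sharp(blurry, events, c=0.2), sharp, atol=1e-5))
```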
Generic Objects as Pose Probes for Few-Shot View Synthesis
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-14 DOI: 10.1109/TCSVT.2025.3551303
Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu
{"title":"Generic Objects as Pose Probes for Few-Shot View Synthesis","authors":"Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu","doi":"10.1109/TCSVT.2025.3551303","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551303","url":null,"abstract":"Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses. However, COLMAP necessitates a large number of feature matches to operate effectively, and struggles with scenes characterized by sparse features, large baselines, or few-view images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, freeing from COLMAP initializations. Inspired by the idea of calibration boards in traditional pose calibration, we propose a novel approach of utilizing everyday objects, commonly found in both images and real life, as “pose probes”. By initializing the probe object as a cube shape, we apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. PnP matching is used to initialize poses between images incrementally, where only a few feature matches are enough. PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis across multiple datasets in experiments. We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe objects. Our project page is available at: <uri>https://zhirui-gao.github.io/PoseProbe.github.io/</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9046-9059"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
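The PnP-based pose initialization can be illustrated with OpenCV: given 2D-3D correspondences on the cube-shaped probe and the camera intrinsics, cv2.solvePnP returns an initial rotation and translation. The correspondences below are synthesized from a known pose purely for demonstration; the paper's feature matching and joint NeRF refinement are not shown, and the intrinsics are assumed values.

```python
import cv2
import numpy as np

# Eight corners of a unit cube standing in for the probe's initial geometry.
cube_pts = np.array([[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0)
                     for z in (0.0, 1.0)], dtype=np.float32)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Synthesize 2D observations from a known ground-truth pose for the demo.
rvec_gt = np.array([[0.10], [0.20], [0.05]])
tvec_gt = np.array([[0.30], [-0.10], [4.00]])
img_pts, _ = cv2.projectPoints(cube_pts, rvec_gt, tvec_gt, K, None)

ok, rvec, tvec = cv2.solvePnP(cube_pts, img_pts, K, None)
print(ok, rvec.ravel(), tvec.ravel())  # should closely match the ground truth
```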