Latest articles from IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society)

Cross-Domain Diffusion With Progressive Alignment for Efficient Adaptive Retrieval
Junyu Luo; Yusheng Zhao; Xiao Luo; Zhiping Xiao; Wei Ju; Li Shen; Dacheng Tao; Ming Zhang
DOI: 10.1109/TIP.2025.3547678 | Published: 2025-03-11 | Volume 34, pp. 1820-1834
Abstract: Unsupervised efficient domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain while maintaining low storage cost and high retrieval efficiency. However, existing methods typically fail to address potential noise in the target domain and directly align high-level features across domains, resulting in suboptimal retrieval performance. To address these challenges, we propose a novel Cross-Domain Diffusion with Progressive Alignment method (COUPLE). This approach revisits unsupervised efficient domain adaptive retrieval from a graph diffusion perspective, simulating cross-domain adaptation dynamics to achieve a stable target domain adaptation process. First, we construct a cross-domain relationship graph and leverage noise-robust graph flow diffusion to simulate the transfer dynamics from the source domain to the target domain, identifying lower-noise clusters. We then leverage the graph diffusion results for discriminative hash code learning, effectively learning from the target domain while reducing the negative impact of noise. Furthermore, we employ a hierarchical Mixup operation for progressive domain alignment, performed along the cross-domain random walk paths. Utilizing target domain discriminative hash learning and progressive domain alignment, COUPLE enables effective domain adaptive hash learning. Extensive experiments demonstrate COUPLE's effectiveness on competitive benchmarks.
Citations: 0
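The hierarchical Mixup operation mentioned in the abstract interpolates source and target representations to align the domains progressively. Below is a minimal, hedged sketch of that general idea: a Mixup of source/target features whose mixing ratio is scheduled to drift toward the target domain over training. The function name, Beta prior, schedule, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): Mixup-style progressive alignment between
# source and target features, where the mixing ratio follows a schedule that moves
# from mostly-source to mostly-target as training progresses.
import torch

def progressive_mixup(src_feat: torch.Tensor, tgt_feat: torch.Tensor,
                      step: int, total_steps: int, alpha: float = 0.2) -> torch.Tensor:
    """Interpolate source/target features; the schedule shrinks the source share
    as `step` approaches `total_steps`."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    progress = step / max(total_steps, 1)      # 0 -> 1 over training
    lam = lam * (1.0 - progress)               # source weight decays over time
    return lam * src_feat + (1.0 - lam) * tgt_feat

# usage with dummy feature batches
src = torch.randn(32, 128)   # source-domain features
tgt = torch.randn(32, 128)   # target-domain features
mixed = progressive_mixup(src, tgt, step=500, total_steps=10000)
```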
LangLoc: Language-Driven Localization via Formatted Spatial Description Generation
Weimin Shi; Changhao Chen; Kaige Li; Yuan Xiong; Xiaochun Cao; Zhong Zhou
DOI: 10.1109/TIP.2025.3546853 | Published: 2025-03-11 | Volume 34, pp. 1737-1752
Abstract: Existing localization methods commonly employ vision to perceive the scene and achieve localization in GNSS-denied areas, yet they often struggle in environments with complex lighting conditions, dynamic objects, or privacy-preserving areas. Humans can describe various scenes using natural language and effectively infer their location by leveraging the rich semantic information in these descriptions. Harnessing language therefore presents a potential solution for robust localization. This study introduces a new task, language-driven localization, and proposes a novel localization framework, LangLoc, which determines the user's position and orientation from textual descriptions. Given the diversity of natural language descriptions, we first design a Spatial Description Generator (SDG), foundational to LangLoc, which extracts and combines the position and attribute information of objects within a scene to generate uniformly formatted textual descriptions. SDG eliminates the ambiguity of language, detailing the spatial layout and object relations of the scene and providing a reliable basis for localization. With the generated descriptions, LangLoc achieves language-only localization using a text encoder and a pose regressor. Furthermore, LangLoc can take an image in addition to the text input, achieving mutual optimization and adaptive feature fusion across modalities through two modality-specific encoders, cross-modal fusion, and multimodal joint learning strategies. This enhances the framework's capability to handle complex scenes, achieving more accurate localization. Extensive experiments on the Oxford RobotCar, 4-Seasons, and Virtual Gallery datasets demonstrate LangLoc's effectiveness in both language-only and visual-language localization across various outdoor and indoor scenarios. Notably, LangLoc achieves noticeable performance gains when using both text and image inputs in challenging conditions such as overexposure, low lighting, and occlusions, showcasing its superior robustness.
Citations: 0
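As a deliberately simplified illustration of the text-encoder-plus-pose-regressor pipeline described above, the sketch below maps a precomputed text embedding to a 3-D position and a unit quaternion. The embedding dimension, MLP sizes, and the quaternion parameterization are assumptions for illustration, not LangLoc's actual architecture.

```python
# Minimal sketch (illustrative, not the LangLoc code): a pose regressor that maps a
# text embedding of a formatted spatial description to a camera pose, here a 3-D
# position plus a unit quaternion for orientation. The embedding is assumed to come
# from any off-the-shelf text encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextPoseRegressor(nn.Module):
    def __init__(self, embed_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.fc_xyz = nn.Linear(hidden, 3)   # position head
        self.fc_quat = nn.Linear(hidden, 4)  # orientation head (quaternion)

    def forward(self, text_embedding: torch.Tensor):
        h = self.mlp(text_embedding)
        xyz = self.fc_xyz(h)
        quat = F.normalize(self.fc_quat(h), dim=-1)  # keep unit norm
        return xyz, quat

# usage with a dummy embedding
emb = torch.randn(1, 512)
position, orientation = TextPoseRegressor()(emb)
```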
An Adaptive Multi-Granularity Graph Representation of Image via Granular-ball Computing
Dawei Dai; Fan Chen; Shuyin Xia; Long Yang; Guan Wang; Guoyin Wang; Xinbo Gao
DOI: 10.1109/TIP.2025.3565212 | Published: 2025-03-09 | Volume 34, pp. 2986-2999
Abstract: Graph neural networks (GNNs) encounter challenges in establishing deep structures and managing a large number of parameters effectively to learn node features comprehensively. Consequently, in vision tasks, GNNs often struggle to achieve high classification accuracy compared to convolutional neural networks. Nonetheless, GNNs retain crucial advantages and potential, particularly in lightweight network scale and efficient, reliable decision-making. Improving GNN performance in vision tasks therefore remains a significant research endeavor, and many works have explored applying GNN models in such contexts, where the graph representation of images poses a key challenge. Existing methods often fall short in adaptively generating blocks of different sizes and their corresponding edges to form graph representations according to graph semantics. To address this issue, we propose a novel method that converts images into graphical forms using granular-ball computing. Our approach does not rely on manual annotation or other learning methods, yet it can dynamically generate block nodes of varying sizes and corresponding edges. Compared to other state-of-the-art methods, our approach better captures semantic information within the graph. Despite having fewer parameters, our method significantly enhances accuracy. Overall, our work holds substantial implications for improving the performance of graph neural networks in vision tasks.
Citations: 0
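The key idea above is that block nodes of different sizes should emerge from the image content itself. As a rough, hedged illustration of adaptively sized block nodes (not granular-ball computing itself, which partitions data quite differently), the sketch below recursively splits an image quadtree-style until a region's color variance is small and keeps each leaf region as a candidate node; edges could then connect spatially adjacent blocks.

```python
# Simplified sketch of content-adaptive block nodes: split a region until its colour
# variance is low, then treat each leaf block as a graph node. This only illustrates
# "blocks of different sizes"; it is not the paper's granular-ball procedure.
import numpy as np

def split_blocks(img, x, y, h, w, var_thresh=200.0, min_size=8):
    """Return a list of (x, y, h, w, mean_colour) leaf blocks."""
    patch = img[y:y + h, x:x + w].reshape(-1, img.shape[-1])
    if h <= min_size or w <= min_size or patch.var(axis=0).mean() < var_thresh:
        return [(x, y, h, w, patch.mean(axis=0))]
    h2, w2 = h // 2, w // 2
    blocks = []
    for sy, sh in [(y, h2), (y + h2, h - h2)]:
        for sx, sw in [(x, w2), (x + w2, w - w2)]:
            blocks += split_blocks(img, sx, sy, sh, sw, var_thresh, min_size)
    return blocks

img = np.random.randint(0, 256, (128, 128, 3)).astype(np.float32)
H, W = img.shape[:2]
nodes = split_blocks(img, 0, 0, H, W)
print(len(nodes), "adaptively sized block nodes")
```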
Multi-Axis Feature Diversity Enhancement for Remote Sensing Video Super-Resolution
Yi Xiao; Qiangqiang Yuan; Kui Jiang; Yuzeng Chen; Shiqi Wang; Chia-Wen Lin
DOI: 10.1109/TIP.2025.3547298 | Published: 2025-03-07 | Volume 34, pp. 1766-1778
Abstract: How to aggregate spatial-temporal information plays an essential role in video super-resolution (VSR) tasks. Despite remarkable success, existing methods adopt static convolution to encode spatial-temporal information, which lacks flexibility for aggregating information in large-scale remote sensing scenes, as these often contain heterogeneous features (e.g., diverse textures). In this paper, we propose a spatial feature diversity enhancement module (SDE) and a channel diversity enhancement module (CDE), which explore the diverse representation of different local patterns while aggregating the global response into a compact channel-wise embedding. Specifically, SDE introduces multiple learnable filters to extract representative spatial variants and encodes them to generate a dynamic kernel for enriched spatial representation. To explore diversity in the channel dimension, CDE exploits the discrete cosine transform to map features into the frequency domain. This enriches the channel representation while mitigating the severe frequency loss caused by pooling operations. Based on SDE and CDE, we further devise a multi-axis feature diversity enhancement (MADE) module to harmonize spatial, channel, and pixel-wise features for diverse feature fusion. These strategies form a novel network for satellite VSR, termed MADNet, which outperforms the state-of-the-art method BasicVSR++ by 0.14 dB in average PSNR on various satellite videos, including JiLin-1, Carbonite-2, SkySat-1, and UrtheCast. Code will be available at https://github.com/XY-boy/MADNet
Citations: 0
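CDE's use of the discrete cosine transform can be made concrete with a small worked example: average pooling retains only each channel's DC (lowest-frequency) component, whereas a 2-D DCT exposes additional low-frequency coefficients per channel. The sketch below builds such a frequency-domain channel descriptor; the choice of which coefficients to keep is an illustrative assumption, not the paper's design.

```python
# Minimal sketch of a frequency-domain channel descriptor: a 2-D DCT per channel
# exposes several low-frequency coefficients, of which the DC term equals average
# pooling up to a constant scale.
import numpy as np
from scipy.fft import dctn

def dct_channel_descriptor(feat: np.ndarray, k: int = 2) -> np.ndarray:
    """feat: (C, H, W) feature map -> (C, k*k) descriptor of low-frequency DCT terms."""
    coeffs = dctn(feat, axes=(1, 2), norm="ortho")   # 2-D DCT of each channel map
    return coeffs[:, :k, :k].reshape(feat.shape[0], -1)

feat = np.random.rand(64, 32, 32).astype(np.float32)
desc = dct_channel_descriptor(feat)   # (64, 4); desc[:, 0] is the DC (pooled) term
print(desc.shape)
```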
Bayesian Nonnegative Tensor Completion With Automatic Rank Determination
Zecan Yang; Laurence T. Yang; Huaimin Wang; Honglu Zhao; Debin Liu
DOI: 10.1109/TIP.2024.3459647 | Published: 2025-03-07 | Volume 34, pp. 2036-2051
Abstract: Nonnegative CANDECOMP/PARAFAC (CP) factorization of incomplete tensors is a powerful technique for finding meaningful and physically interpretable latent factor matrices to achieve nonnegative tensor completion. However, most existing nonnegative CP models rely on manually predefined tensor ranks, which introduces uncertainty and leads the models to overfit or underfit. Although CP models formulated within a probabilistic framework can estimate the rank better, they lack the ability to learn nonnegative factors from incomplete data. In addition, existing approaches tend to focus on point estimation and ignore uncertainty estimation. To address these issues within a unified framework, we propose a fully Bayesian treatment of nonnegative tensor completion with automatic rank determination. Benefiting from the Bayesian framework and hierarchical sparsity-inducing priors, the model can provide uncertainty estimates of the nonnegative latent factors and effectively recover low-rank structures from incomplete tensors. Additionally, the proposed model mitigates the problems of parameter selection and overfitting. For model learning, we develop two fully Bayesian inference methods for posterior estimation and propose a hybrid computing strategy that significantly reduces the time overhead for large-scale data. Extensive simulations on synthetic data demonstrate that our model can recover missing data with high precision and automatically estimate the CP rank from incomplete tensors. Moreover, results from real-world applications demonstrate that our model is superior to state-of-the-art methods in image and video inpainting. The code is available at https://github.com/zecanyang/BNTC.
Citations: 0
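To make the underlying model concrete, the sketch below shows the standard rank-R CP construction (a sum of R outer products of nonnegative factor columns) and a masked fit measured only on observed entries. The factors are random placeholders rather than learned posteriors, and automatic rank determination (sparsity-inducing priors that shrink unneeded components) is only noted in a comment.

```python
# Illustrative sketch of the CP model behind nonnegative tensor completion; the
# factors below are random stand-ins, not inferred posteriors.
import numpy as np

def cp_reconstruct(A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Factors A (I,R), B (J,R), C (K,R) -> rank-R tensor of shape (I, J, K)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

I, J, K, R = 20, 30, 10, 5
rng = np.random.default_rng(0)
A, B, C = (np.abs(rng.standard_normal((n, R))) for n in (I, J, K))  # nonnegative factors
X = cp_reconstruct(A, B, C)                       # ground-truth low-rank tensor
mask = rng.random((I, J, K)) < 0.3                # 30% of entries observed
X_obs = X + 0.01 * rng.standard_normal(X.shape)   # noisy observations
masked_rmse = np.sqrt(np.mean((mask * (X_obs - cp_reconstruct(A, B, C))) ** 2))
# Automatic rank determination places sparsity-inducing priors on the R components
# so that unneeded columns of A, B, C shrink toward zero during inference.
```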
Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection
Xin Zhang; Tao Xiao; Ge-Peng Ji; Xuan Wu; Keren Fu; Qijun Zhao
DOI: 10.1109/TIP.2025.3565879 | Published: 2025-03-06 | Volume 34, pp. 2853-2866
Abstract: Camouflage poses notable challenges in distinguishing a static target, as it usually blends seamlessly with the background. However, any movement by the target can disrupt this disguise, making it detectable. Existing video camouflaged object detection (VCOD) approaches take noisy motion estimation as input or model motion implicitly, restricting detection performance in complex dynamic scenes. In this paper, we propose a novel Explicit Motion handling and Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion cues explicitly using a frozen pre-trained optical flow foundation model. EMIP is characterized by a two-stream architecture that simultaneously conducts camouflaged segmentation and optical flow estimation. Interactions across the dual streams are realized through interactive prompting, inspired by emerging visual prompt learning. Two learnable modules, the camouflaged feeder and the motion collector, are designed to incorporate segmentation-to-motion and motion-to-segmentation prompts, respectively, and to enhance the outputs of both streams. The prompt fed to the motion stream is learned by supervising optical flow in a self-supervised manner. Furthermore, we show that long-term historical information can also be incorporated as a prompt into EMIP to achieve more robust results with temporal consistency. By leveraging prompting techniques based on EMIP, the proposed long-term model EMIP† incurs lower training cost with only 8.5M trainable parameters (less than 8% of the total model parameters). Experimental results demonstrate that both EMIP and EMIP† set new state-of-the-art records on popular VCOD benchmarks. Additionally, comparative evaluations against other video segmentation models on a wider range of video segmentation tasks demonstrate the robustness and superior generalization capabilities of EMIP. Our code is made publicly available at https://github.com/zhangxin06/EMIP
Citations: 0
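The segmentation-to-motion and motion-to-segmentation prompts can be pictured as small learnable modules that exchange additive signals between the two streams. The sketch below is a hedged, generic illustration of that cross-stream prompting pattern; the module names, 1x1-convolution design, and channel sizes are assumptions, not EMIP's actual camouflaged feeder and motion collector.

```python
# Generic sketch of interactive prompting between a segmentation stream and a motion
# (optical-flow) stream: each learnable module turns one stream's features into an
# additive prompt for the other stream.
import torch
import torch.nn as nn

class CrossStreamPrompt(nn.Module):
    def __init__(self, seg_ch: int = 256, mot_ch: int = 128):
        super().__init__()
        self.feeder = nn.Conv2d(seg_ch, mot_ch, kernel_size=1)     # segmentation -> motion prompt
        self.collector = nn.Conv2d(mot_ch, seg_ch, kernel_size=1)  # motion -> segmentation prompt

    def forward(self, seg_feat: torch.Tensor, mot_feat: torch.Tensor):
        mot_feat = mot_feat + self.feeder(seg_feat)     # prompt the motion stream
        seg_feat = seg_feat + self.collector(mot_feat)  # prompt the segmentation stream
        return seg_feat, mot_feat

seg = torch.randn(1, 256, 44, 44)   # dummy segmentation features
mot = torch.randn(1, 128, 44, 44)   # dummy motion features
seg_out, mot_out = CrossStreamPrompt()(seg, mot)
```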
Iris Geometric Transformation Guided Deep Appearance-Based Gaze Estimation
Wei Nie; Zhiyong Wang; Weihong Ren; Hanlin Zhang; Honghai Liu
DOI: 10.1109/TIP.2025.3546465 | Published: 2025-03-06 | Volume 34, pp. 1616-1631
Abstract: The geometric alterations in the iris's appearance are intricately linked to the gaze direction. However, current deep appearance-based gaze estimation methods mainly rely on latent feature sharing to leverage iris features for improving deep representation learning, often neglecting the explicit modeling of their geometric relationships. To address this issue, this paper revisits the physiological structure of the eyeball and introduces a set of geometric assumptions, such as "the normal vector of the iris center approximates the gaze direction". Building on these assumptions, we propose an Iris Geometric Transformation Guided Gaze estimation (IGTG-Gaze) module, which establishes an explicit geometric parameter sharing mechanism to directly link the gaze direction and sparse iris landmark coordinates. Extensive experimental results demonstrate that IGTG-Gaze integrates seamlessly into various deep neural networks, extends flexibly from sparse iris landmarks to a dense eye mesh, and consistently achieves leading performance in both within- and cross-dataset evaluations, all while maintaining end-to-end optimization. These advantages highlight IGTG-Gaze as a practical and effective approach for enhancing deep gaze representation from appearance.
Citations: 0
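The quoted geometric assumption can be checked with a short worked example: fit a plane to 3-D iris boundary landmarks and take its unit normal as a gaze-direction estimate. The sketch below uses synthetic landmarks and a closed-form SVD fit purely for illustration; IGTG-Gaze itself learns this relation end-to-end through parameter sharing rather than computing it this way.

```python
# Worked sketch of "the normal vector of the iris approximates the gaze direction":
# fit a plane to iris boundary points by SVD and take its unit normal.
import numpy as np

def gaze_from_iris_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (N, 3) iris boundary points -> unit normal of the best-fit plane."""
    centered = landmarks - landmarks.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    normal = vt[-1]                     # direction of least variance = plane normal
    return normal / np.linalg.norm(normal)

# synthetic iris ring tilted 20 degrees about the x-axis
theta = np.linspace(0, 2 * np.pi, 16, endpoint=False)
ring = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
tilt = np.deg2rad(20.0)
R = np.array([[1, 0, 0],
              [0, np.cos(tilt), -np.sin(tilt)],
              [0, np.sin(tilt),  np.cos(tilt)]])
print(gaze_from_iris_landmarks(ring @ R.T))   # approx. (0, -sin 20°, cos 20°) up to sign
```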
Advancing Real-World Stereoscopic Image Super-Resolution via Vision-Language Model
Zhe Zhang; Jianjun Lei; Bo Peng; Jie Zhu; Liying Xu; Qingming Huang
DOI: 10.1109/TIP.2025.3546470 | Published: 2025-03-06 | Volume 34, pp. 2187-2197
Abstract: Recent years have witnessed the remarkable success of vision-language models in various computer vision tasks. However, how to exploit the semantic language knowledge of a vision-language model to advance real-world stereoscopic image super-resolution remains a challenging problem. This paper proposes a vision-language model-based stereoscopic image super-resolution (VLM-SSR) method, in which the semantic language knowledge in CLIP is exploited to facilitate stereoscopic image SR in a training-free manner. Specifically, by designing visual prompts for CLIP to infer region similarity, a prompt-guided information aggregation mechanism is presented to capture inter-view information among relevant regions between the left and right views. In addition, driven by the prior knowledge of CLIP, a cognition prior-driven iterative enhancing mechanism is presented to optimize fuzzy regions adaptively. Experimental results on four datasets verify the effectiveness of the proposed method.
Citations: 0
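As a rough illustration of using CLIP to relate regions across the two views, the sketch below embeds left- and right-view crops with CLIP's image encoder and ranks right-view regions by cosine similarity to a left-view region. The model checkpoint, the crop-based region definition, and the matching rule are assumptions for illustration; the paper's visual prompting and aggregation mechanism is not reproduced here.

```python
# Minimal sketch of region matching with a vision-language model: embed candidate
# regions from both views with CLIP and rank right-view regions by cosine similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def region_embeddings(regions):
    """regions: list of PIL.Image crops -> L2-normalised CLIP image embeddings."""
    inputs = processor(images=regions, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(emb, dim=-1)

left_region = Image.new("RGB", (64, 64), "gray")                     # stand-in for a real crop
right_regions = [Image.new("RGB", (64, 64), c) for c in ("gray", "white", "black")]
sims = region_embeddings([left_region]) @ region_embeddings(right_regions).T
best_match = sims.argmax().item()   # index of the most similar right-view region
```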
Advances in Predictive RAHT for Geometric Point Cloud Compression
Bharath Vishwanath; Kai Zhang; Li Zhang
DOI: 10.1109/TIP.2025.3565992 | Published: 2025-03-06 | Volume 34, pp. 2926-2938
Abstract: Point cloud compression is critical for the success of immersive multimedia applications. For attribute compression in geometry-based point cloud compression (G-PCC), the Region Adaptive Hierarchical Transform (RAHT) is the preferred coding method. This paper presents several advances to predictive coding with RAHT. 1) Sample-domain prediction: prediction in RAHT is done in the transform domain, which introduces undesirable distortion into the prediction signal because of fixed-point computations and increases decoding complexity. We address this by applying prediction in the sample domain. The method opens the door to skipping the transform stage altogether when all residues are quantized to zero, leading to a significantly lighter decoder. 2) Reference node resampling: the inter-prediction signal derived in RAHT can have a different occupancy and weight distribution than the current block, causing a mismatch. To address this, we resample the reference node and align the occupancy and weight distribution. 3) Temporal filtering: during inter-prediction, the reference node is simply copied as the prediction signal, which assumes a correlation coefficient of unity and is rarely the case. We introduce a temporal filtering mechanism conditioned on the sub-band that emulates low-pass filtering and achieves improved prediction. 4) Inter-eligibility: during AC inter-prediction, both the encoder and the decoder have access to the DC coefficients of the current and reference nodes. We use this information to derive an inter-eligibility criterion. Experimental results show considerable gains and reduced complexity, demonstrating the utility of the proposed methods. All the presented methods have been adopted into the second version of G-PCC.
Citations: 0
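The sample-domain prediction idea (item 1) can be illustrated with the standard RAHT butterfly: two sibling attributes with occupancy weights are rotated into a DC/AC pair, and when every quantized sample-domain residue is already zero the transform stage has nothing left to do. The sketch below shows this with toy numbers; the weights, quantization step, and skip logic are illustrative, not the G-PCC specification.

```python
# Worked sketch of the RAHT butterfly plus sample-domain prediction with a
# transform-skip check when all quantised residues are zero.
import numpy as np

def raht_pair(a: float, b: float, wa: float, wb: float):
    """Weight-adaptive transform of two sibling attributes with weights wa, wb."""
    s = np.sqrt(wa + wb)
    dc = (np.sqrt(wa) * a + np.sqrt(wb) * b) / s
    ac = (np.sqrt(wa) * b - np.sqrt(wb) * a) / s
    return dc, ac

def quantise(x: np.ndarray, qstep: float) -> np.ndarray:
    return np.round(x / qstep)

attributes = np.array([102.0, 98.0, 101.0, 99.0])   # current node samples
prediction = np.array([101.0, 99.0, 100.0, 100.0])  # e.g. derived from the reference frame
residues = quantise(attributes - prediction, qstep=4.0)
if not residues.any():
    print("all residues quantised to zero: the transform stage can be skipped")
else:
    dc, ac = raht_pair(residues[0], residues[1], wa=1.0, wb=1.0)
```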
MoVis: When 3D Object Detection Is Like Human Monocular Vision
Zijie Wang; Jizheng Yi; Aibin Chen; Guangjie Han
DOI: 10.1109/TIP.2025.3544880 | Published: 2025-03-06 | Volume 34, pp. 3025-3040
Abstract: Monocular 3D object detection has garnered significant attention for its outstanding cost-effectiveness compared with multi-sensor systems. However, previous work mainly acquires object 3D properties in a heuristic way, with little emphasis on the cues between objects. Inspired by the mechanisms of monocular vision, we propose MoVis, an innovative 3D object detection framework that combines object hierarchy and color sequence cues. Specifically, a decoupled Spatial Relationship Encoder (SRE) is designed to feed the high-level encoding results, which carry object hierarchical relationships, back to low-level features. This not only reduces the computational overhead of multi-scale coding but also significantly improves the detection accuracy of occluded objects by incorporating the hierarchical relationships between objects into multi-scale features. Moreover, to obtain more precise object depth information, an Object-level Depth Modulator (ODM) based on the concept of conditional random fields is designed, which employs color sequences. Ultimately, the results of the SRE and ODM are efficiently fused by our Spatial Context Processor (SCP) to accurately perceive the 3D attributes of the objects. Extensive experiments on the KITTI and Rope3D benchmarks show that MoVis achieves state-of-the-art performance. MoVis represents a progressive approach that emulates how human monocular vision uses monocular cues to perceive 3D scenes.
Citations: 0
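The feedback of high-level encodings into low-level features described for the SRE resembles a top-down fusion step. The sketch below shows a generic version of that pattern (project, upsample, add); the channel sizes and the simple additive fusion are assumptions and not the SRE's actual decoupled design.

```python
# Generic sketch of feeding high-level encodings back to low-level features:
# project the high-level map, upsample it to the low-level resolution, and add it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFeedback(nn.Module):
    def __init__(self, high_ch: int = 512, low_ch: int = 256):
        super().__init__()
        self.project = nn.Conv2d(high_ch, low_ch, kernel_size=1)

    def forward(self, low_feat: torch.Tensor, high_feat: torch.Tensor) -> torch.Tensor:
        high_up = F.interpolate(self.project(high_feat), size=low_feat.shape[-2:],
                                mode="bilinear", align_corners=False)
        return low_feat + high_up   # low-level features enriched with high-level context

low = torch.randn(1, 256, 80, 80)
high = torch.randn(1, 512, 20, 20)
fused = TopDownFeedback()(low, high)
```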