ReCAP2: Rectified and context-aware polarization prompting for robust depth enhancement

IF 7.6 · Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhenyu Liu, Jiatong Xu, Daxin Liu, Qide Wang, Jin Cheng, Jianrong Tan
Knowledge-Based Systems, vol. 330, Article 114498 (published 2025-09-20). DOI: 10.1016/j.knosys.2025.114498
https://www.sciencedirect.com/science/article/pii/S0950705125015370
Citations: 0

Abstract

Accurate depth perception is fundamental to numerous computer vision applications, yet depth maps acquired from commodity sensors often suffer from artifacts and inaccuracies, which calls for effective enhancement techniques. Polarization imaging, which captures rich geometric cues that are robust to illumination variations, offers a promising modality for guiding this process. However, effectively integrating these cues into learning-based depth enhancement frameworks remains challenging. Existing methods often overlook the inherent representational gap between depth and polarization features and rely on context-agnostic fusion mechanisms that cannot generate prompts adapted to cross-modal relationships and local context. To address these limitations, we propose a novel Rectified and Context-Aware Polarization Prompting (ReCAP2) framework for depth enhancement models. ReCAP2 first rectifies features along both the channel and spatial dimensions to bridge the modality gap. It then generates fine-grained polarization prompts from dual-level context: cross-modal context ensures that the prompts encode pertinent inter-modality relationships, while spatial neighborhood context makes the prompts spatially tailored to regional content. These dual-context-aware prompts thus provide precise, adaptive guidance to the foundation model, enabling more robust depth enhancement. Extensive experiments demonstrate the effectiveness of our method. On the multi-modal HAMMER dataset, our method shows superior accuracy and robustness across diverse sensor types in indoor scenes under both full fine-tuning and prompt-tuning settings. Furthermore, cross-domain evaluations on the challenging CroMo dataset validate its strong generalization to outdoor environments.
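The abstract says polarization imaging captures geometric cues that are robust to illumination changes, but it does not say which polarization representation ReCAP2 consumes. As a minimal, hedged illustration, the sketch below assumes a polarization camera with linear polarizers at 0°, 45°, 90° and 135° and computes the standard linear Stokes quantities: total intensity S0, degree of linear polarization (DoLP) and angle of linear polarization (AoLP). In shape-from-polarization work, AoLP constrains the azimuth of the surface normal (up to an ambiguity) and DoLP relates to its zenith angle, which is why such maps are treated as geometric cues. The function name `polarization_cues` is hypothetical and is not taken from the paper.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale image indexed as [row][col]


def polarization_cues(i0: Image, i45: Image, i90: Image,
                      i135: Image) -> Tuple[Image, Image, Image]:
    """Per-pixel S0, DoLP and AoLP from four polarizer-angle intensities.

    Assumes ideal linear polarizers at 0, 45, 90 and 135 degrees.
    """
    s0_img, dolp_img, aolp_img = [], [], []
    for r0, r45, r90, r135 in zip(i0, i45, i90, i135):
        s0_row, dolp_row, aolp_row = [], [], []
        for a, b, c, d in zip(r0, r45, r90, r135):
            s0 = 0.5 * (a + b + c + d)  # total intensity
            s1 = a - c                  # 0 deg vs 90 deg component
            s2 = b - d                  # 45 deg vs 135 deg component
            # Degree of linear polarization, ideally in [0, 1]; dark pixels get 0.
            dolp = math.hypot(s1, s2) / s0 if s0 > 1e-8 else 0.0
            # Angle of linear polarization in radians, in [-pi/2, pi/2].
            aolp = 0.5 * math.atan2(s2, s1)
            s0_row.append(s0)
            dolp_row.append(dolp)
            aolp_row.append(aolp)
        s0_img.append(s0_row)
        dolp_img.append(dolp_row)
        aolp_img.append(aolp_row)
    return s0_img, dolp_img, aolp_img
```

For example, `polarization_cues([[1.0]], [[0.5]], [[0.0]], [[0.5]])` describes light fully polarized at 0° and yields DoLP 1.0 and AoLP 0.0, while four equal intensities (unpolarized light) yield DoLP 0.0. Real sensor data is noisy, so DoLP can slightly exceed 1 and is often clipped in practice.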
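The abstract describes two stages: feature rectification along the channel and spatial dimensions, followed by prompt generation that combines cross-modal context with spatial neighborhood context. The paper's actual layers are not given here, so the following is only a toy, parameter-free sketch under stated assumptions: depth and polarization features are already aligned maps of equal shape `[C][H][W]`; learned gates are replaced by sigmoid-squashed cosine agreement between the two modalities; and the dual-context prompt is modeled as local cross-attention in which each depth feature queries the polarization features inside a k × k window. `rectify`, `context_prompts` and the additive injection in the demo are hypothetical names and choices, not the authors' ReCAP2 module.

```python
import math
from typing import List

# Feature map indexed as [channel][row][col]; both modalities share one shape.
Tensor = List[List[List[float]]]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rectify(pol: Tensor, depth: Tensor) -> Tensor:
    """Hypothetical channel-then-spatial rectification of polarization features.

    Stand-in for learned gates: each gate is sigmoid(cosine agreement with depth),
    so it lies in (0, 1) and down-weights features that disagree with depth.
    """
    C, H, W = len(pol), len(pol[0]), len(pol[0][0])
    out: Tensor = []
    # Channel step: one gate per channel, from the flattened H*W maps.
    for c in range(C):
        p = [v for row in pol[c] for v in row]
        d = [v for row in depth[c] for v in row]
        g = _sigmoid(_cosine(p, d))
        out.append([[g * v for v in row] for row in pol[c]])
    # Spatial step: one gate per pixel, from the C-dimensional feature vectors.
    for y in range(H):
        for x in range(W):
            p = [out[c][y][x] for c in range(C)]
            d = [depth[c][y][x] for c in range(C)]
            g = _sigmoid(_cosine(p, d))
            for c in range(C):
                out[c][y][x] *= g
    return out


def context_prompts(pol: Tensor, depth: Tensor, k: int = 3) -> Tensor:
    """Hypothetical dual-context prompt: local cross-modal attention.

    Cross-modal context: the depth feature at each pixel is the query and the
    polarization features are keys/values. Spatial context: attention is
    restricted to a k x k neighborhood (clipped at the image borders).
    """
    C, H, W = len(pol), len(pol[0]), len(pol[0][0])
    r = k // 2
    scale = 1.0 / math.sqrt(C)
    prompt = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for y in range(H):
        for x in range(W):
            q = [depth[c][y][x] for c in range(C)]
            neigh = [(yy, xx)
                     for yy in range(max(0, y - r), min(H, y + r + 1))
                     for xx in range(max(0, x - r), min(W, x + r + 1))]
            scores = [scale * sum(q[c] * pol[c][yy][xx] for c in range(C))
                      for yy, xx in neigh]
            m = max(scores)  # subtract the max for a numerically stable softmax
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            for (yy, xx), e in zip(neigh, exps):
                w = e / z
                for c in range(C):
                    prompt[c][y][x] += w * pol[c][yy][xx]
    return prompt


if __name__ == "__main__":
    # Made-up 2-channel, 2x2 features, for illustration only.
    depth = [[[1.0, 0.5], [0.2, 0.3]], [[0.4, 0.5], [0.8, 1.0]]]
    pol = [[[0.9, 0.4], [0.1, 0.2]], [[0.1, 0.6], [0.7, 0.9]]]
    prompts = context_prompts(rectify(pol, depth), depth)
    # One possible injection: add the prompt to the depth features.
    guided = [[[d + p for d, p in zip(dr, pr)] for dr, pr in zip(dc, pc)]
              for dc, pc in zip(depth, prompts)]
    print(len(guided), len(guided[0]), len(guided[0][0]))  # 2 2 2
```

Restricting attention to a window keeps the cost linear in the number of pixels for a fixed k, and with k = 1 the prompt reduces to the rectified polarization features themselves, which makes the contribution of neighborhood context easy to isolate. Whether ReCAP2 adds, concatenates or otherwise injects its prompts into the foundation model is not stated in the abstract.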
Source journal
Knowledge-Based Systems (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.