Rectified and Context-Aware Polarization Prompting for Robust Depth Enhancement

IF 7.6 | Region 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems, vol. 330, Article 114498 | Pub Date: 2025-09-20 | DOI: 10.1016/j.knosys.2025.114498
https://www.sciencedirect.com/science/article/pii/S0950705125015370

Zhenyu Liu, Jiatong Xu, Daxin Liu, Qide Wang, Jin Cheng, Jianrong Tan

Citations: 0

Abstract

ReCAP2: Rectified and context-aware polarization prompting for robust depth enhancement

Accurate depth perception is fundamental for numerous computer vision applications, yet depth maps acquired from commodity sensors often suffer from artifacts and inaccuracies, necessitating effective enhancement techniques. Polarization imaging, capturing rich geometric cues robust to illumination variations, offers a promising modality to guide this process. However, effectively integrating these cues within learning-based depth enhancement frameworks remains challenging. Existing methods often overlook the inherent representational gap between depth and polarization features and employ context-agnostic fusion mechanisms, incapable of generating prompts adaptive to cross-modal relationships and local context. To address these limitations, we propose a novel Rectified and Context-Aware Polarization Prompting (ReCAP²) framework for depth enhancement models. The ReCAP² first performs initial feature rectification across both channel and spatial dimensions to bridge the modality gap. Subsequently, it generates fine-grained polarization prompts by leveraging dual-level context: utilizing cross-modal context ensures the prompts encode pertinent inter-modality relationships, while processing spatial neighborhood context yields prompts spatially tailored to regional content. Consequently, these dual-context aware prompts provide precise, adaptive guidance for the foundation model, facilitating more robust depth enhancement. Extensive experiments demonstrate the effectiveness of our method. On the multi-modal HAMMER dataset, our method shows superior accuracy and robustness across diverse sensor types in indoor scenes under both full fine-tuning and prompt tuning settings. Furthermore, cross-domain evaluations on the challenging CroMo dataset validate its strong generalization to outdoor environments.
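The abstract does not give implementation details, so the following is only an illustrative NumPy sketch of the two stages it names: rectifying polarization features along the channel and spatial dimensions, then forming a prompt from cross-modal and spatial-neighborhood context. Every name here (`rectify`, `make_prompt`, `box_filter3`, `w_channel`, `w_spatial`) and every formula (sigmoid gates, cosine similarity, a 3x3 mean) is an assumption chosen for illustration, not the authors' architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rectify(pol, depth, w_channel, w_spatial):
    """Gate polarization features by channel, then by pixel (hypothetical design).

    pol, depth: float arrays of shape (C, H, W)
    w_channel:  (C, 2C) stand-in for learned channel-gate weights
    w_spatial:  (2,)    stand-in for learned spatial-gate weights
    """
    # Channel gate: global-average-pool both modalities, mix linearly.
    pooled = np.concatenate([pol.mean(axis=(1, 2)), depth.mean(axis=(1, 2))])
    channel_gate = sigmoid(w_channel @ pooled)                 # (C,)
    pol = pol * channel_gate[:, None, None]
    # Spatial gate: per-pixel channel means of both modalities, mixed linearly.
    spatial_in = np.stack([pol.mean(axis=0), depth.mean(axis=0)])  # (2, H, W)
    spatial_gate = sigmoid(np.tensordot(w_spatial, spatial_in, axes=1))  # (H, W)
    return pol * spatial_gate[None, :, :]

def box_filter3(x):
    """3x3 mean per channel with edge padding (the assumed 'neighborhood')."""
    _, H, W = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + H, dx:dx + W]
    return out / 9.0

def make_prompt(pol_rect, depth):
    """Combine cross-modal and neighborhood context into a prompt (assumed form)."""
    # Cross-modal context: per-pixel cosine similarity between the modalities.
    num = (pol_rect * depth).sum(axis=0)
    den = np.linalg.norm(pol_rect, axis=0) * np.linalg.norm(depth, axis=0) + 1e-8
    cross = sigmoid(num / den)                                 # (H, W)
    # Spatial-neighborhood context: locally averaged polarization features.
    local = box_filter3(pol_rect)
    return cross[None, :, :] * local                           # (C, H, W)

# Usage with random stand-in features and weights.
rng = np.random.default_rng(0)
C, H, W = 4, 5, 6
pol = rng.normal(size=(C, H, W))
depth = rng.normal(size=(C, H, W))
pol_rect = rectify(pol, depth, rng.normal(size=(C, 2 * C)), rng.normal(size=2))
prompt = make_prompt(pol_rect, depth)
```

In a prompt-tuning setup, a tensor like `prompt` would typically be added to the features of a frozen depth model while only the prompt-generating weights are trained; how ReCAP² injects its prompts is not stated in the abstract.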

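Source journal

Knowledge-Based Systems (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months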

Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.