ReCAP²: Rectified and context-aware polarization prompting for robust depth enhancement
Zhenyu Liu, Jiatong Xu, Daxin Liu, Qide Wang, Jin Cheng, Jianrong Tan
Knowledge-Based Systems, Volume 330, Article 114498. Published 2025-09-20. DOI: 10.1016/j.knosys.2025.114498
Citations: 0
Abstract
Accurate depth perception is fundamental for numerous computer vision applications, yet depth maps acquired from commodity sensors often suffer from artifacts and inaccuracies, necessitating effective enhancement techniques. Polarization imaging, capturing rich geometric cues robust to illumination variations, offers a promising modality to guide this process. However, effectively integrating these cues within learning-based depth enhancement frameworks remains challenging. Existing methods often overlook the inherent representational gap between depth and polarization features and employ context-agnostic fusion mechanisms that are incapable of generating prompts adaptive to cross-modal relationships and local context. To address these limitations, we propose a novel Rectified and Context-Aware Polarization Prompting (ReCAP²) framework for depth enhancement models. ReCAP² first performs initial feature rectification across both channel and spatial dimensions to bridge the modality gap. Subsequently, it generates fine-grained polarization prompts by leveraging dual-level context: utilizing cross-modal context ensures the prompts encode pertinent inter-modality relationships, while processing spatial neighborhood context yields prompts spatially tailored to regional content. Consequently, these dual-context-aware prompts provide precise, adaptive guidance for the foundation model, facilitating more robust depth enhancement. Extensive experiments demonstrate the effectiveness of our method. On the multi-modal HAMMER dataset, our method shows superior accuracy and robustness across diverse sensor types in indoor scenes under both full fine-tuning and prompt tuning settings. Furthermore, cross-domain evaluations on the challenging CroMo dataset validate its strong generalization to outdoor environments.
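To make the two stages described above concrete, the sketch below illustrates one plausible reading of the pipeline: polarization features are first rectified along the channel and then the spatial dimension, and a prompt is then formed from cross-modal context (depth plus rectified polarization) followed by spatial-neighborhood aggregation. This is a minimal illustrative sketch, not the authors' implementation; the module names, gating designs, channel sizes, and convolutional choices are all assumptions made for the example.

```python
# Illustrative sketch of rectification + dual-context prompt generation (assumed design,
# not the ReCAP² reference code). Requires PyTorch.
import torch
import torch.nn as nn


class RectifiedContextPrompt(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # Channel rectification: a squeeze-and-excitation-style gate on polarization features.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1), nn.Sigmoid(),
        )
        # Spatial rectification: a per-pixel gate computed from pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )
        # Cross-modal context: mix depth and rectified polarization features.
        self.cross_modal = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Spatial-neighborhood context: local aggregation around each position.
        self.neighborhood = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, depth_feat: torch.Tensor, polar_feat: torch.Tensor) -> torch.Tensor:
        # Step 1: feature rectification across channel and spatial dimensions.
        polar = polar_feat * self.channel_gate(polar_feat)
        stats = torch.cat(
            [polar.mean(dim=1, keepdim=True), polar.amax(dim=1, keepdim=True)], dim=1
        )
        polar = polar * self.spatial_gate(stats)
        # Step 2: dual-level context prompt generation.
        cross = self.cross_modal(torch.cat([depth_feat, polar], dim=1))  # inter-modality cues
        prompt = self.neighborhood(cross)  # prompt adapted to the local neighborhood
        # The resulting prompt would be injected into the (frozen or fine-tuned)
        # depth-enhancement foundation model as guidance.
        return prompt


if __name__ == "__main__":
    f_depth = torch.randn(1, 64, 60, 80)  # hypothetical depth feature map
    f_polar = torch.randn(1, 64, 60, 80)  # hypothetical polarization feature map
    print(RectifiedContextPrompt()(f_depth, f_polar).shape)  # torch.Size([1, 64, 60, 80])
```

Under this reading, prompt tuning would keep the backbone weights frozen and train only modules like the one above, whereas full fine-tuning would also update the backbone; the abstract reports results under both settings.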
Journal Introduction
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built with knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.