ULDepth: Transform Self-Supervised Depth Estimation to Unpaired Multi-Domain Learning

IF 2.7, Q2 (Engineering, Electrical & Electronic)
Phan Thi Huyen Thanh; Trung Thai Tran; The Hiep Nguyen; Minh Huy Vu Nguyen; Tran Vu Pham; Truong Vinh Truong Duy; Duc Dung Nguyen
IEEE Open Journal of Signal Processing, vol. 6, pp. 1004-1016. Published 2025-08-11. DOI: 10.1109/OJSP.2025.3597873
URL: https://ieeexplore.ieee.org/document/11122640/ (Citations: 0)

Abstract

This paper introduces a general plug-in framework designed to enhance the robustness and cross-domain generalization of self-supervised depth estimation models. Current models often struggle with real-world deployment due to their limited ability to generalize across diverse domains, such as varying lighting and weather conditions. Single-domain models are optimized for specific scenarios, while existing multi-domain approaches typically rely on paired images, which are rarely available in real-world datasets. Our framework addresses these limitations by training directly on unpaired real images from multiple domains. Daytime images serve as a reference that guides the model, through adversarial training, toward consistent depth distributions across these diverse domains, eliminating the need for paired images. To refine regions prone to artifacts, we augment the discriminator with a positional encoding that is combined with the predicted depth maps. We also incorporate a dynamic normalization mechanism to capture shared depth features across domains, removing the requirement for separate domain-specific encoders. Furthermore, we introduce a new benchmark designed for a more comprehensive evaluation, encompassing previously unaddressed real-world scenarios. By focusing on unpaired real data, our framework significantly improves the generalization capabilities of existing models, enabling them to better adapt to the complexity and authenticity of data encountered in real-world environments.
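
The abstract describes two mechanisms at a high level: a discriminator that judges predicted depth maps together with a positional encoding, trained adversarially against daytime "reference" depth, and a dynamic normalization layer that lets one shared encoder serve all domains. The sketch below illustrates one plausible reading of these ideas; it is not the authors' implementation, and every name, architecture choice, and hyperparameter here (the coordinate-grid encoding, the PatchGAN-style discriminator, the non-saturating GAN losses) is an illustrative assumption in PyTorch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def positional_encoding(h, w, device):
    """A two-channel (x, y) coordinate grid in [-1, 1]: one plausible
    positional encoding to concatenate with a predicted depth map."""
    ys = torch.linspace(-1.0, 1.0, h, device=device)
    xs = torch.linspace(-1.0, 1.0, w, device=device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=0).unsqueeze(0)  # (1, 2, H, W)

class DepthDiscriminator(nn.Module):
    """PatchGAN-style discriminator over depth + positional encoding."""
    def __init__(self, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, base, 4, stride=2, padding=1),  # 1 depth + 2 PE channels
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # per-patch logits
        )

    def forward(self, depth):  # depth: (B, 1, H, W)
        b, _, h, w = depth.shape
        pe = positional_encoding(h, w, depth.device).expand(b, -1, -1, -1)
        return self.net(torch.cat([depth, pe], dim=1))

def adversarial_depth_losses(disc, depth_day, depth_other):
    """Daytime depth is treated as the 'real' reference distribution and
    other-domain depth as 'fake'; the depth network is then pushed to make
    other-domain predictions indistinguishable from the daytime ones."""
    real_logits = disc(depth_day.detach())
    fake_logits = disc(depth_other.detach())
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )
    g_logits = disc(depth_other)  # gradients flow back into the depth network
    g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    return d_loss, g_loss
```

The dynamic normalization mechanism is likewise only described in outline. One plausible interpretation, again purely illustrative, is an instance-norm layer whose affine parameters are predicted from the features themselves, so the shared encoder can adapt to per-domain statistics without domain-specific branches:

```python
class DynamicNorm(nn.Module):
    """Hypothetical dynamic normalization: normalize away instance statistics,
    then re-modulate with per-channel scale/shift predicted from the input."""
    def __init__(self, ch):
        super().__init__()
        self.norm = nn.InstanceNorm2d(ch, affine=False)
        self.to_affine = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, 2 * ch, kernel_size=1),  # predicts gamma and beta
        )

    def forward(self, x):  # x: (B, C, H, W)
        gamma, beta = self.to_affine(x).chunk(2, dim=1)
        return self.norm(x) * (1.0 + gamma) + beta
```

In a training loop one would presumably alternate updates: step the discriminator on d_loss, then add a weighted g_loss to the usual photometric self-supervision objective when updating the depth network, so the adversarial signal aligns cross-domain depth distributions without requiring paired images.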