为无监督室内深度估计自适应学习领域不变特征

IF 6 3区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Multimedia Computing Communications and Applications Pub Date : 2024-06-13 DOI:10.1145/3672397

Jiehua Zhang, Liang Li, Chenggang Yan, Zhan Wang, Changliang Xu, Jiyong Zhang, Chuqiao Chen

{"title":"为无监督室内深度估计自适应学习领域不变特征","authors":"Jiehua Zhang, Liang Li, Chenggang Yan, Zhan Wang, Changliang Xu, Jiyong Zhang, Chuqiao Chen","doi":"10.1145/3672397","DOIUrl":null,"url":null,"abstract":"<p>Predicting depth maps from monocular images has made an impressive performance in the past years. However, most depth estimation methods are trained with paired image-depth map data or multi-view images (e.g., stereo pair and monocular sequence), which suffer from expensive annotation costs and poor transferability. Although unsupervised domain adaptation methods are introduced to mitigate the reliance on annotated data, rare works focus on the unsupervised cross-scenario indoor monocular depth estimation. In this paper, we propose to study the generalization of depth estimation models across different indoor scenarios in an adversarial-based domain adaptation paradigm. Concretely, a domain discriminator is designed for discriminating the representation from source and target domains, while the feature extractor aims to confuse the domain discriminator by capturing domain-invariant features. Further, we reconstruct depth maps from latent representations with the supervision of labeled source data. As a result, the feature extractor learned features possess the merit of both domain-invariant and low source risk, and the depth estimator can deal with the domain shift between source and target domains. We conduct the cross-scenario and cross-dataset experiments on the ScanNet and NYU-Depth-v2 datasets to verify the effectiveness of our method and achieve impressive performance.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"36 1","pages":""},"PeriodicalIF":6.0000,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning Domain Invariant Features for Unsupervised Indoor Depth Estimation Adaptation\",\"authors\":\"Jiehua Zhang, Liang Li, Chenggang Yan, Zhan Wang, Changliang Xu, Jiyong Zhang, Chuqiao Chen\",\"doi\":\"10.1145/3672397\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Predicting depth maps from monocular images has made an impressive performance in the past years. However, most depth estimation methods are trained with paired image-depth map data or multi-view images (e.g., stereo pair and monocular sequence), which suffer from expensive annotation costs and poor transferability. Although unsupervised domain adaptation methods are introduced to mitigate the reliance on annotated data, rare works focus on the unsupervised cross-scenario indoor monocular depth estimation. In this paper, we propose to study the generalization of depth estimation models across different indoor scenarios in an adversarial-based domain adaptation paradigm. Concretely, a domain discriminator is designed for discriminating the representation from source and target domains, while the feature extractor aims to confuse the domain discriminator by capturing domain-invariant features. Further, we reconstruct depth maps from latent representations with the supervision of labeled source data. As a result, the feature extractor learned features possess the merit of both domain-invariant and low source risk, and the depth estimator can deal with the domain shift between source and target domains. We conduct the cross-scenario and cross-dataset experiments on the ScanNet and NYU-Depth-v2 datasets to verify the effectiveness of our method and achieve impressive performance.</p>\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"36 1\",\"pages\":\"\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2024-06-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3672397\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3672397","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

在过去几年中，从单目图像预测深度图取得了令人瞩目的成就。然而，大多数深度估算方法都是通过成对图像深度图数据或多视角图像（如立体配对和单目序列）进行训练的，这些方法存在注释成本昂贵和可移植性差的问题。虽然引入了无监督域适应方法来减轻对注释数据的依赖，但很少有研究集中于无监督跨场景室内单目深度估计。在本文中，我们提出在基于对抗的领域适应范例中研究深度估计模型在不同室内场景中的泛化。具体来说，域判别器用于判别源域和目标域的表示，而特征提取器则旨在通过捕捉域不变特征来混淆域判别器。此外，在标注源数据的监督下，我们从潜在表征重建深度图。因此，特征提取器学习到的特征具有域不变和低源风险的优点，而深度估计器可以处理源域和目标域之间的域偏移。我们在 ScanNet 和 NYU-Depth-v2 数据集上进行了跨场景和跨数据集实验，验证了我们方法的有效性，并取得了令人印象深刻的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Learning Domain Invariant Features for Unsupervised Indoor Depth Estimation Adaptation

Predicting depth maps from monocular images has made an impressive performance in the past years. However, most depth estimation methods are trained with paired image-depth map data or multi-view images (e.g., stereo pair and monocular sequence), which suffer from expensive annotation costs and poor transferability. Although unsupervised domain adaptation methods are introduced to mitigate the reliance on annotated data, rare works focus on the unsupervised cross-scenario indoor monocular depth estimation. In this paper, we propose to study the generalization of depth estimation models across different indoor scenarios in an adversarial-based domain adaptation paradigm. Concretely, a domain discriminator is designed for discriminating the representation from source and target domains, while the feature extractor aims to confuse the domain discriminator by capturing domain-invariant features. Further, we reconstruct depth maps from latent representations with the supervision of labeled source data. As a result, the feature extractor learned features possess the merit of both domain-invariant and low source risk, and the depth estimator can deal with the domain shift between source and target domains. We conduct the cross-scenario and cross-dataset experiments on the ScanNet and NYU-Depth-v2 datasets to verify the effectiveness of our method and achieve impressive performance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Multimedia Computing Communications and Applications 工程技术-计算机：理论方法

CiteScore

8.50

自引率

5.90%

发文量

285

审稿时长

7.5 months

期刊介绍： The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print form and digital form. The Journal is published quarterly; with roughly 7 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure a timely publication. The transactions consists primarily of research papers. This is an archival journal and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.