360SFUDA++: Towards Source-Free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes.

Xu Zheng, Peng Yuan Zhou, Athanasios V Vasilakos, Lin Wang
{"title":"360SFUDA++: Towards Source-Free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes.","authors":"Xu Zheng, Peng Yuan Zhou, Athanasios V Vasilakos, Lin Wang","doi":"10.1109/TPAMI.2024.3490619","DOIUrl":null,"url":null,"abstract":"<p><p>In this paper, we address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a pinhole image pre-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches from the distinct Field-of-View (FoV) between domains, 2) style discrepancies inherent in the UDA problem, and 3) inevitable distortion of the panoramic images. To tackle these problems, we propose 360SFUDA++ that effectively extracts knowledge from the source pinhole model with only unlabeled panoramic images and transfers the reliable knowledge to the target panoramic domain. Specifically, we first utilize Tangent Projection (TP) as it has less distortion and meanwhile slits the equirectangular projection (ERP) to patches with fixed FoV projection (FFP) to mimic the pinhole images. Both projections are shown effective in extracting knowledge from the source model. However, as the distinct projections make it less possible to directly transfer knowledge between domains, we then propose Reliable Panoramic Prototype Adaptation Module (RP <sup>2</sup> AM) to transfer knowledge at both prediction and prototype levels. RP <sup>2</sup> AM selects the confident knowledge and integrates panoramic prototypes for reliable knowledge adaptation. Moreover, we introduce Cross-projection Dual Attention Module (CDAM), which better aligns the spatial and channel characteristics across projections at the feature level between domains. Both knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on the synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our 360SFUDA++ achieves significantly better performance than prior SFUDA methods. Project Page.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2024.3490619","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this paper, we address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a pinhole image pre-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches from the distinct Field-of-View (FoV) between domains, 2) style discrepancies inherent in the UDA problem, and 3) the inevitable distortion of panoramic images. To tackle these problems, we propose 360SFUDA++, which effectively extracts knowledge from the source pinhole model with only unlabeled panoramic images and transfers the reliable knowledge to the target panoramic domain. Specifically, we first utilize Tangent Projection (TP), as it has less distortion, and meanwhile split the equirectangular projection (ERP) into patches with fixed-FoV projection (FFP) to mimic pinhole images. Both projections are shown to be effective in extracting knowledge from the source model. However, as the distinct projections make it difficult to directly transfer knowledge between domains, we then propose the Reliable Panoramic Prototype Adaptation Module (RP²AM) to transfer knowledge at both the prediction and prototype levels. RP²AM selects confident knowledge and integrates panoramic prototypes for reliable knowledge adaptation. Moreover, we introduce the Cross-projection Dual Attention Module (CDAM), which better aligns the spatial and channel characteristics across projections at the feature level between domains. Both the knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our 360SFUDA++ achieves significantly better performance than prior SFUDA methods.
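To make the fixed-FoV projection (FFP) step concrete, below is a minimal sketch of slicing an ERP panorama into patches whose horizontal extent corresponds to a fixed FoV, so each patch roughly mimics a pinhole view. The helper name, the simple column-cropping scheme, and the default angles are assumptions for illustration, not the paper's exact implementation.

```python
import torch

def slice_erp_fixed_fov(erp: torch.Tensor, fov_deg: float = 90.0,
                        stride_deg: float = 45.0) -> torch.Tensor:
    """erp: (C, H, W) ERP image spanning 360° horizontally.
    Returns a stack of patches (N, C, H, W_fov), one per yaw offset."""
    _, _, w = erp.shape
    patch_w = int(round(w * fov_deg / 360.0))   # columns covered by the FoV
    step = int(round(w * stride_deg / 360.0))   # columns between patch starts
    # Pad by wrapping around the 360° seam so every yaw gets a full patch.
    wrapped = torch.cat([erp, erp[..., :patch_w]], dim=-1)
    patches = [wrapped[..., s:s + patch_w] for s in range(0, w, step)]
    return torch.stack(patches)

# Example: a 512x1024 ERP image yields eight 90°-wide pseudo-pinhole patches.
pano = torch.rand(3, 512, 1024)
print(slice_erp_fixed_fov(pano).shape)  # torch.Size([8, 3, 512, 256])
```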
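The reliability filtering behind RP²AM can be sketched as follows: keep only high-confidence pixels from the source model's predictions on the panorama, then average the features of those pixels per class to form panoramic prototypes. The threshold value, function name, and masked-average-pooling formulation are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def reliable_prototypes(feats: torch.Tensor, logits: torch.Tensor,
                        num_classes: int, tau: float = 0.9) -> torch.Tensor:
    """feats: (C, H, W) features; logits: (K, H, W) class predictions.
    Returns (K, C) L2-normalized prototypes; rows stay zero for classes
    with no confident pixels."""
    probs = logits.softmax(dim=0)
    conf, pseudo = probs.max(dim=0)          # (H, W) confidence and labels
    keep = conf > tau                        # reliability mask
    protos = torch.zeros(num_classes, feats.shape[0])
    for k in range(num_classes):
        mask = keep & (pseudo == k)
        if mask.any():
            protos[k] = feats[:, mask].mean(dim=1)  # masked average pooling
    return F.normalize(protos, dim=1, eps=1e-8)
```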
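Finally, a sketch of the cross-projection dual-attention idea behind CDAM: compute a channel-attention and a spatial-attention gate from one projection's features and use them to reweight the other projection's features, encouraging the two views to agree in both channel and spatial statistics. The module structure below (squeeze-and-excite channel gate plus a convolutional spatial gate) is an assumption for illustration; it also assumes both feature maps have been resampled to the same spatial size.

```python
import torch
import torch.nn as nn

class CrossProjectionDualAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.channel_gate = nn.Sequential(   # squeeze-and-excite style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(   # single-channel spatial map
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, f_erp: torch.Tensor, f_tp: torch.Tensor):
        # Gate each projection's features with attention computed from the
        # other projection, so channel and spatial characteristics are shared.
        erp_out = f_erp * self.channel_gate(f_tp) * self.spatial_gate(f_tp)
        tp_out = f_tp * self.channel_gate(f_erp) * self.spatial_gate(f_erp)
        return erp_out, tp_out
```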
