Investigating Synthetic-to-Real Transfer Robustness for Stereo Matching and Optical Flow Estimation

IF 18.6

IEEE Transactions on Pattern Analysis and Machine Intelligence, Pub Date: 2025-07-01, DOI: 10.1109/TPAMI.2025.3584847

Jiawei Zhang; Jiahe Li; Lei Huang; Haonan Luo; Xiaohan Yu; Lin Gu; Jin Zheng; Xiao Bai

Volume 47, Issue 10, Pages 9113-9129. https://ieeexplore.ieee.org/document/11060851/

Citations: 0

Abstract

With advancements in robust stereo matching and optical flow estimation networks, models pre-trained on synthetic data demonstrate strong robustness to unseen domains. However, this robustness can degrade seriously when the models are fine-tuned on real-world scenarios. This paper investigates how to fine-tune stereo matching and optical flow estimation networks without compromising their robustness to unseen domains. Specifically, we divide pixels into consistent and inconsistent regions by comparing the Ground Truth (GT) with Pseudo Labels (PL), and demonstrate that imbalanced learning of the consistent and inconsistent regions in GT causes the robustness degradation. Based on this analysis, we propose the DKT framework, which uses PL to balance the learning of different regions in GT. The core idea is to use an exponential moving average (EMA) teacher to measure what the student network has learned and to dynamically adjust the learning regions. We further propose the DKT++ framework, which improves target-domain performance and network robustness by applying slow-fast update teachers to generate more accurate PL and by introducing unlabeled and synthetic data. We integrate our frameworks with state-of-the-art networks and evaluate their effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the robustness of stereo matching and optical flow networks during fine-tuning.
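The two mechanisms named in the abstract, splitting pixels by GT/PL agreement and maintaining an EMA teacher, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the consistency criterion or the EMA decay, so the absolute-difference test, the `threshold` of 1.0, the `decay` default, and the function names `split_regions` and `ema_update` are all assumptions made for illustration. Plain Python lists stand in for disparity maps and parameter tensors.

```python
from typing import Dict, List, Optional, Tuple


def split_regions(
    gt: List[Optional[float]],
    pl: List[float],
    threshold: float = 1.0,  # assumed tolerance; not specified in the abstract
) -> Tuple[List[bool], List[bool]]:
    """Split valid GT pixels into consistent / inconsistent masks.

    gt: ground-truth value per pixel, None where no GT is available.
    pl: pseudo-label value per pixel (e.g. predicted by a teacher network).
    A pixel is treated as consistent when |gt - pl| <= threshold (assumed rule).
    """
    consistent: List[bool] = []
    inconsistent: List[bool] = []
    for g, p in zip(gt, pl):
        valid = g is not None
        agree = valid and abs(g - p) <= threshold
        consistent.append(agree)
        inconsistent.append(valid and not agree)
    return consistent, inconsistent


def ema_update(
    teacher: Dict[str, float],
    student: Dict[str, float],
    decay: float = 0.999,  # assumed value; a typical EMA decay
) -> None:
    """Update teacher in place: teacher = decay * teacher + (1 - decay) * student."""
    for name, value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * value


# Usage: four pixels, one of them without ground truth.
gt = [10.0, 20.0, None, 5.0]
pl = [10.5, 23.0, 7.0, 5.0]
consistent, inconsistent = split_regions(gt, pl)

teacher = {"w": 1.0}
student = {"w": 0.0}
ema_update(teacher, student, decay=0.9)
```

In a real training loop the masks would weight or select the per-pixel loss, and the parameters would be tensors from an array library. How DKT uses the two regions to rebalance learning is described in the paper itself and is not reproduced here.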

Source journal

IEEE transactions on pattern analysis and machine intelligence

Self-citation rate

0.00%

Articles published