Flow-Anything: Learning Real-World Optical Flow Estimation From Large-Scale Single-View Images

IF 18.6

IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-06-16. DOI: 10.1109/TPAMI.2025.3576851

Yingping Liang;Ying Fu;Yutao Hu;Wenqi Shao;Jiaming Liu;Debing Zhang

Volume 47, Issue 10, pages 8435-8452. Journal Article. Source: https://ieeexplore.ieee.org/document/11037400/

Citations: 0

Abstract

Optical flow estimation is a crucial subfield of computer vision and serves as a foundation for video tasks. However, its real-world robustness is limited because training relies on animated synthetic datasets. This introduces domain gaps when models are applied to real-world data and limits the benefits of scaling up datasets. To address these challenges, we propose Flow-Anything, a large-scale data generation framework designed to learn optical flow estimation from any single-view image in the real world. We employ two steps to make data scaling-up promising. First, we convert a single-view image into a 3D representation using advanced monocular depth estimation networks. This allows us to render optical flow and novel-view images under a virtual camera. Second, we develop an Object-Independent Volume Rendering module and a Depth-Aware Inpainting module to model the dynamic objects in the 3D representation. These two steps allow us to generate realistic training datasets from large-scale single-view images, which we call the FA-Flow Dataset. For the first time, we demonstrate the benefits of generating optical flow training data from large-scale real-world images, outperforming the most advanced unsupervised methods and supervised methods trained on synthetic datasets. Moreover, our models serve as a foundation model and enhance the performance of various downstream video tasks.
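The first step described in the abstract (lift a single image to 3D with a depth map, then render flow under a virtual camera) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a pinhole camera, a static scene, and a depth map that is already available, and it ignores occlusion, dynamic objects, and inpainting. The function name `flow_from_depth` and its parameters are hypothetical.

```python
import numpy as np


def flow_from_depth(depth, K, R, t):
    """Flow induced by moving a virtual camera over a static scene.

    Hypothetical helper for illustration only.
    depth: (H, W) per-pixel depth of the source view.
    K:     (3, 3) pinhole intrinsics (assumed known).
    R, t:  rotation (3, 3) and translation (3,) mapping source-camera
           coordinates to virtual-camera coordinates.
    Returns an (H, W, 2) array of pixel displacements (du, dv).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels

    # Back-project each pixel to a 3D point using its depth.
    rays = pix @ np.linalg.inv(K).T                    # rays with z = 1
    pts = rays * depth[..., None]

    # Move the points into the virtual camera frame and re-project.
    pts_new = pts @ R.T + t
    proj = pts_new @ K.T
    uv_new = proj[..., :2] / proj[..., 2:3]

    # Flow is the displacement between new and original pixel positions.
    return uv_new - pix[..., :2]


# Example: flat scene at depth 2, camera translated 0.1 units along x.
K = np.array([[100.0, 0.0, 4.0],
              [0.0, 100.0, 3.0],
              [0.0, 0.0, 1.0]])
depth = np.full((6, 8), 2.0)
flow = flow_from_depth(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

For this flat scene the horizontal flow is uniform and equals focal length times translation divided by depth (100 × 0.1 / 2 = 5 pixels), and the vertical flow is zero. With a real, non-constant depth map, nearer pixels move more than distant ones, which is what makes the rendered flow depend on scene geometry.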


Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence

Self-citation rate

0.00%

Publication count