Multi-scale Iterative Residuals for Fast and Scalable Stereo Matching

Proceedings of the 5th ACM Computer Science in Cars Symposium. Pub Date: 2021-10-25. DOI: 10.1145/3488904.3493376

Kumail Raza, René Schuster, D. Stricker


Citations: 1

Abstract


Multi-scale Iterative Residuals for Fast and Scalable Stereo Matching

Despite the remarkable progress of deep learning in stereo matching, there remains a gap in accuracy between real-time models and slower state-of-the-art models which are suitable for practical applications. This paper presents an iterative multi-scale coarse-to-fine refinement (iCFR) framework that bridges this gap: it can adopt any stereo matching network and make it faster, more efficient, and scalable while keeping comparable accuracy. To reduce the computational cost of matching, we use multi-scale warped features to estimate disparity residuals and push the disparity search range in the cost volume to a minimum. Finally, we apply a refinement network to recover the loss of precision that is inherent in multi-scale approaches. We test our iCFR framework by adopting the matching networks from the state-of-the-art GANet and AANet. The result is 49× faster inference than GANet-deep and 4× lower memory consumption, with comparable error. Our best-performing network, which we call FRSNet, scales up to an input resolution of 6K on a GTX 1080Ti, with inference time still below one second and accuracy comparable to AANet+. It outperforms all real-time stereo methods and achieves competitive accuracy on the KITTI benchmark.
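The coarse-to-fine residual idea in the abstract can be illustrated with a toy sketch on a single pair of 1D scanlines. This is not the authors' implementation: the learned features and matching network are replaced here by raw intensities and an absolute-difference cost, there is no refinement network, and every name (`downsample`, `sample`, `residual_search`, `coarse_to_fine`, `levels`, `radius`) is hypothetical. It assumes the scanline length is divisible by `2 ** (levels - 1)` and the convention that a left pixel `x` matches the right pixel `x - d`.

```python
def downsample(row):
    """Halve a scanline by averaging neighbouring pairs (assumes even length)."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row), 2)]


def sample(row, x):
    """Linearly interpolate row at position x, clamping x to the valid range."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)
    x1 = min(x0 + 1, len(row) - 1)
    t = x - x0
    return (1.0 - t) * row[x0] + t * row[x1]


def residual_search(left, right, disp, radius):
    """Refine disp per pixel by testing residuals in [-radius, radius]."""
    refined = []
    for x in range(len(left)):
        best_r, best_cost = 0, float("inf")
        for r in range(-radius, radius + 1):
            # Warp: read the right scanline at the position implied by the
            # current estimate plus the candidate residual.
            cost = abs(left[x] - sample(right, x - (disp[x] + r)))
            if cost < best_cost:
                best_r, best_cost = r, cost
        refined.append(disp[x] + best_r)
    return refined


def coarse_to_fine(left, right, levels=3, radius=2):
    """Estimate disparity from the coarsest scale up to full resolution."""
    pyramid = [(left, right)]
    for _ in range(levels - 1):
        l, r = pyramid[-1]
        pyramid.append((downsample(l), downsample(r)))

    # Start from zero disparity at the coarsest scale.
    disp = [0.0] * len(pyramid[-1][0])
    for i, (l, r) in enumerate(reversed(pyramid)):
        if i > 0:
            # Upsample by pixel repetition; disparities double with resolution.
            disp = [2.0 * d for d in disp for _ in range(2)]
        disp = residual_search(l, r, disp, radius)
    return disp


# Synthetic example: the right scanline is the left one shifted by 8 pixels.
left = [float(x * x) for x in range(64)]
right = [float((x + 8) ** 2) for x in range(64)]
disp = coarse_to_fine(left, right, levels=3, radius=2)
```

In this sketch each scale tests only `2 * radius + 1 = 5` candidates per pixel, yet with three scales the accumulated reach at full resolution is ±14 pixels (±8 from the quarter scale, ±4 from the half scale, ±2 from the full scale). Pixels whose match falls outside the scanline, here the first 8, cannot be recovered by this toy cost.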

