{"title":"航空场景下单目深度估计的时间关注","authors":"Vlad-Cristian Miclea, S. Nedevschi","doi":"10.1109/ICECCME55909.2022.9988383","DOIUrl":null,"url":null,"abstract":"Monocular depth estimation (MDE) is a key task for a large set of computer vision applications, convolutional neural networks (CNNs) being nowadays employed for this task. The objective of measuring the world from a single image is cumbersome, especially in case of highly complex scenarios where there is a lack in scene structure. State of the art deep learning-based methods cope with this problem by employing very powerful feature extractors, mixed with additional scene priors such as geometrical or semantic information. The usage of such approaches generally leads to high amounts of resources, computations which make the system incapable for real-time processing. In this work we propose a novel method that tries to account for the time constraints while providing accurate depth maps from a monocular system. Thus, instead of providing geometric or semantic priors which need complex additional processing (generally an additional CNN), we aid the depth estimation process with features extracted and preserved from previous frames. To this end, we propose a novel temporal attention sub-network, that properly extracts the aforementioned features and it combines them with the last available depth map. This sub-network is then inserted into a novel CNN architecture, that proves to generate better depth maps. We test the efficiency of our method on aerial images and obtain an improved accuracy while keeping the amount of resources as low as possible.","PeriodicalId":202568,"journal":{"name":"2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Temporal Attention for Monocular Depth Estimation in Aerial Scenarios\",\"authors\":\"Vlad-Cristian Miclea, S. Nedevschi\",\"doi\":\"10.1109/ICECCME55909.2022.9988383\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Monocular depth estimation (MDE) is a key task for a large set of computer vision applications, convolutional neural networks (CNNs) being nowadays employed for this task. The objective of measuring the world from a single image is cumbersome, especially in case of highly complex scenarios where there is a lack in scene structure. State of the art deep learning-based methods cope with this problem by employing very powerful feature extractors, mixed with additional scene priors such as geometrical or semantic information. The usage of such approaches generally leads to high amounts of resources, computations which make the system incapable for real-time processing. In this work we propose a novel method that tries to account for the time constraints while providing accurate depth maps from a monocular system. Thus, instead of providing geometric or semantic priors which need complex additional processing (generally an additional CNN), we aid the depth estimation process with features extracted and preserved from previous frames. To this end, we propose a novel temporal attention sub-network, that properly extracts the aforementioned features and it combines them with the last available depth map. This sub-network is then inserted into a novel CNN architecture, that proves to generate better depth maps. 
We test the efficiency of our method on aerial images and obtain an improved accuracy while keeping the amount of resources as low as possible.\",\"PeriodicalId\":202568,\"journal\":{\"name\":\"2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICECCME55909.2022.9988383\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICECCME55909.2022.9988383","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Temporal Attention for Monocular Depth Estimation in Aerial Scenarios
Monocular depth estimation (MDE) is a key task for a wide range of computer vision applications, with convolutional neural networks (CNNs) now being the standard tool for it. Measuring the world from a single image is a difficult objective, especially in highly complex scenarios where scene structure is lacking. State-of-the-art deep learning-based methods cope with this problem by employing very powerful feature extractors combined with additional scene priors such as geometric or semantic information. Such approaches generally demand large amounts of resources and computation, which makes the system unsuitable for real-time processing. In this work we propose a novel method that accounts for these time constraints while still providing accurate depth maps from a monocular system. Instead of relying on geometric or semantic priors that require complex additional processing (generally an additional CNN), we aid the depth estimation process with features extracted and preserved from previous frames. To this end, we propose a novel temporal attention sub-network that extracts these features and combines them with the last available depth map. This sub-network is then inserted into a novel CNN architecture, which is shown to generate better depth maps. We evaluate the efficiency of our method on aerial images and obtain improved accuracy while keeping resource usage as low as possible.
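The abstract does not describe the exact architecture, but the core idea (re-weighting features preserved from the previous frame, conditioned on the current features and the last available depth map, before fusing them into the depth decoder) can be sketched as below. This is a minimal, hypothetical PyTorch illustration under assumed design choices; the module name, channel counts, and the particular attention/fusion layout are not taken from the paper.

```python
# Hypothetical sketch of a temporal attention fusion block, not the authors'
# implementation. It combines current-frame encoder features with features
# cached from the previous frame and the previous depth prediction.
import torch
import torch.nn as nn


class TemporalAttentionFusion(nn.Module):
    def __init__(self, feat_channels: int):
        super().__init__()
        # Attention weights computed from current features, previous features,
        # and the previous depth map (one extra channel) -- an assumed design.
        self.attention = nn.Sequential(
            nn.Conv2d(2 * feat_channels + 1, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Fuse the attended previous-frame features with the current features.
        self.fuse = nn.Conv2d(2 * feat_channels, feat_channels, kernel_size=3, padding=1)

    def forward(self, curr_feat, prev_feat, prev_depth):
        # curr_feat, prev_feat: (B, C, H, W); prev_depth: (B, 1, H, W)
        attn = self.attention(torch.cat([curr_feat, prev_feat, prev_depth], dim=1))
        attended_prev = attn * prev_feat  # re-weight the preserved features
        return self.fuse(torch.cat([curr_feat, attended_prev], dim=1))


if __name__ == "__main__":
    block = TemporalAttentionFusion(feat_channels=64)
    curr = torch.randn(1, 64, 48, 64)
    prev = torch.randn(1, 64, 48, 64)
    depth = torch.randn(1, 1, 48, 64)
    print(block(curr, prev, depth).shape)  # torch.Size([1, 64, 48, 64])
```

In such a scheme the fused output would replace the plain encoder features at one decoder stage, so the previous frame only adds a few convolutions rather than a second CNN pass, which is consistent with the paper's stated goal of keeping resource usage low.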