Javier Hidalgo-Carri'o, Daniel Gehrig, D. Scaramuzza
{"title":"Learning Monocular Dense Depth from Events","authors":"Javier Hidalgo-Carri'o, Daniel Gehrig, D. Scaramuzza","doi":"10.1109/3DV50981.2020.00063","DOIUrl":null,"url":null,"abstract":"Event cameras are novel sensors that output brightness changes in the form of a stream of asynchronous ”events” instead of intensity frames. Compared to conventional image sensors, they offer significant advantages: high temporal resolution, high dynamic range, no motion blur, and much lower bandwidth. Recently, learning-based approaches have been applied to event-based data, thus unlocking their potential and making significant progress in a variety of tasks, such as monocular depth prediction. Most existing approaches use standard feed-forward architectures to generate network predictions, which do not leverage the temporal consistency presents in the event stream. We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods. In particular, our method generates dense depth predictions using a monocular setup, which has not been shown previously. We pretrain our model using a new dataset containing events and depth maps recorded in the CARLA simulator. We test our method on the Multi Vehicle Stereo Event Camera Dataset (MVSEC). Quantitative experiments show up to 50% improvement in average depth error with respect to previous event-based methods. Code and dataset are available at: http://rpg.ifi.uzh.ch/e2depth","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"66","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on 3D Vision (3DV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/3DV50981.2020.00063","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 66
Abstract
Event cameras are novel sensors that output brightness changes in the form of a stream of asynchronous ”events” instead of intensity frames. Compared to conventional image sensors, they offer significant advantages: high temporal resolution, high dynamic range, no motion blur, and much lower bandwidth. Recently, learning-based approaches have been applied to event-based data, thus unlocking their potential and making significant progress in a variety of tasks, such as monocular depth prediction. Most existing approaches use standard feed-forward architectures to generate network predictions, which do not leverage the temporal consistency presents in the event stream. We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods. In particular, our method generates dense depth predictions using a monocular setup, which has not been shown previously. We pretrain our model using a new dataset containing events and depth maps recorded in the CARLA simulator. We test our method on the Multi Vehicle Stereo Event Camera Dataset (MVSEC). Quantitative experiments show up to 50% improvement in average depth error with respect to previous event-based methods. Code and dataset are available at: http://rpg.ifi.uzh.ch/e2depth