Self-Supervised Intensity-Event Stereo Matching

Imaging Sensors and Systems. Pub Date: 2022-11-01. DOI: 10.48550/arXiv.2211.00509

Jinjin Gu, Jinan Zhou, Ringo S. W. Chu, Yan Chen, Jiawei Zhang, Xuanye Cheng, Song Zhang, Jimmy S. J. Ren


Citations: 2

Abstract

Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes with microsecond accuracy, a high dynamic range, and low power consumption. Despite these advantages, event cameras cannot be directly applied to computational imaging tasks because they cannot capture high-quality intensity and events simultaneously. This paper aims to connect a standalone event camera and a modern intensity camera so that applications can take advantage of both sensors. We establish this connection through a multi-modal stereo matching task. We first convert events to a reconstructed image and extend existing stereo networks to this multi-modal setting. We propose a self-supervised method to train the multi-modal stereo network without using ground-truth disparity data. A structure loss calculated on image gradients enables self-supervised learning on such multi-modal data. Exploiting the internal stereo constraint between views with different modalities, we introduce general stereo loss functions, including a disparity cross-consistency loss and an internal disparity loss, which lead to improved performance and robustness compared to existing approaches. Experiments on both synthetic and real datasets demonstrate the effectiveness of the proposed method, especially the proposed general stereo loss functions. Finally, we shed light on employing the aligned events and intensity images in downstream tasks such as video interpolation.
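The abstract says only that the structure loss is "calculated on image gradients" and gives no formula. The following is a minimal sketch of what a gradient-based photometric loss between two views could look like. It is an illustration under stated assumptions, not the authors' implementation. The function names, the forward-difference gradients, the L1 penalty, and the disparity sign convention (the left pixel at column x matches the right pixel at column x - d) are all hypothetical choices.

```python
import numpy as np


def image_gradients(img):
    """Forward-difference gradients along x and y, zero at the last column/row."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy


def warp_with_disparity(right, disparity):
    """Resample the right view at column x - d(x) using per-row linear interpolation.

    Assumed convention (not from the paper): left(x) corresponds to right(x - d).
    Sample positions outside the image are clamped to the border.
    """
    right = np.asarray(right, dtype=np.float64)
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)
    warped = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        src = np.clip(xs - disparity[y], 0, w - 1)
        warped[y] = np.interp(src, xs, right[y])
    return warped


def structure_loss(left, right, disparity):
    """Mean L1 distance between gradients of the left view and the warped right view."""
    gx_l, gy_l = image_gradients(left)
    gx_w, gy_w = image_gradients(warp_with_disparity(right, disparity))
    return float(np.mean(np.abs(gx_l - gx_w) + np.abs(gy_l - gy_w)))
```

Comparing gradients rather than raw pixel values makes such a loss insensitive to a constant brightness offset between the two views, which is one plausible reason to prefer it when one view is an intensity image and the other is reconstructed from events.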


