Exploring Joint Embedding Architectures and Data Augmentations for Self-Supervised Representation Learning in Event-Based Vision

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Pub Date: 2023-06-01. DOI: 10.1109/CVPRW59228.2023.00405

Sami Barchid, José Mennesson, C. Djeraba


Citations: 2

Abstract

This paper proposes a self-supervised representation learning (SSRL) framework for event-based vision that leverages several lightweight convolutional neural networks (CNNs), including 2D, 3D, and spiking CNNs. The method uses a joint embedding architecture to maximize the agreement between features extracted from different views of the same event sequence. We use popular event data augmentation techniques to design an efficient augmentation policy for event-based SSRL, and we introduce novel data augmentation methods to enhance the pretraining pipeline. Because SSRL is new to event-based vision, we define standard evaluation protocols and use them to evaluate our approach. Our study demonstrates that the pretrained CNNs acquire effective and transferable features, enabling them to achieve competitive performance in object and action recognition across various commonly used event-based datasets, even in a low-data regime. This paper also analyzes the extracted features experimentally with respect to the Uniformity-Tolerance tradeoff to assess their quality, and measures the similarity of representations using linear Centered Kernel Alignment (CKA). These quantitative measurements reinforce our observations from the performance benchmarks and show substantial differences between the representations learned by the different types of CNNs, even though all of them are optimized with the same approach.
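As an illustration of the representation-similarity measurement mentioned in the abstract, the following is a minimal, dependency-free sketch of linear CKA in its commonly used feature-space form, ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F), where Xc and Yc are the column-centered feature matrices. It is not taken from the paper: the function names, the toy feature matrices, and the choice of pure Python rather than an array library are all assumptions made for this example.

```python
import math
import random


def center_columns(rows):
    """Subtract the per-feature mean from every row (n samples x d features)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for a (n x p) and b (n x q)."""
    cols_a = list(zip(*a))
    cols_b = list(zip(*b))
    return sum(
        sum(u * v for u, v in zip(ca, cb)) ** 2
        for ca in cols_a
        for cb in cols_b
    )


def linear_cka(x, y):
    """Linear CKA between two feature matrices computed on the same n samples.

    Assumes neither matrix is constant across samples; otherwise the
    denominator is zero.
    """
    if len(x) != len(y):
        raise ValueError("x and y must describe the same number of samples")
    xc = center_columns(x)
    yc = center_columns(y)
    cross = cross_frobenius_sq(xc, yc)
    norm_x = math.sqrt(cross_frobenius_sq(xc, xc))
    norm_y = math.sqrt(cross_frobenius_sq(yc, yc))
    return cross / (norm_x * norm_y)


# Hypothetical toy features standing in for two networks' embeddings.
random.seed(0)
feats_a = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(32)]
# Same features with permuted columns and a global rescaling.
feats_b = [[2.5 * r[3], 2.5 * r[0], 2.5 * r[2], 2.5 * r[1]] for r in feats_a]
# Independent features with a different dimensionality.
feats_c = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(32)]

print(linear_cka(feats_a, feats_b))
print(linear_cka(feats_a, feats_c))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so the first comparison should come out at 1 up to floating-point error, while the comparison against independent features should be lower. The two matrices may have different feature dimensions, which is what allows comparisons across architectures such as 2D, 3D, and spiking CNNs.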

