The Impact of Action in Visual Representation Learning

Alexandre Devillers, Valentin Chaffraix, Frederic Armetta, S. Duffner, Mathieu Lefort
{"title":"The Impact of Action in Visual Representation Learning","authors":"Alexandre Devillers, Valentin Chaffraix, Frederic Armetta, S. Duffner, Mathieu Lefort","doi":"10.1109/ICDL53763.2022.9962210","DOIUrl":null,"url":null,"abstract":"Sensori-motor theories, inspired by work in neuroscience, psychology and cognitive science, claim that actions, through learning and mastering of a predictive model, are a key element in the perception of the environment. On the computational side, in the domains of representation learning and reinforcement learning, models are increasingly using self-supervised pretext tasks, such as predictive or contrastive ones, in order to increase the performance on their main task. These pretext tasks are action-related even if the action itself is usually not used in the model. In this paper, we propose to study the influence of considering action in the learning of visual representations in deep neural network models, an aspect which is often underestimated w.r.t. sensori-motor theories. More precisely, we quantity two independent factors: 1-whether or not to use the action during the learning of visual characteristics, and 2-whether or not to integrate the action in the representations of the current images. Other aspects will be kept as simple and comparable as possible, that is why we will not consider any specific action policies and combine simple architectures (VAE and LSTM), while using datasets derived from MNIST. In this context, our results show that explicitly including action in the learning process and in the representations improves the performance of the model, which opens interesting perspectives to improve state-of-the-art models of representation learning.","PeriodicalId":274171,"journal":{"name":"2022 IEEE International Conference on Development and Learning (ICDL)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Development and Learning (ICDL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDL53763.2022.9962210","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Sensori-motor theories, inspired by work in neuroscience, psychology and cognitive science, claim that actions, through the learning and mastering of a predictive model, are a key element in the perception of the environment. On the computational side, in the domains of representation learning and reinforcement learning, models increasingly use self-supervised pretext tasks, such as predictive or contrastive ones, in order to increase performance on their main task. These pretext tasks are action-related even if the action itself is usually not used in the model. In this paper, we propose to study the influence of considering action in the learning of visual representations in deep neural network models, an aspect which is often underestimated w.r.t. sensori-motor theories. More precisely, we quantify two independent factors: (1) whether or not the action is used during the learning of visual characteristics, and (2) whether or not the action is integrated into the representations of the current images. Other aspects are kept as simple and comparable as possible: we do not consider any specific action policy, and we combine simple architectures (a VAE and an LSTM) on datasets derived from MNIST. In this context, our results show that explicitly including action in the learning process and in the representations improves the performance of the model, which opens interesting perspectives for improving state-of-the-art models of representation learning.
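The two factors in the abstract can be made concrete with a minimal sketch. The snippet below is a hypothetical illustration, not the authors' implementation: a small VAE encoder produces a latent code for each frame, and an LSTM performs a predictive pretext task on the latent sequence. The two switches `use_action_in_learning` and `action_in_representation`, as well as the layer sizes, the one-hot action encoding and the MSE prediction loss, are assumptions introduced here purely to mirror factors (1) and (2).

```python
# Hypothetical sketch (not the paper's code): a VAE encoder feeding an LSTM
# predictor, with two switches mirroring the paper's two factors:
#   use_action_in_learning   -> feed the action to the predictive pretext model
#   action_in_representation -> concatenate the action to the image representation
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAEEncoder(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, x):
        h = self.net(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return z

class ActionConditionedPredictor(nn.Module):
    def __init__(self, latent_dim=16, action_dim=4,
                 use_action_in_learning=True, action_in_representation=True):
        super().__init__()
        self.use_action_in_learning = use_action_in_learning
        self.action_in_representation = action_in_representation
        self.encoder = VAEEncoder(latent_dim)
        lstm_in = latent_dim + (action_dim if use_action_in_learning else 0)
        self.lstm = nn.LSTM(lstm_in, latent_dim, batch_first=True)

    def forward(self, frames, actions):
        # frames: (B, T, 1, 28, 28) MNIST-like sequence, actions: (B, T, action_dim) one-hot
        B, T = frames.shape[:2]
        z = self.encoder(frames.reshape(B * T, 1, 28, 28)).reshape(B, T, -1)
        lstm_in = torch.cat([z, actions], dim=-1) if self.use_action_in_learning else z
        z_pred, _ = self.lstm(lstm_in)  # predictive pretext task on the latent sequence
        rep = torch.cat([z, actions], dim=-1) if self.action_in_representation else z
        return z_pred, z, rep

model = ActionConditionedPredictor()
frames = torch.randn(8, 5, 1, 28, 28)                # toy image sequence
actions = torch.eye(4)[torch.randint(0, 4, (8, 5))]  # random one-hot actions
z_pred, z, rep = model(frames, actions)
# Predict the next latent code from the current one (shifted by one step).
pred_loss = F.mse_loss(z_pred[:, :-1], z[:, 1:].detach())
```

Toggling the two flags independently yields the four conditions the abstract compares; `rep` is the representation that would then be evaluated on a downstream task.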