Boosting Self-supervised Video-based Human Action Recognition Through Knowledge Distillation

LatinX in AI at Neural Information Processing Systems Conference 2022 | Pub Date: 2022-11-28 | DOI: 10.52591/lxai202211286

Fernando Camarena, M. González-Mendoza, Leonardo Chang, N. Hernández-Gress


Citations: 0

Abstract




Deep learning architectures lead the state of the art in several computer vision, natural language processing, and reinforcement learning tasks because they can extract multi-level representations without human engineering. A model's performance, however, depends on the amount of labeled data used in training. Approaches such as self-supervised learning (SSL) therefore derive the supervisory signal from unlabeled data. Although SSL reduces the dependency on human annotations, two main drawbacks remain. First, training a large-scale model from scratch requires substantial computational resources. Second, knowledge from an SSL model is commonly transferred to a target model by fine-tuning, which forces the two models to share the same parameters and architecture and makes the transfer task-dependent. This paper explores how SSL benefits from knowledge distillation in constructing an efficient, non-task-dependent training framework. The experiments compare the training process of an SSL algorithm trained from scratch with that of one boosted by knowledge distillation in a teacher-student paradigm, using the video-based human action recognition dataset UCF101. Results show that knowledge distillation accelerates the convergence of a network and removes the reliance on model architectures.
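The abstract does not say which distillation objective the authors use, so the following is only a minimal sketch of one common choice: a soft-target loss in which the student matches the teacher's temperature-softened output distribution. The function names, the temperature value, and the KL-divergence formulation are all assumptions made for illustration, not the paper's method.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: a generic soft-target loss, assumed here for
    illustration. The T^2 factor is a common convention that keeps the
    gradient scale comparable across temperatures.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must score the same classes")
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Made-up logits over three action classes for a single video clip.
    teacher = [4.0, 1.0, 0.5]
    student = [2.5, 1.5, 0.2]
    print(distillation_loss(student, teacher))
```

Because a loss of this kind compares only output distributions, the teacher and student need to agree on the set of classes but not on architecture or parameters, which is consistent with the abstract's claim that distillation removes the reliance on model architectures.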
