Boosting Self-supervised Video-based Human Action Recognition Through Knowledge Distillation

Fernando Camarena, M. González-Mendoza, Leonardo Chang, N. Hernández-Gress
{"title":"通过知识升华提高基于自监督视频的人类行为识别","authors":"Fernando Camarena, M. González-Mendoza, Leonardo Chang, N. Hernández-Gress","doi":"10.52591/lxai202211286","DOIUrl":null,"url":null,"abstract":"Deep learning architectures lead the state-of-the-art in several computer vision, natural language processing, and reinforcement learning tasks due to their ability to extract multi-level representations without human engineering. The model’s performance is affected by the amount of labeled data used in training. Hence, novel approaches like self-supervised learning (SSL) extract the supervisory signal using unlabeled data. Although SSL reduces the dependency on human annotations, there are still two main drawbacks. First, high-computational resources are required to train a large-scale model from scratch. Second, knowledge from an SSL model is commonly finetuning to a target model, which forces them to share the same parameters and architecture and make it task-dependent. This paper explores how SSL benefits from knowledge distillation in constructing an efficient and non-task-dependent training framework. The experimental design compared the training process of an SSL algorithm trained from scratch and boosted by knowledge distillation in a teacher-student paradigm using the video-based human action recognition dataset UCF101. Results show that knowledge distillation accelerates the convergence of a network and removes the reliance on model architectures.","PeriodicalId":266286,"journal":{"name":"LatinX in AI at Neural Information Processing Systems Conference 2022","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Boosting Self-supervised Video-based Human Action Recognition Through Knowledge Distillation\",\"authors\":\"Fernando Camarena, M. González-Mendoza, Leonardo Chang, N. Hernández-Gress\",\"doi\":\"10.52591/lxai202211286\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning architectures lead the state-of-the-art in several computer vision, natural language processing, and reinforcement learning tasks due to their ability to extract multi-level representations without human engineering. The model’s performance is affected by the amount of labeled data used in training. Hence, novel approaches like self-supervised learning (SSL) extract the supervisory signal using unlabeled data. Although SSL reduces the dependency on human annotations, there are still two main drawbacks. First, high-computational resources are required to train a large-scale model from scratch. Second, knowledge from an SSL model is commonly finetuning to a target model, which forces them to share the same parameters and architecture and make it task-dependent. This paper explores how SSL benefits from knowledge distillation in constructing an efficient and non-task-dependent training framework. The experimental design compared the training process of an SSL algorithm trained from scratch and boosted by knowledge distillation in a teacher-student paradigm using the video-based human action recognition dataset UCF101. 
Results show that knowledge distillation accelerates the convergence of a network and removes the reliance on model architectures.\",\"PeriodicalId\":266286,\"journal\":{\"name\":\"LatinX in AI at Neural Information Processing Systems Conference 2022\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"LatinX in AI at Neural Information Processing Systems Conference 2022\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.52591/lxai202211286\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"LatinX in AI at Neural Information Processing Systems Conference 2022","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.52591/lxai202211286","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Deep learning architectures lead the state of the art in several computer vision, natural language processing, and reinforcement learning tasks thanks to their ability to extract multi-level representations without human feature engineering. A model's performance, however, depends on the amount of labeled data available for training. Novel approaches such as self-supervised learning (SSL) therefore extract the supervisory signal from unlabeled data. Although SSL reduces the dependency on human annotations, two main drawbacks remain. First, training a large-scale model from scratch demands substantial computational resources. Second, knowledge from an SSL model is commonly transferred to a target model by fine-tuning, which forces both models to share the same parameters and architecture and makes the result task-dependent. This paper explores how SSL benefits from knowledge distillation in constructing an efficient, non-task-dependent training framework. The experimental design compares the training process of an SSL algorithm trained from scratch against one boosted by knowledge distillation in a teacher-student paradigm, using the video-based human action recognition dataset UCF101. Results show that knowledge distillation accelerates the convergence of the network and removes the reliance on a particular model architecture.
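The teacher-student distillation the abstract refers to is typically implemented with a soft-target loss that pulls the student's predictions toward the teacher's temperature-softened output distribution. The PyTorch sketch below illustrates that general mechanism only; the temperature value, batch size, and use of classification logits are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 4.0) -> torch.Tensor:
    """Soft-target distillation loss (Hinton et al., 2015): the student
    matches the teacher's temperature-softened class distribution.
    The temperature of 4.0 is an illustrative choice, not the paper's."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # "batchmean" is the correct reduction for KL divergence; the T**2
    # factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Illustrative usage: the teacher would be a frozen SSL-pretrained video
# model; the student may use a different (e.g. smaller) architecture.
teacher_logits = torch.randn(8, 101)   # UCF101 has 101 action classes
student_logits = torch.randn(8, 101, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

Because the loss operates only on output distributions, teacher and student are free to differ in architecture and parameter count, which is consistent with the abstract's claim that distillation removes the reliance on a shared model architecture.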