SkeletonCLIP: Recognizing Skeleton-based Human Actions with Text Prompts

Lin Yuan, Zhen He, Qianqian Wang, Leiyang Xu, Xiang Ma
{"title":"用文本提示识别基于骨骼的人类动作","authors":"Lin Yuan, Zhen He, Qianqian Wang, Leiyang Xu, Xiang Ma","doi":"10.1109/ICSAI57119.2022.10005459","DOIUrl":null,"url":null,"abstract":"Human action recognition has been a hot research for decades, and mainstream supervised frameworks include a feature extraction backbone and a softmax classifier to predict daily human actions. When the number of classes applied to the dataset changes, we must retrain the classifier on the well-trained backbone. This pipeline restricts the generalization and transfer ability of the model due to an extra training period. Moreover, replacing action labels with simple number labels discards useful semantic information and can only receive a meaningless classifier at last. In this work, we present a model SkeletonCLIP for skeleton-based human action recognition. We add an alternative text encoder to extract semantic information from labels while keeping the original sequence encoder. We use dot production to measure the similarities of sequence-text pairs in place of traditional classifier head and cross-entropy loss. Experiments from three human action datasets show that our framework can reach a higher recognition accuracy with the help of semantic information when training the network from scratch. The code has been shown at eunseo-v/SkeletonCLIP.","PeriodicalId":339547,"journal":{"name":"2022 8th International Conference on Systems and Informatics (ICSAI)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SkeletonCLIP: Recognizing Skeleton-based Human Actions with Text Prompts\",\"authors\":\"Lin Yuan, Zhen He, Qianqian Wang, Leiyang Xu, Xiang Ma\",\"doi\":\"10.1109/ICSAI57119.2022.10005459\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Human action recognition has been a hot research for decades, and mainstream supervised frameworks include a feature extraction backbone and a softmax classifier to predict daily human actions. When the number of classes applied to the dataset changes, we must retrain the classifier on the well-trained backbone. This pipeline restricts the generalization and transfer ability of the model due to an extra training period. Moreover, replacing action labels with simple number labels discards useful semantic information and can only receive a meaningless classifier at last. In this work, we present a model SkeletonCLIP for skeleton-based human action recognition. We add an alternative text encoder to extract semantic information from labels while keeping the original sequence encoder. We use dot production to measure the similarities of sequence-text pairs in place of traditional classifier head and cross-entropy loss. Experiments from three human action datasets show that our framework can reach a higher recognition accuracy with the help of semantic information when training the network from scratch. 
The code has been shown at eunseo-v/SkeletonCLIP.\",\"PeriodicalId\":339547,\"journal\":{\"name\":\"2022 8th International Conference on Systems and Informatics (ICSAI)\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 8th International Conference on Systems and Informatics (ICSAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSAI57119.2022.10005459\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 8th International Conference on Systems and Informatics (ICSAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSAI57119.2022.10005459","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Human action recognition has been an active research topic for decades. Mainstream supervised frameworks pair a feature-extraction backbone with a softmax classifier to predict everyday human actions. Whenever the number of classes in a dataset changes, the classifier must be retrained on top of the already-trained backbone; this extra training stage limits the model's generalization and transfer ability. Moreover, replacing action labels with bare numeric labels discards useful semantic information, so the resulting classifier carries no semantic meaning. In this work, we present SkeletonCLIP, a model for skeleton-based human action recognition. We keep the original sequence encoder and add a text encoder that extracts semantic information from the action labels. In place of the traditional classifier head and cross-entropy loss, we use the dot product to measure the similarity of sequence-text pairs. Experiments on three human action datasets show that, when the network is trained from scratch, our framework reaches higher recognition accuracy with the help of this semantic information. The code is available at eunseo-v/SkeletonCLIP.
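
To make the matching step concrete, here is a minimal sketch of classification by dot-product similarity between sequence and text embeddings. It is not the authors' implementation (see eunseo-v/SkeletonCLIP for that): the embedding dimension, the L2 normalization, and the random placeholder tensors standing in for the two encoders' outputs are all illustrative assumptions.

import torch
import torch.nn.functional as F

def dot_product_classify(seq_emb: torch.Tensor,
                         label_emb: torch.Tensor) -> torch.Tensor:
    """Score each skeleton-sequence embedding against every label-text
    embedding and return the index of the best-matching action label.

    seq_emb:   (B, D) sequence-encoder outputs for a batch of skeletons
    label_emb: (C, D) one text-encoder output per class-label prompt
    """
    # Assumption: L2-normalizing both sides (as in CLIP) turns the dot
    # product into a cosine similarity, keeping scores comparable across labels.
    seq_emb = F.normalize(seq_emb, dim=-1)
    label_emb = F.normalize(label_emb, dim=-1)

    similarity = seq_emb @ label_emb.t()  # (B, C) sequence-text similarities
    return similarity.argmax(dim=-1)      # best-matching label per sequence

# Toy usage: random tensors stand in for the two encoders' outputs.
batch_size, num_classes, embed_dim = 4, 60, 256
seq_emb = torch.randn(batch_size, embed_dim)
label_emb = torch.randn(num_classes, embed_dim)
print(dot_product_classify(seq_emb, label_emb))  # e.g. tensor([17, 3, 42, 9])

Because classification reduces to comparing a sequence against label-text embeddings, adding or removing classes only changes the set of text prompts rather than requiring a retrained classifier head, which is the generalization benefit the abstract describes.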