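A method for dance motion recognition and scoring using two-layer classifier based on conditional random field and stochastic error-correcting context-free grammar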

2014 IEEE 3rd Global Conference on Consumer Electronics (GCCE)
Pub Date: 2014-10-01
DOI: 10.1109/GCCE.2014.7031294

Y. Heryadi, M. I. Fanany, A. M. Arymurthy


Citations: 7

Abstract

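This paper presents a unified framework for recognizing and scoring dance motion with a two-layer classifier, so that the computational complexity is distributed across two layers. The first layer segments the input video into a sequence of motion primitive labels; sliding window, hidden Markov model (HMM), and conditional random field (CRF) classifiers are evaluated for this role. The second layer is a stochastic error-correcting context-free grammar, built from dance-master knowledge, which parses the label sequence, builds a parse tree, and computes the accumulated dance score. The dataset was captured with a single Kinect camera. The training data consist of 212 samples of 12 motion primitives and seven videos of Pendet dance performances. Under 5-fold cross-validation, the accuracies of the sliding window, HMM, and CRF are 0.63, 0.79, and 0.86, respectively. This result shows that the CRF outperforms the HMM proposed in [1] as a dance motion primitive recognizer. The CRF achieves an accuracy of 0.88 when the motion features are the angular coordinates of all skeleton joints, as proposed in [2], and 0.93 when only the upper-body joint coordinates are used. The stochastic error-correcting context-free grammar is adopted as the dance choreography model. An experiment with synthetic label sequences, a cost factor ci = 1, and label-sequence errors of up to 50 percent shows that the grammar tolerates input label-sequence errors of up to 25 percent. On the Pendet dance performances, the average dance score is 79.3. The low score is attributed to several factors: variation in dance skill, unstable repetition of basic gestures, high costs incurred when local deletion and substitution errors are handled by insertion operations instead, duration variation caused by the absence of timing guidelines for body-part motions, and a training dataset too small to capture the possible variations of basic gestures.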
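
The abstract does not say how the first-layer models produce their label sequence. For an HMM or a linear-chain CRF, the standard way to obtain the most likely frame labeling is Viterbi decoding, sketched below under stated assumptions: per-frame label scores are already available (log-probabilities for an HMM, weighted feature sums for a CRF), the primitive names "P1" and "P2" and all numbers are made up, and consecutive identical frame labels are simply merged into one primitive, which may differ from the paper's own segmentation.

```python
import math
from typing import Dict, List, Sequence


def viterbi(
    labels: Sequence[str],
    emission: List[Dict[str, float]],
    transition: Dict[str, Dict[str, float]],
    start: Dict[str, float],
) -> List[str]:
    """Return the highest-scoring label sequence under additive (log-domain) scores.

    emission[t][y]   : score of label y at frame t
    transition[a][b] : score of moving from label a to label b
    start[y]         : score of starting in label y
    """
    n = len(emission)
    if n == 0:
        return []
    # best[t][y]: best total score of any path that ends in label y at frame t
    best = [{y: start[y] + emission[0][y] for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda a: best[t - 1][a] + transition[a][y])
            best[t][y] = best[t - 1][prev] + transition[prev][y] + emission[t][y]
            back[t][y] = prev
    # Trace back from the best-scoring final label
    path = [max(labels, key=lambda y: best[n - 1][y])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def collapse(frame_labels: Sequence[str]) -> List[str]:
    """Merge runs of identical frame labels into one primitive label (assumed scheme)."""
    out: List[str] = []
    for y in frame_labels:
        if not out or out[-1] != y:
            out.append(y)
    return out


# Toy example with two hypothetical primitives "P1" and "P2"
LABELS = ["P1", "P2"]
START = {"P1": math.log(0.5), "P2": math.log(0.5)}
TRANSITION = {
    "P1": {"P1": math.log(0.8), "P2": math.log(0.2)},
    "P2": {"P1": math.log(0.2), "P2": math.log(0.8)},
}
# Made-up per-frame probability that the frame shows "P1"
P1_PROB = [0.9, 0.8, 0.4, 0.2, 0.1]
EMISSION = [{"P1": math.log(p), "P2": math.log(1.0 - p)} for p in P1_PROB]

frames = viterbi(LABELS, EMISSION, TRANSITION, START)
print(frames)            # ['P1', 'P1', 'P2', 'P2', 'P2']
print(collapse(frames))  # ['P1', 'P2']
```

The same decoder serves both models because only the origin of the scores differs; the collapsed sequence is what a second layer would then parse.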

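
The second layer scores a performance by the cost of the insertion, deletion, and substitution operations needed to reconcile the observed labels with the choreography grammar. A full stochastic error-correcting parser is too large for a short example, so the sketch below is a deliberate simplification: it replaces the grammar with a single hypothetical reference sequence and computes the weighted edit distance to it, with every operation cost defaulting to 1 as in the synthetic experiment. The 0 to 100 mapping in `hypothetical_score` is an assumption for illustration, not the paper's scoring formula.

```python
from typing import Sequence


def edit_cost(
    observed: Sequence[str],
    reference: Sequence[str],
    c_ins: float = 1.0,
    c_del: float = 1.0,
    c_sub: float = 1.0,
) -> float:
    """Minimum total cost of edits that turn `reference` into `observed`."""
    m, n = len(reference), len(observed)
    # d[i][j]: cost of aligning reference[:i] with observed[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * c_del
    for j in range(1, n + 1):
        d[0][j] = j * c_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if reference[i - 1] == observed[j - 1] else c_sub
            d[i][j] = min(
                d[i - 1][j] + c_del,    # reference label missing from observed
                d[i][j - 1] + c_ins,    # extra label in observed
                d[i - 1][j - 1] + sub,  # match or substitution
            )
    return d[m][n]


def hypothetical_score(observed: Sequence[str], reference: Sequence[str]) -> float:
    """Illustrative score: 100 * (1 - cost per reference label), floored at 0."""
    if not reference:
        return 0.0
    return 100.0 * max(0.0, 1.0 - edit_cost(observed, reference) / len(reference))


REFERENCE = ["A", "B", "B", "C", "D"]      # hypothetical expanded choreography
OBSERVED = ["A", "B", "C", "C", "D", "E"]  # one substitution, one insertion
print(edit_cost(OBSERVED, REFERENCE))           # 2.0
print(hypothetical_score(OBSERVED, REFERENCE))  # 60.0
```

Raising `c_ins` relative to the other costs is one way to see how the choice of repair operation changes the final score, which is the effect the abstract cites as one cause of low scores.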

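
The synthetic tolerance experiment requires label sequences with a controlled error rate. The abstract does not describe how the errors were injected; one plausible scheme, assumed here, applies round(error_rate * len(labels)) random insertions, deletions, or substitutions using a seeded generator so that runs are repeatable. The function `corrupt`, its parameters, and the example sequence are hypothetical.

```python
import random
from typing import List, Sequence


def corrupt(
    labels: Sequence[str],
    alphabet: Sequence[str],
    error_rate: float,
    seed: int = 0,
) -> List[str]:
    """Apply round(error_rate * len(labels)) random insert/delete/substitute edits."""
    rng = random.Random(seed)
    out = list(labels)
    n_edits = round(error_rate * len(labels))
    for _ in range(n_edits):
        # Only insertion is possible once the sequence is empty
        op = rng.choice(["ins", "del", "sub"]) if out else "ins"
        if op == "ins":
            out.insert(rng.randrange(len(out) + 1), rng.choice(alphabet))
        elif op == "del":
            del out[rng.randrange(len(out))]
        else:
            i = rng.randrange(len(out))
            # Substitute with a label different from the current one, if any exists
            choices = [a for a in alphabet if a != out[i]]
            if choices:
                out[i] = rng.choice(choices)
    return out


CLEAN = ["A", "B", "C", "D", "A", "B", "C", "D"]  # hypothetical clean sequence
ALPHABET = ["A", "B", "C", "D"]
for rate in (0.0, 0.25, 0.5):
    print(rate, corrupt(CLEAN, ALPHABET, rate, seed=1))
```

Feeding each corrupted sequence to the second layer and comparing its cost against the clean sequence's cost gives a tolerance curve of the kind the abstract summarizes.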

