Content Extraction from Lecture Video via Speaker Action Classification Based on Pose Information

Fei Xu, Kenny Davila, S. Setlur, V. Govindaraju
{"title":"基于姿态信息的演讲者动作分类的演讲视频内容提取","authors":"Fei Xu, Kenny Davila, S. Setlur, V. Govindaraju","doi":"10.1109/ICDAR.2019.00171","DOIUrl":null,"url":null,"abstract":"Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using the speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hands skeleton data are extracted and used to compute motion-based features describing each action segment. Then, the dominant speaker action of each of these segments is classified using Random forests and the motion-based features. With the temporal and spatial range of these actions, we implement an alternative way to draw key-frames of handwritten content from the video. In addition, for our fixed camera videos, we also use the skeleton data to compute a mask of the speaker writing locations for the subtraction of the background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision results, with a very good compression ratio which is better than previous methods based on content analysis.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Content Extraction from Lecture Video via Speaker Action Classification Based on Pose Information\",\"authors\":\"Fei Xu, Kenny Davila, S. Setlur, V. Govindaraju\",\"doi\":\"10.1109/ICDAR.2019.00171\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using the speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hands skeleton data are extracted and used to compute motion-based features describing each action segment. Then, the dominant speaker action of each of these segments is classified using Random forests and the motion-based features. With the temporal and spatial range of these actions, we implement an alternative way to draw key-frames of handwritten content from the video. 
In addition, for our fixed camera videos, we also use the skeleton data to compute a mask of the speaker writing locations for the subtraction of the background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision results, with a very good compression ratio which is better than previous methods based on content analysis.\",\"PeriodicalId\":325437,\"journal\":{\"name\":\"2019 International Conference on Document Analysis and Recognition (ICDAR)\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on Document Analysis and Recognition (ICDAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2019.00171\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2019.00171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using the speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hands skeleton data are extracted and used to compute motion-based features describing each action segment. Then, the dominant speaker action of each of these segments is classified using Random forests and the motion-based features. With the temporal and spatial range of these actions, we implement an alternative way to draw key-frames of handwritten content from the video. In addition, for our fixed camera videos, we also use the skeleton data to compute a mask of the speaker writing locations for the subtraction of the background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision results, with a very good compression ratio which is better than previous methods based on content analysis.
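The following is a minimal sketch of the segment-level classification step described above: per-frame skeleton keypoints from a pose estimator are summarized into motion statistics for each action segment and fed to a Random Forest. The specific features, joint count, labels, and data here are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch only: segment-level motion features + Random Forest action classification.
# Feature definitions and synthetic data are assumptions for demonstration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def motion_features(keypoints: np.ndarray) -> np.ndarray:
    """keypoints: (num_frames, num_joints, 2) array of (x, y) joint positions
    produced by a pose estimator for one action segment."""
    deltas = np.diff(keypoints, axis=0)        # frame-to-frame joint displacement
    speeds = np.linalg.norm(deltas, axis=-1)   # per-joint speed at each step
    # Segment-level statistics of joint motion, concatenated into one vector.
    return np.concatenate([speeds.mean(axis=0), speeds.std(axis=0), speeds.max(axis=0)])

rng = np.random.default_rng(0)
NUM_JOINTS = 15  # hypothetical skeleton size

# Synthetic stand-in data: 40 segments of 30 frames each, with toy action labels.
segments = rng.normal(size=(40, 30, NUM_JOINTS, 2)).cumsum(axis=1)
labels = rng.choice(["write", "erase", "other"], size=40)

X = np.stack([motion_features(s) for s in segments])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)

# Classify the dominant action of a new segment of the same shape.
new_segment = rng.normal(size=(30, NUM_JOINTS, 2)).cumsum(axis=0)
print(clf.predict(motion_features(new_segment).reshape(1, -1)))
```

In the paper's pipeline, the predicted writing and erasing segments (with their temporal and spatial extents) would then drive key-frame selection and the speaker-location mask used to suppress background noise in the binarized key-frames.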