Comparing Visual, Textual, and Multimodal Features for Detecting Sign Language in Video Sharing Sites

C. D. D. Monteiro, F. Shipman, R. Gutierrez-Osuna
DOI: 10.1109/MIPR.2018.00010
Published in: 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2018-04-10
Citations: 6

Abstract

Easy recording and sharing of video content has led to the creation and distribution of increasing quantities of sign language (SL) content. Current capabilities make locating SL videos on a desired topic dependent on the existence and correctness of metadata indicating both the language and the topic of the video. Automated techniques to detect sign language content can help address this problem. This paper compares metadata-based classifiers and multimodal classifiers, using both early and late fusion techniques, with video content-based classifiers from the literature. Comparisons of TF-IDF, LDA, and NMF for generating metadata features indicate that NMF performs best, whether used independently or combined with video features. Multimodal classifiers outperform unimodal SL video classifiers. Experiments show that multimodal features achieve up to 86% precision, 81% recall, and 84% F1 score. This represents an improvement in F1 score of roughly 9% over the video-based approach presented in the literature and of 6% over text-based features extracted using NMF.
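The abstract names the building blocks of the approach: topic-like text features derived from video metadata via TF-IDF weighting followed by NMF, combined with video content features by early fusion (feature concatenation) or late fusion (combining per-modality classifier outputs). A minimal sketch of these ideas, assuming scikit-learn and entirely synthetic toy data (the titles, feature dimensions, and logistic-regression classifiers are illustrative assumptions, not the paper's exact pipeline):

```python
# Hypothetical sketch: TF-IDF + NMF text features from video metadata,
# plus early fusion (concatenation) and late fusion (probability
# averaging) with stand-in video features. Toy data for illustration.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy metadata (e.g., video titles); 1 = sign language video.
titles = [
    "ASL story time deaf community vlog",
    "sign language tutorial fingerspelling basics",
    "cooking pasta recipe quick dinner",
    "travel vlog city walking tour",
]
labels = np.array([1, 1, 0, 0])

# TF-IDF weighting of the metadata text, then NMF for topic-like features.
# TF-IDF output is non-negative, as NMF requires.
tfidf = TfidfVectorizer().fit_transform(titles)
text_feats = NMF(n_components=2, init="nndsvda", random_state=0).fit_transform(tfidf)

# Stand-in for video content features (e.g., motion statistics).
rng = np.random.default_rng(0)
video_feats = rng.normal(size=(4, 3))

# Early fusion: concatenate modalities, train a single classifier.
early = np.hstack([text_feats, video_feats])
clf_early = LogisticRegression().fit(early, labels)

# Late fusion: one classifier per modality, then average their
# predicted class probabilities.
clf_text = LogisticRegression().fit(text_feats, labels)
clf_video = LogisticRegression().fit(video_feats, labels)
late_probs = (clf_text.predict_proba(text_feats)
              + clf_video.predict_proba(video_feats)) / 2
```

Early fusion lets the classifier learn cross-modal interactions, while late fusion keeps the modalities independent until the decision stage; the paper evaluates both against the unimodal baselines.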