Towards audio-video based handwritten mathematical content recognition in classroom videos

Proceedings of 2011 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing. Pub Date: 2011-10-03. DOI: 10.1109/PACRIM.2011.6032992

Smita Vemulapalli, M. Hayes


Citation count: 1

Abstract

Recognizing handwritten mathematical content in classroom videos poses a range of interesting challenges. In this paper, we focus on improving character recognition accuracy in such videos by combining video-based and audio-based text recognizers. We propose a two-step assembly consisting of a video text recognizer (VTR) as the primary character recognizer and an audio text recognizer (ATR) that disambiguates the output of the VTR when needed. We propose techniques for (1) detecting ambiguity in the output of the VTR, so that combination with the ATR is triggered only for ambiguous characters, (2) synchronizing the outputs of the two recognizers to enable combination, and (3) combining the options generated by the two recognizers using measurement-based and rank-based methods. We have implemented the system using an open-source character recognizer and a commercially available phonetic word-spotter. Through experiments on video recorded in a classroom-like environment, we demonstrate the improvement in character recognition accuracy that our approach can achieve.
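The abstract does not give the concrete ambiguity test or fusion rules, so the following is only a minimal illustrative sketch of the three-part idea, not the authors' implementation. It assumes that each recognizer returns a score per candidate character, that the two outputs are already synchronized to the same handwritten character, that ambiguity is flagged by a small gap between the top two VTR scores, that measurement-based combination is a weighted score sum, and that rank-based combination is a Borda count. All function names, the `margin` threshold, and the weight `w_vtr` are hypothetical.

```python
from typing import Dict

# Hypothetical representation: candidate character -> score in [0, 1].
Scores = Dict[str, float]


def is_ambiguous(vtr: Scores, margin: float = 0.2) -> bool:
    """Assumed test: the VTR is ambiguous when its top two scores are close."""
    ranked = sorted(vtr.values(), reverse=True)
    return len(ranked) > 1 and (ranked[0] - ranked[1]) < margin


def combine_measurement(vtr: Scores, atr: Scores, w_vtr: float = 0.5) -> str:
    """Measurement-level fusion (assumed): weighted sum of the raw scores."""
    candidates = sorted(set(vtr) | set(atr))  # sorted for deterministic ties
    return max(
        candidates,
        key=lambda c: w_vtr * vtr.get(c, 0.0) + (1.0 - w_vtr) * atr.get(c, 0.0),
    )


def combine_rank(vtr: Scores, atr: Scores) -> str:
    """Rank-level fusion (assumed): Borda count over both candidate lists."""
    candidates = sorted(set(vtr) | set(atr))
    n = len(candidates)

    def borda(scores: Scores) -> Dict[str, int]:
        # Best candidate gets n points, next n - 1, and so on;
        # candidates a recognizer did not propose get 0 points.
        order = sorted(scores, key=lambda c: scores[c], reverse=True)
        return {c: n - i for i, c in enumerate(order)}

    b_vtr, b_atr = borda(vtr), borda(atr)
    return max(candidates, key=lambda c: b_vtr.get(c, 0) + b_atr.get(c, 0))


def recognize(vtr: Scores, atr: Scores, method: str = "rank") -> str:
    """Use the VTR alone unless its output is ambiguous, then consult the ATR."""
    if not is_ambiguous(vtr):
        return max(sorted(vtr), key=lambda c: vtr[c])
    if method == "rank":
        return combine_rank(vtr, atr)
    return combine_measurement(vtr, atr)


# Made-up scores: the VTR cannot separate "2" from "z", the ATR heard "z".
vtr_scores = {"2": 0.46, "z": 0.44, "7": 0.10}
atr_scores = {"z": 0.70, "c": 0.20, "2": 0.10}
print(recognize(vtr_scores, atr_scores, method="rank"))
print(recognize(vtr_scores, atr_scores, method="measurement"))
```

In this sketch the ATR is consulted only when the VTR's top two candidates are within `margin` of each other, which mirrors the abstract's point that combination is triggered only for ambiguous characters. The synchronization step is not modeled here.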

