Grammar-assisted audio-video equation recognition
Smita Vemulapalli, M. Hayes
2013 18th International Conference on Digital Signal Processing (DSP), 2013-07-01
DOI: 10.1109/ICDSP.2013.6622671 (https://doi.org/10.1109/ICDSP.2013.6622671)
Citation count: 1
Abstract
We consider the problem of recognizing handwritten mathematical content from classroom videos. Because the handwritten text and the accompanying audio refer to the same mathematical characters and symbols, combining video-based and audio-based recognizers has the potential to significantly increase recognition accuracy over that of either recognizer alone. We propose a novel multi-step technique for combining the outputs of the two recognizers. Initial recognition results from a video-based recognizer and a speech recognizer, operating independently on the handwritten and the spoken content of a classroom video, are combined with a base mathematical speech grammar to arrive at a constrained speech grammar that is specific to the content being recognized. The speech recognizer then uses the constrained grammar to generate the final character recognition results. A subsequent layout analysis step, which makes use of audio cues and an X-Y cut-based method, produces the final recognized content. Experiments on videos recorded in a classroom-like environment demonstrate the significant improvement in recognition accuracy that our technique can achieve.
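
The layout analysis step refers to an X-Y cut-based method. As a rough illustration only, the sketch below shows the generic recursive X-Y cut idea applied to symbol bounding boxes: look for a blank gap along the x axis, then along the y axis, split there, and recurse. It is not the authors' implementation and it ignores the audio cues the abstract mentions. The names `find_cut` and `xy_cut`, the `(label, x0, y0, x1, y1)` box format, the `min_gap` threshold, and the `"row"`/`"stack"`/`"cluster"` node labels are all assumptions made for this example.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical symbol format: (label, x0, y0, x1, y1), with y growing downward.
Symbol = Tuple[str, float, float, float, float]
Tree = Union[str, tuple]


def find_cut(symbols: List[Symbol], axis: int, min_gap: float) -> Optional[float]:
    """Return the midpoint of the widest blank gap along an axis, or None.

    axis 0 scans along x (a vertical cut line), axis 1 scans along y.
    """
    lo, hi = 1 + axis, 3 + axis  # tuple indices of the interval start and end
    spans = sorted((s[lo], s[hi]) for s in symbols)
    best_gap, cut = 0.0, None
    reach = spans[0][1]  # furthest extent covered by the spans seen so far
    for start, end in spans[1:]:
        gap = start - reach
        if gap >= min_gap and gap > best_gap:
            best_gap, cut = gap, (reach + start) / 2
        reach = max(reach, end)
    return cut


def xy_cut(symbols: List[Symbol], min_gap: float = 1.0) -> Tree:
    """Recursively split symbols into a nested layout tree.

    Leaves are labels; ("row", a, b) means a is left of b;
    ("stack", a, b) means a is above b; ("cluster", ...) groups
    symbols that no blank gap separates.
    """
    if not symbols:
        raise ValueError("xy_cut needs at least one symbol")
    if len(symbols) == 1:
        return symbols[0][0]
    for axis, kind in ((0, "row"), (1, "stack")):
        cut = find_cut(symbols, axis, min_gap)
        if cut is not None:
            before = [s for s in symbols if s[3 + axis] <= cut]
            after = [s for s in symbols if s[1 + axis] >= cut]
            return (kind, xy_cut(before, min_gap), xy_cut(after, min_gap))
    return ("cluster",) + tuple(sorted(s[0] for s in symbols))


# Made-up boxes for a fraction a/b followed by "+ c".
symbols = [
    ("a", 0, 0, 10, 10),
    ("-", 0, 12, 10, 14),  # fraction bar
    ("b", 0, 16, 10, 26),
    ("+", 15, 8, 25, 18),
    ("c", 30, 5, 40, 20),
]
layout = xy_cut(symbols)
```

For these made-up boxes the first cut falls between the fraction and the `+`, so the fraction's three symbols end up in a nested `"stack"` node while `+` and `c` form a `"row"`. A real system would still need to turn such a tree into a mathematical expression, and symbols such as square roots, whose boxes overlap their contents, would land in a `"cluster"` node and need separate handling.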