Optical Character Recognition for Audio-Visual Broadcast Transcription System

2020 11th IEEE International Conference on Cognitive Infocommunications (CogInfoCom). Pub Date: 2020-09-23. DOI: 10.1109/CogInfoCom50765.2020.9237867

J. Chaloupka, K. Paleček, P. Cerva, J. Zdánský


Citations: 2

Abstract

This paper investigates the use of optical character recognition (OCR) in an audio-visual broadcast transcription system. Characters were recognized from video frames with the open-source OCR program Tesseract. From version 4 onward, the OCR in this program is based on recurrent neural networks (RNNs) and post-processes the text with a bigram language model. The recognized text nevertheless contains a number of errors: in some images the text is detected but not recognized correctly, and in others it is not detected at all. We designed and tested image pre-processing and text post-processing methods to reduce these OCR errors. The word error rate (WER) was reduced from 29.4% to 15.4%.
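
The abstract reports its results as word error rate but does not define the metric. The sketch below uses the conventional definition, which is an assumption here rather than something taken from the paper: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the OCR output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative choices, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (case and punctuation are not normalized).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word and one missing word against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, going from a WER of 29.4% to 15.4% is an absolute reduction of 14.0 percentage points, or a relative reduction of roughly 48%.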
