Optical Character Recognition for Audio-Visual Broadcast Transcription System

2020 11th IEEE International Conference on Cognitive Infocommunications (CogInfoCom). Pub Date: 2020-09-23. DOI: 10.1109/CogInfoCom50765.2020.9237867

J. Chaloupka, K. Paleček, P. Cerva, J. Zdánský

Citations: 2

Abstract

This paper investigates the use of optical character recognition (OCR) in a system for audio-visual broadcast transcription. Characters were recognized from video frames by the open-source OCR program Tesseract. From version 4 onward, the OCR in this program is based on recurrent neural networks (RNNs) and post-processes the text with a bigram language model. However, the recognized text still contains a number of errors: in some images the text is detected but recognized incorrectly, and in others it is not detected at all. We have designed and tested image pre-processing and text post-processing methods to reduce OCR errors. The word error rate (WER) was reduced from 29.4% to 15.4%.
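The abstract reports its results as word error rate but does not say how the metric was computed. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words) and whitespace tokenization, WER could be calculated as follows. The function names `word_error_rate` and `relative_reduction` are illustrative and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are separated by whitespace and compared exactly
    (case-sensitive, no normalization). The paper may have done otherwise.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its old value."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical OCR output with two misrecognized words out of four.
    print(word_error_rate("breaking news from prague", "breaking new from prag"))
    # Relative improvement implied by the figures quoted in the abstract.
    print(round(relative_reduction(29.4, 15.4), 3))
```

Under this definition, the reported drop from 29.4% to 15.4% is an absolute reduction of 14 percentage points, or a relative reduction of roughly 48%.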
