A comprehensive neural-based approach for text recognition in videos using natural language processing

Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992019

Khaoula Elagouni, Christophe Garcia, P. Sébillot


Citations: 20

Abstract


A comprehensive neural-based approach for text recognition in videos using natural language processing

This work aims to support multimedia content understanding by exploiting textual clues embedded in digital videos. To this end, we developed a complete video Optical Character Recognition (OCR) system, specifically adapted to detect and recognize embedded text in videos. Based on a neural approach, this new method outperforms related work, especially in terms of robustness to style and size variability, background complexity, and low image resolution. A language model that drives several steps of the video OCR is also introduced to remove ambiguities caused by local letter-by-letter recognition and to reduce segmentation errors. The approach was evaluated on a database of French TV news videos and achieves a character recognition rate of 95%, corresponding to 78% of words correctly recognized, which enables its incorporation into an automatic video indexing and retrieval system.
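The abstract does not detail how the language model interacts with the character recognizer, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes a hypothetical recognizer that outputs a probability per candidate character at each position and a hypothetical character bigram model, and combines them with Viterbi decoding. The names `viterbi_decode`, `greedy_decode`, `lm_weight`, and the toy probabilities are all illustrative assumptions.

```python
import math

FLOOR = 1e-6  # assumed smoothing value for unseen characters and bigrams


def greedy_decode(char_probs):
    """Pick the most probable character at each position independently."""
    return "".join(max(p, key=p.get) for p in char_probs)


def viterbi_decode(char_probs, bigram_logp, alphabet, lm_weight=1.0):
    """Find the character sequence maximizing recognizer + bigram LM scores.

    char_probs: list of dicts, one per position, mapping char -> probability.
    bigram_logp: dict mapping (prev_char, cur_char) -> log probability.
    """
    floor_logp = math.log(FLOOR)
    # best[c] holds (log score, path) of the best sequence ending in c.
    best = {c: (math.log(char_probs[0].get(c, FLOOR)), c) for c in alphabet}
    for probs in char_probs[1:]:
        new_best = {}
        for cur in alphabet:
            emit = math.log(probs.get(cur, FLOOR))
            # Choose the predecessor with the best score plus transition.
            score, path = max(
                (best[prev][0]
                 + lm_weight * bigram_logp.get((prev, cur), floor_logp),
                 best[prev][1])
                for prev in alphabet
            )
            new_best[cur] = (score + emit, path + cur)
        best = new_best
    return max(best.values())[1]


# Toy example: the recognizer slightly prefers the digit "1" over the
# letter "l" in the first position of the French word "la".
alphabet = ["a", "l", "1"]
char_probs = [
    {"l": 0.45, "1": 0.55},
    {"a": 0.90, "l": 0.05, "1": 0.05},
]
bigram_logp = {
    ("l", "a"): math.log(0.5),   # "la" is frequent in the toy model
    ("1", "a"): math.log(0.01),  # a digit followed by "a" is rare
}

greedy_result = greedy_decode(char_probs)
viterbi_result = viterbi_decode(char_probs, bigram_logp, alphabet)
```

In this toy case, position-by-position decoding keeps the visually plausible but linguistically unlikely digit, while the sequence-level decoding recovers the letter. This illustrates the kind of ambiguity a language model can remove from local letter-by-letter recognition.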

