Text Region Detection and Recognition in Natural Scene Images Using MSER and Convolutional Neural Network

CompSciRN: Audio | Pub Date: 2020-11-21 | DOI: 10.2139/ssrn.3734809

A. V, S. M, T. V

Citations: 0

Abstract

Text detection and recognition in natural scene images is a computer vision problem that has remained a challenge for computer engineers for a long time. Recent advances in deep learning have transformed computer vision. This paper attempts to build a deep learning (DL) based text detection and recognition model for interpreting the text in natural scene images. The proposed model consists of three stages: candidate text region detection, text region extraction, and text recognition. The natural scene image is first fed to the candidate text region detection stage, which uses the Maximally Stable Extremal Regions (MSER) algorithm to extract regions that potentially contain text characters. The non-text regions introduced by this first stage are filtered out in the second stage. The text regions that remain after the second stage are then recognized in the final stage. The model uses two convolutional neural networks, one in the text region extraction stage and the other in the text recognition stage. Text detection in natural scenes is not as easy a problem as it appears. The complexity of detecting and recognizing text characters in natural scene images is mainly due to the diversity of the characters and of the scenes, the presence of various disturbances, varying illumination conditions, and differences in the color, size, and area of the text. The ICDAR-2011, ICDAR-2013, CHARS-74K, and CIFAR-100 datasets are used for training and validating the models.
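The three-stage flow described in the abstract can be sketched as a small pipeline. This is only an illustration of the data flow, not the authors' implementation: the names `read_text`, `crop`, and `Recognized`, and the three callables passed in, are hypothetical, and sorting results left to right is an assumption about reading order. In practice, stage 1 could be backed by an MSER detector such as the one OpenCV exposes through `cv2.MSER_create()`, and stages 2 and 3 by trained CNN classifiers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]   # grayscale image as rows of pixel values
Patch = List[List[int]]           # cropped sub-image
Box = Tuple[int, int, int, int]   # (x, y, width, height)


@dataclass
class Recognized:
    """One recognized character and the region it came from (hypothetical type)."""
    box: Box
    char: str


def crop(image: Image, box: Box) -> Patch:
    """Cut the patch covered by `box` out of `image`."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


def read_text(
    image: Image,
    detect_candidates: Callable[[Image], List[Box]],  # stage 1: e.g. an MSER detector
    is_text: Callable[[Patch], bool],                 # stage 2: text / non-text classifier
    recognize: Callable[[Patch], str],                # stage 3: character classifier
) -> List[Recognized]:
    """Run the three stages in order and return the recognized characters."""
    results = []
    for box in detect_candidates(image):      # stage 1: candidate text regions
        patch = crop(image, box)
        if not is_text(patch):                # stage 2: drop non-text candidates
            continue
        results.append(Recognized(box, recognize(patch)))  # stage 3: recognition
    # Assumption: order the output left to right by the x coordinate of each box.
    results.sort(key=lambda r: r.box[0])
    return results
```

Passing the stages in as callables keeps the control flow independent of any particular detector or network, so each stage can be replaced or tested on its own.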
