Text Detection and Recognition in Real World Images

2012 International Conference on Frontiers in Handwriting Recognition. Pub Date: 2012-09-18. DOI: 10.1109/ICFHR.2012.279

Raid Saabni, M. Zwilling


Citations: 1

Abstract

Detecting and recognizing text in real-world images such as signboards and advertisements is an important part of computer vision applications. The problem is complicated by many factors, including nonuniform backgrounds, different languages and fonts, and inconsistent text alignment and orientation. In this paper, we present a novel approach to detecting characters and words in real-world images. The approach decomposes the gray-level image into a sequence of images, each containing the pixels whose gray-level values fall in one of several disjoint ranges. This decomposition enables the extraction of connected components, representing characters or other non-textual objects, separated from their neighboring background. An interpolation of two classes of features, represented as histograms, is used by a support vector machine (SVM) to classify and collect the textual objects that form the textual zones. The Shape Context descriptor [1] is used with the Earth Mover's Distance (EMD) to recognize the characters within the image. The recognized characters are fed to a heuristic rule-based system that determines words and gives the final results. To optimize the speed of the system, we follow the embedding of the EMD metric into a normed space presented in [22], which enables fast approximation of the k-nearest neighbors using Locality-Sensitive Hashing (LSH) functions. Experiments show that our algorithm can detect and recognize text regions from the ICDAR 2005 datasets [17] at high rates.
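The gray-level decomposition step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how many ranges are used or how their boundaries are chosen, so the equal-width ranges, the 4-connectivity, and the names `decompose_by_gray_range` and `connected_components` are all assumptions made for illustration.

```python
from collections import deque


def decompose_by_gray_range(image, num_ranges=4, max_level=255):
    """Split a gray-level image (list of rows) into binary masks.

    Assumption: the ranges are equal-width and disjoint; mask k marks the
    pixels whose value falls in the k-th range.
    """
    width = (max_level + 1) / num_ranges
    masks = [[[False] * len(row) for row in image] for _ in range(num_ranges)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            k = min(int(value // width), num_ranges - 1)
            masks[k][r][c] = True
    return masks


def connected_components(mask):
    """Return the 4-connected components of a binary mask.

    Each component is a list of (row, col) pixel coordinates.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


# Toy image: two dark vertical strokes on a bright background.
image = [
    [200, 200, 200, 200, 200],
    [200,  20, 200,  30, 200],
    [200,  20, 200,  30, 200],
    [200, 200, 200, 200, 200],
]
masks = decompose_by_gray_range(image, num_ranges=4)
counts = [len(connected_components(m)) for m in masks]
print(counts)  # number of components found in each gray-level range
```

In this toy image the two dark strokes fall into the lowest range and come out as separate components, isolated from the bright background. In the approach described above, such components would then be passed to the SVM classifier to decide which are textual.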
