Text Detection and Recognition in Real World Images

Raid Saabni, M. Zwilling
{"title":"Text Detection and Recognition in Real World Images","authors":"Raid Saabni, M. Zwilling","doi":"10.1109/ICFHR.2012.279","DOIUrl":null,"url":null,"abstract":"Detecting and recognizing texts in real world images such as sign boards and advertisements is an important part of computer vision applications. The complexity of the problem comes out of many factors such as nonuniform background, different languages and fonts, and non consistent text alignment and orientation. In this paper, we present a novel approach to detect characters and words in real-world images. The presented approach decompose the gray level image into sequence of images, each one includes pixels with gray level values from different disjoint ranges. This decomposition enables extracting connected components representing characters or other non textual objects separated from their neighborhood background. An interpolation of two classes of features translated to histograms is used by a support vector machine to classify and collect the textual objects generating the textual zones. The Shape Context Descriptor [1], is used by the Earth Movers Distance(EMD) method to recognize the characters within the image. The recognized characters are fed to heuristic rule based system to determine words and give final results. To optimize the speed of the system, we follow the embedding of the EMD metric presented in [22] to a normed space to enable fast approximation of the k-Nearest Neighbors using Local Sensitivity Hashing functions(LSH). Experiments show that our algorithm can detect and recognize text regions from the ICDAR 2005 datasets [17] with high rates.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 International Conference on Frontiers in Handwriting Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFHR.2012.279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Detecting and recognizing texts in real world images such as sign boards and advertisements is an important part of computer vision applications. The complexity of the problem comes out of many factors such as nonuniform background, different languages and fonts, and non consistent text alignment and orientation. In this paper, we present a novel approach to detect characters and words in real-world images. The presented approach decompose the gray level image into sequence of images, each one includes pixels with gray level values from different disjoint ranges. This decomposition enables extracting connected components representing characters or other non textual objects separated from their neighborhood background. An interpolation of two classes of features translated to histograms is used by a support vector machine to classify and collect the textual objects generating the textual zones. The Shape Context Descriptor [1], is used by the Earth Movers Distance(EMD) method to recognize the characters within the image. The recognized characters are fed to heuristic rule based system to determine words and give final results. To optimize the speed of the system, we follow the embedding of the EMD metric presented in [22] to a normed space to enable fast approximation of the k-Nearest Neighbors using Local Sensitivity Hashing functions(LSH). Experiments show that our algorithm can detect and recognize text regions from the ICDAR 2005 datasets [17] with high rates.
真实世界图像中的文本检测与识别
在广告牌和广告等现实图像中检测和识别文本是计算机视觉应用的重要组成部分。问题的复杂性来自于背景不统一、语言和字体不同、文本对齐和方向不一致等诸多因素。在本文中,我们提出了一种新的方法来检测真实世界图像中的字符和单词。该方法将灰度图像分解为图像序列,每个图像序列包含来自不同不相交范围的灰度值像素。这种分解可以从邻近的背景中提取表示字符或其他非文本对象的连接组件。支持向量机将两类特征转换成直方图进行插值,对生成文本区域的文本对象进行分类和收集。形状上下文描述符[1]被大地移动距离(EMD)方法用于识别图像中的字符。将识别出的字符输入到启发式规则系统中进行单词的确定,并给出最终结果。为了优化系统的速度,我们将[22]中提出的EMD度量嵌入到赋范空间中,以便使用局部灵敏度哈希函数(LSH)快速逼近k-近邻。实验表明,我们的算法能够以较高的速率检测和识别ICDAR 2005数据集[17]中的文本区域。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信