Robust Text Detection in Natural Scene Images by Generalized Color-Enhanced Contrasting Extremal Region and Neural Networks

Lei Sun, Qiang Huo, Wei Jia, Kai Chen
Published in: 2014 22nd International Conference on Pattern Recognition, 2014-08-24
DOI: 10.1109/ICPR.2014.469 (https://doi.org/10.1109/ICPR.2014.469)
Citations: 43

Abstract

This paper presents a robust text detection approach based on generalized color-enhanced contrasting extremal regions (CERs) and neural networks. Given a color natural scene image, six component-trees are built: one each from its grayscale image and from its hue and saturation channel images in a perception-based illumination-invariant color space, plus one from the inverted version of each of these three images. From each component-tree, generalized color-enhanced CERs are extracted as character candidates. Following a "divide-and-conquer" strategy, each candidate image patch is reliably labeled by rules as one of five types (Long, Thin, Fill, Square-large, and Square-small) and then classified as text or non-text by the neural network for that type, which is trained with an ambiguity-free learning strategy. After non-text components are pruned, repeating components in each component-tree are pruned using color and area information to obtain a component graph, from which candidate text-lines are formed and verified by another set of neural networks. Finally, the results from the six component-trees are combined, and a post-processing step recovers lost characters and splits text lines into words as appropriate. The proposed method achieves 85.72% recall, 87.03% precision, and an 86.37% F-score on the ICDAR-2013 "Reading Text in Scene Images" test set.
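
The first step of the abstract, deriving six input images (grayscale, hue, saturation, and their inverses) from which the component-trees are built, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the perception-based illumination-invariant color space, so standard HSV from Python's `colorsys` module is used here purely as a stand-in, and the BT.601 grayscale weights, the 8-bit scaling, and the function name `six_channel_images` are likewise assumptions made for illustration.

```python
import colorsys


def six_channel_images(rgb_image):
    """Derive six 8-bit single-channel images from an RGB image.

    rgb_image is a list of rows, each row a list of (r, g, b) tuples
    with values in 0..255. Returns a dict mapping a channel name to a
    list of rows of integers in 0..255.
    """
    gray, hue, sat = [], [], []
    for row in rgb_image:
        gray_row, hue_row, sat_row = [], [], []
        for r, g, b in row:
            # Assumption: BT.601 luma weights for the grayscale image.
            gray_row.append(round(0.299 * r + 0.587 * g + 0.114 * b))
            # Assumption: plain HSV stands in for the paper's
            # illumination-invariant color space, which the abstract
            # does not define.
            h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hue_row.append(round(h * 255))
            sat_row.append(round(s * 255))
        gray.append(gray_row)
        hue.append(hue_row)
        sat.append(sat_row)

    channels = {"gray": gray, "hue": hue, "saturation": sat}
    # Each inverted image is the 8-bit complement of its source image.
    for name in ("gray", "hue", "saturation"):
        channels[name + "_inverted"] = [
            [255 - value for value in row] for row in channels[name]
        ]
    return channels


if __name__ == "__main__":
    tiny = [[(255, 0, 0), (0, 0, 0)], [(255, 255, 255), (0, 0, 255)]]
    for name, image in six_channel_images(tiny).items():
        print(name, image)
```

In the pipeline the abstract describes, a component-tree would then be built from each of the six images and CER candidates extracted from every tree. Processing an image together with its inverse is a common way to cover both dark-on-light and light-on-dark text with a single extraction procedure.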