Separation of Foreground Text from Complex Background in Color Document Images

S. Nirmala, P. Nagabhushan
2009 Seventh International Conference on Advances in Pattern Recognition, published 2009-02-04
DOI: 10.1109/ICAPR.2009.26 (https://doi.org/10.1109/ICAPR.2009.26)
Citations: 23

Abstract

Foreground text is difficult to read in documents with multicolored, complex backgrounds. Automatic separation of the foreground text in such document images is essential for reading the document contents smoothly. In this paper we propose a hybrid approach that combines connected component analysis with unsupervised thresholding to separate text from a complex background. The proposed approach identifies candidate text regions by edge detection followed by connected component analysis. Because of the background complexity, a non-text region may also be identified as a text region. To overcome this problem, we extract texture features of the connected components and analyze the feature values. Finally, the threshold value for each detected text region is derived automatically from the data of the corresponding image region to perform foreground separation. The proposed approach can handle document images with varying backgrounds of multiple colors, and foreground text of any color, font, and size. Experimental results show that the proposed algorithm detects, on average, 97.8% of the text regions in the source document. The readability of the extracted foreground text is illustrated by running OCR on it.
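
The stages named in the abstract (edge detection, connected component analysis, per-region thresholding) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not name the edge detector or the thresholding rule, so the neighbour-difference edge test, the use of Otsu's method, and all function names and parameters (`edge_map`, `component_boxes`, `otsu_threshold`, `separate_text`, `min_diff`, `pad`) are assumptions made here for illustration. The texture-feature filtering step is omitted, and the sketch works on a grayscale image with text darker than its local background, whereas the paper handles text of any color.

```python
from collections import deque

def edge_map(gray, min_diff=40):
    """Mark pixels that differ from the right or lower neighbour by >= min_diff.
    Assumed stand-in for the paper's (unspecified) edge detector."""
    h, w = len(gray), len(gray[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(gray[y][x] - gray[y][x + 1]) if x + 1 < w else 0
            dy = abs(gray[y][x] - gray[y + 1][x]) if y + 1 < h else 0
            edges[y][x] = max(dx, dy) >= min_diff
    return edges

def component_boxes(mask):
    """8-connected component analysis by breadth-first search.
    Returns one inclusive (top, left, bottom, right) box per component."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            top, left, bottom, right = sy, sx, sy, sx
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def otsu_threshold(pixels):
    """Otsu's method on 8-bit values: the t that maximises between-class variance.
    Assumed choice of unsupervised thresholding."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def separate_text(gray, pad=1):
    """Return a binary image (1 = text, 0 = background), thresholding each
    candidate region with a value derived from that region's own pixels."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for top, left, bottom, right in component_boxes(edge_map(gray)):
        t0, l0 = max(0, top - pad), max(0, left - pad)
        b0, r0 = min(h - 1, bottom + pad), min(w - 1, right + pad)
        region = [gray[y][x] for y in range(t0, b0 + 1) for x in range(l0, r0 + 1)]
        t = otsu_threshold(region)
        # Assumption: text is darker than its local background.
        for y in range(t0, b0 + 1):
            for x in range(l0, r0 + 1):
                if gray[y][x] <= t:
                    out[y][x] = 1
    return out
```

Because the threshold is computed separately inside each candidate box, a region on a dark background and a region on a light background each get their own cut-off, which is the property the abstract relies on for backgrounds of varying color.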