Research on born-digital image text extraction based on conditional random field

Q4 Computer Science
Zhang Jian, Cheng Ren-hong, Wang Kai, Zhao Hong
DOI: 10.1504/IJHPSA.2014.059873
Journal: International Journal of High Performance Systems Architecture
Publication date: 2014-03-01
Publication type: Journal Article
URL: https://doi.org/10.1504/IJHPSA.2014.059873
Citations: 1

Abstract

With the number of digital videos and images in e-mails and web pages increasing tremendously, text extraction from images has become more important than ever. Born-digital images are generated directly on a computer, and the text they contain is important for understanding their semantics. Although many methods have been proposed over the past years for extracting text from natural scene images, text detection and extraction from born-digital images remains a challenge. This paper proposes a novel method to segment the text connected components (CCs) from a born-digital image. First, binarisation based on wavelet theory is applied to the given image to obtain all candidate text CCs. Second, the extracted CCs are classified with a conditional random field (CRF), a probabilistic graphical model widely used in natural language processing, to label the text CCs. Experimental results show that the proposed method can effectively extract text from born-digital images.
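The first stage described in the abstract, obtaining candidate CCs from a binarised image, can be illustrated with a minimal sketch. This is not the authors' method. A plain global threshold stands in for the wavelet-based binarisation, dark text on a light background is assumed, and the CRF classification stage is left out. The function names `binarise`, `connected_components`, and `bounding_box` are hypothetical.

```python
from collections import deque


def binarise(image, threshold=128):
    """Mark pixels darker than `threshold` as foreground (1).

    Assumption: a global threshold replaces the paper's wavelet-based
    binarisation, and text is darker than the background.
    """
    return [[1 if px < threshold else 0 for px in row] for row in image]


def connected_components(binary):
    """Return 8-connected foreground regions as lists of (row, col) pixels."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component, inclusive."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


# Toy greyscale image: two dark blobs on a white background.
image = [
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
    [255,   0, 255, 255,   0],
    [255, 255, 255, 255,   0],
]
candidates = connected_components(binarise(image))
boxes = [bounding_box(cc) for cc in candidates]
```

In a full pipeline, each candidate CC would then be described by features such as its bounding box and area and passed to a classifier. The paper uses a CRF for that step, which this sketch does not attempt to reproduce.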
Source journal
International Journal of High Performance Systems Architecture (Computer Science: Hardware and Architecture)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 10