Towards Physical Distortion Identification and Removal in Document Images

Tan Lu, A. Dooms
Published in: 2018 7th European Workshop on Visual Information Processing (EUVIP)
Publication date: 2018-11-01
DOI: 10.1109/EUVIP.2018.8611786 (https://doi.org/10.1109/EUVIP.2018.8611786)
Citations: 1

Abstract

Physical distortions, alongside digital artefacts, are common in document images. Their presence disrupts optical character recognition (OCR), which not only reduces the amount of automatically retrievable content but also degrades the performance of other document analysis algorithms that rely on layout analysis or content recognition. This paper proposes a method to identify and remove certain types of physical distortion from document images. By exploiting the intensity and spatial relations of distorted pixels, we construct a conditional random field (CRF) based method for distortion identification. Furthermore, we propose a peak-searching method so that the parameters of the energy functions in the conditional probability are learnt automatically from the image. Pixels belonging to the original document content are discriminated from those caused by physical noise by maximizing the conditional probability in the CRF model. Examples on real-life image samples demonstrate the effectiveness of the proposed method.
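
The abstract does not give the energy functions or the peak-searching procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes three pixel classes (ink, distortion, paper), a squared-difference intensity term, a Potts smoothness term over 4-connected neighbours, and iterated conditional modes (ICM) as a simple stand-in for maximizing the conditional probability, which for P(labels | image) proportional to exp(-E) amounts to minimizing the energy E. All function names, the toy image, and the parameters `sigma`, `beta` and `min_gap` are hypothetical.

```python
from collections import Counter

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected grid


def find_peaks(pixels, k, min_gap=30):
    """Greedy histogram peak search (assumed procedure): take the most
    frequent grey levels, skipping any within min_gap of a chosen peak."""
    peaks = []
    for value, _count in Counter(pixels).most_common():
        if all(abs(value - p) >= min_gap for p in peaks):
            peaks.append(value)
        if len(peaks) == k:
            break
    return sorted(peaks)  # dark to bright


def local_energy(image, labels, r, c, lab, means, sigma, beta):
    """Unary (intensity) term plus Potts pairwise (spatial) term."""
    unary = (image[r][c] - means[lab]) ** 2 / (2.0 * sigma ** 2)
    pairwise = 0.0
    for dr, dc in NEIGHBOURS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(image) and 0 <= cc < len(image[0]):
            if labels[rr][cc] != lab:
                pairwise += beta  # penalty for disagreeing with a neighbour
    return unary + pairwise


def icm_label(image, means, sigma=20.0, beta=1.0, sweeps=5):
    """Locally minimize the energy with iterated conditional modes."""
    classes = range(len(means))
    # Start from the nearest-peak label for each pixel.
    labels = [[min(classes, key=lambda k: abs(v - means[k])) for v in row]
              for row in image]
    for _ in range(sweeps):
        changed = False
        for r in range(len(image)):
            for c in range(len(image[0])):
                best = min(classes, key=lambda k: local_energy(
                    image, labels, r, c, k, means, sigma, beta))
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


def remove_distortion(image, labels, distortion_label, fill):
    """Overwrite pixels labelled as distortion with a fill intensity."""
    return [[fill if lab == distortion_label else v
             for v, lab in zip(img_row, lab_row)]
            for img_row, lab_row in zip(image, labels)]


# Toy image: paper = 230, ink stroke = 20, stain = 120 with one noisy pixel.
image = [
    [230, 230, 230, 230, 230, 230],
    [230,  20,  20,  20, 230, 230],
    [230, 230, 230, 230, 230, 230],
    [230, 230, 120, 120, 120, 230],
    [230, 230, 120, 185, 120, 230],
    [230, 230, 120, 120, 120, 230],
]
means = find_peaks([v for row in image for v in row], k=3)
labels = icm_label(image, means)
# Assumption: the middle peak corresponds to the distortion class.
cleaned = remove_distortion(image, labels, distortion_label=1, fill=means[2])
```

In this toy case the noisy pixel inside the stain is closer in intensity to the paper peak, so an intensity-only rule would label it as paper. The spatial term pulls it into the stain class because all four of its neighbours carry that label. ICM finds only a local optimum; a real implementation would likely use a stronger inference method and a more careful peak search.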