An Efficient New PDE-based Characters Reconstruction after Graphics Removal

Louisa Kessi, Frank Lebourgeois, Christophe Garcia
Published in: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016-10-23
DOI: 10.1109/ICFHR.2016.0088 (https://doi.org/10.1109/ICFHR.2016.0088)
Citations: 2

Abstract

Separating text from graphics when they overlap is a challenging problem for digitization companies. In previous work [1], we presented the first unsupervised, fully automatic segmentation system adapted to colour business documents with significant colour complexity and dithered backgrounds. That system performs several operations: it segments colour images automatically, separates text from noise and graphics, and reports the colour of the text. However, once overlapping characters have been split and characters have been separated from graphics, the characters are left broken. The OCR system cannot reliably recognize broken characters, so its performance is seriously affected. This paper presents the first character reconstruction system, built on a new PDE (partial differential equation)-based approach. Our approach combines the anisotropic morphology proposed by Breuß with Weickert's coherence-enhancing shock filter diffusion. We introduce a continuous anisotropic morphology method driven by the main direction of the first-order tensors computed in the neighbourhood of the missing part left by the separation of text and graphics. It reconstructs the missing part even when the gap is wider than the stroke width. The coherence of the tensor orientations around the missing parts makes the method robust to image noise. Applying the ABBYY FineReader OCR engine shows a significant reduction in OCR errors. Our experiments show that, in contrast to the existing state of the art, our proposal requires no training step, and that it outperforms both anisotropic morphology and the Weickert coherence-enhancing shock filter diffusion applied separately.
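The idea of filling a gap along the local stroke direction can be illustrated with a small sketch. This is not the authors' continuous PDE scheme: it is a discrete stand-in in which the dominant orientation is estimated from a summed gradient tensor (assumed here to play the role of the "first-order tensors" in the abstract), and ink is then grown pixel by pixel along that direction inside the gap only. The function names, the toy image, the mask, and the step count are all hypothetical choices made for illustration.

```python
import math

def structure_tensor(img, rows, cols):
    """Sum the gradient outer products over a window (central differences)."""
    jxx = jxy = jyy = 0.0
    for r in rows:
        for c in cols:
            gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
            gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
            jxx += gx * gx
            jxy += gx * gy
            jyy += gy * gy
    return jxx, jxy, jyy

def stroke_direction(jxx, jxy, jyy):
    """Unit vector along the stroke: perpendicular to the dominant gradient."""
    theta = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)  # dominant gradient angle
    return math.cos(theta + math.pi / 2), math.sin(theta + math.pi / 2)

def directional_dilation(img, mask, direction, steps):
    """Grow ink along +/- direction, only at pixels flagged in the mask."""
    dx, dy = direction
    sc, sr = round(dx), round(dy)  # nearest one-pixel step (column, row)
    h, w = len(img), len(img[0])
    for _ in range(steps):
        out = [row[:] for row in img]  # update from the previous iterate only
        for r in range(h):
            for c in range(w):
                if not mask[r][c]:
                    continue
                for s in (1, -1):
                    rr, cc = r + s * sr, c + s * sc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] = max(out[r][c], img[rr][cc])
        img = out
    return img

# Toy example (assumed data): a horizontal stroke 3 px thick with a 5 px gap,
# i.e. a gap wider than the stroke itself.
H, W = 7, 15
img = [[1 if 2 <= r <= 4 and not 5 <= c <= 9 else 0 for c in range(W)]
       for r in range(H)]
mask = [[2 <= r <= 4 and 5 <= c <= 9 for c in range(W)] for r in range(H)]

tensor = structure_tensor(img, range(1, H - 1), range(1, W - 1))
direction = stroke_direction(*tensor)
repaired = directional_dilation(img, mask, direction, steps=3)

for row in repaired:
    print("".join("#" if v else "." for v in row))
```

Restricting the update to the mask keeps the rest of the character untouched, and because the growth follows the estimated stroke direction rather than spreading isotropically, the bridged segment keeps the original stroke thickness in this toy case.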