Adaptive Enhancement for Scanned Historical Document Images

Farouk Suleiman, Chris J. Hughes, E. Obio
Published in: 2021 IEEE 4th International Conference on Electronics and Communication Engineering (ICECE)
Publication date: 2021-12-17
DOI: 10.1109/ICECE54449.2021.9674392 (https://doi.org/10.1109/ICECE54449.2021.9674392)
Citations: 0

Abstract

In this paper we propose a novel adaptive histogram matching method to remove low-contrast, smeared-ink, bleed-through and uneven-illumination artefacts from scanned images of historical documents. The goal is to provide a better representation of document images, improving both readability and the quality of the source images for Optical Character Recognition (OCR). Unlike other methods, which are designed for a single artefact, the proposed method addresses several at once (low contrast, smeared ink, bleed-through and uneven illumination). The method starts by taking the bimodal peaks of the original grayscale image and multiplying them by generated Gaussian windows to create an ideal histogram with importance weights on its distribution. This histogram becomes the reference histogram to which the original image is matched to obtain a better-optimized image. Median filtering is also incorporated in the method to remove salt-and-pepper noise. We demonstrate the technique on the European Newspapers Project (ENP) dataset from the Pattern Recognition and Image Analysis research lab (PRImA). The results show that the proposed method significantly reduces background noise and improves image quality across multiple artefacts compared with the other enhancement methods tested. To evaluate the proposed method, we use several performance criteria: Signal to Noise Ratio (SNR), Mean Opinion Score (MOS), and the Visual Document Image Quality Assessment (VDIQA) metric. The proposed method performs best on all evaluation metrics, with a 42.6% increase over the average MOS score of the other methods, a 44.3% increase over their average SNR, and a 61.11% improvement in VDIQA.
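The pipeline described in the abstract (bimodal peaks, Gaussian windows, reference histogram, histogram matching, median filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no parameters, so the function names, the mid-gray `split` used to separate the two modes, the Gaussian width `sigma`, the 3x3 median window, and the choice to filter after matching are all assumptions made for illustration. Plain Python lists are used so the example has no dependencies.

```python
import math

LEVELS = 256  # 8-bit grayscale

def histogram(img):
    """Count how many pixels take each gray level."""
    hist = [0] * LEVELS
    for row in img:
        for v in row:
            hist[v] += 1
    return hist

def bimodal_peaks(hist, split=128):
    """Assumption: the ink mode lies below `split`, the paper mode at or above it."""
    dark = max(range(split), key=lambda g: hist[g])
    bright = max(range(split, LEVELS), key=lambda g: hist[g])
    return dark, bright

def reference_histogram(hist, peaks, sigma=10.0):
    """Multiply each peak's height by a Gaussian window centred on that peak."""
    ref = [0.0] * LEVELS
    for p in peaks:
        for g in range(LEVELS):
            ref[g] += hist[p] * math.exp(-0.5 * ((g - p) / sigma) ** 2)
    return ref

def cdf(hist):
    """Normalised cumulative distribution of a histogram."""
    acc, running = [], 0.0
    for h in hist:
        running += h
        acc.append(running)
    return [a / running for a in acc]

def match_histogram(img, ref):
    """Classic histogram matching: map each level through the two CDFs."""
    src_cdf, ref_cdf = cdf(histogram(img)), cdf(ref)
    lut, j = [], 0
    for g in range(LEVELS):
        # Smallest reference level whose CDF reaches the source CDF at g.
        while j < LEVELS - 1 and ref_cdf[j] < src_cdf[g]:
            j += 1
        lut.append(j)
    return [[lut[v] for v in row] for row in img]

def median_filter3(img):
    """3x3 median filter with edge replication (removes salt-and-pepper noise)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(sorted(window)[4])  # middle of 9 sorted values
        out.append(row)
    return out

def enhance(img, sigma=10.0):
    """Assumed order: build reference, match, then median-filter."""
    hist = histogram(img)
    ref = reference_histogram(hist, bimodal_peaks(hist), sigma)
    return median_filter3(match_histogram(img, ref))

# Synthetic low-contrast "page": dark left half (100), lighter right half (150).
page = [[100] * 3 + [150] * 3 for _ in range(6)]
enhanced = enhance(page)
```

In this sketch `enhance` takes a list of rows of integers in 0..255 and returns an image of the same shape. For real scans an array library would be far faster; the list-based version is only meant to make each step of the description concrete.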