Binarization of Degraded Document Images using Convolutional Neural Networks Based on Predicted Two-Channel Images

2019 International Conference on Document Analysis and Recognition (ICDAR). Published: 2019-09-01. DOI: 10.1109/ICDAR.2019.00160

Y. Akbari, A. Britto, S. Al-Maadeed, Luiz Oliveira


Citations: 18

Abstract


Binarization of Degraded Document Images using Convolutional Neural Networks Based on Predicted Two-Channel Images

Because most historical documents are in poor condition, it is difficult for binarization to separate the background pixels of a document image from its foreground pixels. This paper proposes Convolutional Neural Networks (CNNs) based on predicted two-channel images, in which the CNNs are trained to classify the foreground pixels. The promising results obtained with multispectral images for semantic segmentation inspired our efforts to create a novel prediction-based two-channel image. In our method, the original image is binarized by the structural symmetric pixels (SSPs) method, and the two-channel image is constructed from the original image and its binarized image. To explore the impact of the proposed two-channel images as network inputs, we use two popular CNN architectures, namely SegNet and U-net. The results presented in this work show that our approach outperforms SegNet and U-net trained on the original images alone, and that it is competitive and robust compared with state-of-the-art results on the DIBCO database.
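The two-channel construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `placeholder_binarize` is a hypothetical stand-in that uses a simple global mean threshold, not the SSPs method, and the function names, the channel order, and the scaling to [0, 1] are all assumptions made for illustration.

```python
import numpy as np


def placeholder_binarize(gray01: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the SSPs binarizer (NOT the paper's method).

    Marks pixels darker than the global mean as foreground (1.0),
    everything else as background (0.0).
    """
    return (gray01 < gray01.mean()).astype(np.float32)


def make_two_channel(gray: np.ndarray, binarize=placeholder_binarize) -> np.ndarray:
    """Stack a grayscale image and its predicted binarization into (H, W, 2).

    Channel 0: the original image scaled to [0, 1] (assumed 8-bit input).
    Channel 1: the binary prediction produced by `binarize`.
    """
    gray01 = gray.astype(np.float32) / 255.0
    predicted = binarize(gray01)
    return np.stack([gray01, predicted], axis=-1)


if __name__ == "__main__":
    # Tiny synthetic "document": dark strokes (10, 30) on a bright page (200, 220).
    page = np.array([[10, 200], [220, 30]], dtype=np.uint8)
    two_channel = make_two_channel(page)
    print(two_channel.shape)
    print(two_channel[..., 1])
```

The resulting array would then replace the single-channel image as the input to a segmentation network such as SegNet or U-net, whose first convolution must accept two input channels. Any other binarizer can be passed through the `binarize` argument.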
