Text line extraction using deep learning and minimal sub seams

Proceedings of the 21st ACM Symposium on Document Engineering. Pub Date: 2021-08-16. DOI: 10.1145/3469096.3474941

Adi Azran, A. Schclar, Raid Saabni

Citations: 1

Abstract

Text line extraction using deep learning and minimal sub seams

Accurate text line extraction is a vital prerequisite for efficient and successful text recognition systems, from keyword and phrase search to complete conversion to text. In many cases, the proposed algorithms target binarized, pre-processed versions of the image, which may give unsatisfactory results on poor-quality document images. Recently, more papers have presented solutions that work directly on gray-level images [1,2,7,12,15]. In this paper, we present a novel, robust, and efficient algorithm to extract text lines directly from gray-level document images. The proposed approach combines two variants of convolutional neural networks (CNNs), followed by minimal-energy seam extraction. The first ConvNet is a modified version of the autoencoder used for biomedical image segmentation [8]. The second is a deep convolutional neural network that works on overlapping vertical slices of the original image. The two variants are combined into one neural net after the resulting slices of the second net are re-attached. The merged results of the two nets serve as a preprocessed image from which an energy map is obtained for the second phase. In that second phase, we use the algorithm presented in [2] to track minimal-energy sub-seams, which are accumulated into full locally minimal/maximal separating and medial seams that define the text baselines and the text line regions. We have tested our approach on various multilingual datasets, based on the ICDAR datasets, that span a range of image quality.
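The abstract names two mechanical steps that a small example can make concrete: cutting the page into overlapping vertical slices that are later re-attached, and tracing a minimal-energy seam across an energy map. The sketch below is illustrative only and is not the authors' implementation. The function names (`slice_spans`, `reattach`, `min_energy_seam`), the averaging of overlapping columns, and the classic whole-image dynamic-programming seam are all assumptions made for this example. The sub-seam accumulation of [2] and both networks are not reproduced. It uses plain nested lists to stay dependency-free.

```python
from typing import List, Tuple

Grid = List[List[float]]  # row-major image or energy map: grid[y][x]


def slice_spans(width: int, slice_width: int, overlap: int) -> List[Tuple[int, int]]:
    """Column ranges [start, stop) of overlapping vertical slices covering `width`."""
    if width <= 0 or not 0 <= overlap < slice_width:
        raise ValueError("need width > 0 and 0 <= overlap < slice_width")
    step = slice_width - overlap
    spans = []
    start = 0
    while True:
        stop = min(start + slice_width, width)
        spans.append((start, stop))
        if stop == width:
            return spans
        start += step


def reattach(slices: List[Grid], spans: List[Tuple[int, int]],
             height: int, width: int) -> Grid:
    """Stitch per-slice outputs back together.

    Assumption for this sketch: columns covered by several slices are averaged.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for grid, (start, stop) in zip(slices, spans):
        for y in range(height):
            for x in range(start, stop):
                total[y][x] += grid[y][x - start]
                count[y][x] += 1
    return [[total[y][x] / count[y][x] for x in range(width)] for y in range(height)]


def min_energy_seam(energy: Grid) -> List[int]:
    """Row index per column of the left-to-right path with minimal total energy.

    Classic seam dynamic programming: each step moves one column to the right
    and at most one row up or down.
    """
    height, width = len(energy), len(energy[0])
    cost = [[energy[y][0]] + [0.0] * (width - 1) for y in range(height)]
    back = [[0] * width for _ in range(height)]
    for x in range(1, width):
        for y in range(height):
            # Cheapest predecessor among rows y-1, y, y+1 in the previous column.
            candidates = range(max(y - 1, 0), min(y + 2, height))
            best = min(candidates, key=lambda r: cost[r][x - 1])
            cost[y][x] = energy[y][x] + cost[best][x - 1]
            back[y][x] = best
    # Start from the cheapest end point and walk the back-pointers to column 0.
    y = min(range(height), key=lambda r: cost[r][width - 1])
    seam = [y]
    for x in range(width - 1, 0, -1):
        y = back[y][x]
        seam.append(y)
    seam.reverse()
    return seam


# Toy energy map: low values mark a path that steps down one row mid-page.
toy_energy = [
    [5.0, 5.0, 5.0, 5.0],
    [1.0, 1.0, 5.0, 5.0],
    [5.0, 5.0, 1.0, 1.0],
]
print(min_energy_seam(toy_energy))  # [1, 1, 2, 2]
```

Running the same recurrence on a negated energy map traces a maximal-energy seam instead. In a full pipeline the energy map would come from the merged network output rather than from a hand-written grid.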
