{"title":"基于 GAN 的文本行分割方法,适用于具有挑战性的手写文档","authors":"İbrahim Özşeker, Ali Alper Demir, Ufuk Özkaya","doi":"10.1007/s10032-024-00488-5","DOIUrl":null,"url":null,"abstract":"<p>Text line segmentation (TLS) is an essential step of the end-to-end document analysis systems. The main purpose of this step is to extract the individual text lines of any handwritten documents with high accuracy. Handwritten and historical documents mostly contain touching and overlapping characters, heavy diacritics, footnotes and side notes added over the years. In this work, we present a new TLS method based on generative adversarial networks (GAN). TLS problem is tackled as an image-to-image translation problem and the GAN model was trained to learn the spatial information between the individual text lines and their corresponding masks including the text lines. To evaluate the segmentation performance of the proposed GAN model, two challenging datasets, VML-AHTE and VML-MOC, were used. According to the qualitative and quantitative results, the proposed GAN model achieved the best segmentation accuracy on the VML-MOC dataset and showed competitive performance on the VML-AHTE dataset.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"40 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GAN-based text line segmentation method for challenging handwritten documents\",\"authors\":\"İbrahim Özşeker, Ali Alper Demir, Ufuk Özkaya\",\"doi\":\"10.1007/s10032-024-00488-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Text line segmentation (TLS) is an essential step of the end-to-end document analysis systems. The main purpose of this step is to extract the individual text lines of any handwritten documents with high accuracy. Handwritten and historical documents mostly contain touching and overlapping characters, heavy diacritics, footnotes and side notes added over the years. In this work, we present a new TLS method based on generative adversarial networks (GAN). TLS problem is tackled as an image-to-image translation problem and the GAN model was trained to learn the spatial information between the individual text lines and their corresponding masks including the text lines. To evaluate the segmentation performance of the proposed GAN model, two challenging datasets, VML-AHTE and VML-MOC, were used. According to the qualitative and quantitative results, the proposed GAN model achieved the best segmentation accuracy on the VML-MOC dataset and showed competitive performance on the VML-AHTE dataset.</p>\",\"PeriodicalId\":50277,\"journal\":{\"name\":\"International Journal on Document Analysis and Recognition\",\"volume\":\"40 1\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal on Document Analysis and Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10032-024-00488-5\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal on Document Analysis and Recognition","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10032-024-00488-5","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
摘要
文本行分割(TLS)是端到端文档分析系统的一个基本步骤。这一步骤的主要目的是高精度地提取任何手写文档中的单个文本行。手写文档和历史文献大多包含触摸和重叠字符、大量的变音符号、脚注和多年来添加的旁注。在这项工作中,我们提出了一种基于生成式对抗网络(GAN)的新 TLS 方法。TLS 问题是作为图像到图像的翻译问题来处理的,GAN 模型经过训练,可以学习单个文本行和包括文本行在内的相应掩码之间的空间信息。为了评估所提出的 GAN 模型的分割性能,我们使用了两个具有挑战性的数据集:VML-AHTE 和 VML-MOC。根据定性和定量结果,所提出的 GAN 模型在 VML-MOC 数据集上达到了最佳分割精度,在 VML-AHTE 数据集上表现出了竞争力。
GAN-based text line segmentation method for challenging handwritten documents
Text line segmentation (TLS) is an essential step of the end-to-end document analysis systems. The main purpose of this step is to extract the individual text lines of any handwritten documents with high accuracy. Handwritten and historical documents mostly contain touching and overlapping characters, heavy diacritics, footnotes and side notes added over the years. In this work, we present a new TLS method based on generative adversarial networks (GAN). TLS problem is tackled as an image-to-image translation problem and the GAN model was trained to learn the spatial information between the individual text lines and their corresponding masks including the text lines. To evaluate the segmentation performance of the proposed GAN model, two challenging datasets, VML-AHTE and VML-MOC, were used. According to the qualitative and quantitative results, the proposed GAN model achieved the best segmentation accuracy on the VML-MOC dataset and showed competitive performance on the VML-AHTE dataset.
期刊介绍:
The large number of existing documents and the production of a multitude of new ones every year raise important issues in efficient handling, retrieval and storage of these documents and the information which they contain. This has led to the emergence of new research domains dealing with the recognition by computers of the constituent elements of documents - including characters, symbols, text, lines, graphics, images, handwriting, signatures, etc. In addition, these new domains deal with automatic analyses of the overall physical and logical structures of documents, with the ultimate objective of a high-level understanding of their semantic content. We have also seen renewed interest in optical character recognition (OCR) and handwriting recognition during the last decade. Document analysis and recognition are obviously the next stage.
Automatic, intelligent processing of documents is at the intersections of many fields of research, especially of computer vision, image analysis, pattern recognition and artificial intelligence, as well as studies on reading, handwriting and linguistics. Although quality document related publications continue to appear in journals dedicated to these domains, the community will benefit from having this journal as a focal point for archival literature dedicated to document analysis and recognition.