Scene text detection using structured information and an end-to-end trainable generative adversarial networks

IF 3.7 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Analysis and Applications Pub Date : 2024-03-19 DOI:10.1007/s10044-024-01259-y

Palanichamy Naveen, Mahmoud Hassaballah

{"title":"Scene text detection using structured information and an end-to-end trainable generative adversarial networks","authors":"Palanichamy Naveen, Mahmoud Hassaballah","doi":"10.1007/s10044-024-01259-y","DOIUrl":null,"url":null,"abstract":"<p>Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of generative adversarial network (GAN) and network variational autoencoder (VAE) to create a robust and potent text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module plays a pivotal role in generating diverse and variable text regions. Subsequently, the GAN module refines and enhances these regions, ensuring heightened realism and accuracy. Then, the text detection module takes charge of identifying text regions in the input image via assigning confidence scores to each region. The comprehensive training of the entire network involves minimizing a joint loss function that encompasses the VAE loss, the GAN loss, and the text detection loss. The VAE loss ensures diversity in generated text regions and the GAN loss guarantees realism and accuracy, while the text detection loss ensures high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KIAST Scene Text demonstrates the superior performance of the proposed method compared to existing approaches.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"1 1","pages":""},"PeriodicalIF":3.7000,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Analysis and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10044-024-01259-y","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of generative adversarial network (GAN) and network variational autoencoder (VAE) to create a robust and potent text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module plays a pivotal role in generating diverse and variable text regions. Subsequently, the GAN module refines and enhances these regions, ensuring heightened realism and accuracy. Then, the text detection module takes charge of identifying text regions in the input image via assigning confidence scores to each region. The comprehensive training of the entire network involves minimizing a joint loss function that encompasses the VAE loss, the GAN loss, and the text detection loss. The VAE loss ensures diversity in generated text regions and the GAN loss guarantees realism and accuracy, while the text detection loss ensures high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KIAST Scene Text demonstrates the superior performance of the proposed method compared to existing approaches.

Abstract Image

查看原文本刊更多论文

利用结构化信息和端到端可训练生成式对抗网络进行场景文本检测

由于文字外观、背景和方向的多样性，场景文字检测是一项相当大的挑战。在这种情况下，提高鲁棒性、准确性和效率对于光学字符识别、图像理解和自动驾驶汽车等多种应用至关重要。本文探讨了生成式对抗网络（GAN）与网络变异自动编码器（VAE）的整合，以创建一个强大而有效的文本检测网络。所提出的架构由三个相互关联的模块组成：VAE 模块、GAN 模块和文本检测模块。在此框架中，VAE 模块在生成多样化和可变的文本区域方面发挥着关键作用。随后，GAN 模块对这些区域进行细化和增强，以确保更高的真实性和准确性。然后，文本检测模块负责通过为每个区域分配置信度分数来识别输入图像中的文本区域。整个网络的综合训练包括最小化联合损失函数，其中包括 VAE 损失、GAN 损失和文本检测损失。VAE 损失可确保生成文本区域的多样性，GAN 损失可确保真实性和准确性，而文本检测损失则可确保文本区域的高精度识别。所提出的方法在 VAE 模块中采用了编码器-解码器结构，在 GAN 模块中采用了生成器-鉴别器结构。在不同的数据集（包括 Total-Text、CTW1500、ICDAR 2015、ICDAR 2017、ReCTS、TD500、COCO-Text、SynthText、Street View Text 和 KIAST Scene Text）上进行的严格测试表明，与现有方法相比，所提出的方法具有更优越的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Pattern Analysis and Applications 工程技术-计算机：人工智能

CiteScore

7.40

自引率

2.60%

发文量

审稿时长

13.5 months

期刊介绍： The journal publishes high quality articles in areas of fundamental research in intelligent pattern analysis and applications in computer science and engineering. It aims to provide a forum for original research which describes novel pattern analysis techniques and industrial applications of the current technology. In addition, the journal will also publish articles on pattern analysis applications in medical imaging. The journal solicits articles that detail new technology and methods for pattern recognition and analysis in applied domains including, but not limited to, computer vision and image processing, speech analysis, robotics, multimedia, document analysis, character recognition, knowledge engineering for pattern recognition, fractal analysis, and intelligent control. The journal publishes articles on the use of advanced pattern recognition and analysis methods including statistical techniques, neural networks, genetic algorithms, fuzzy pattern recognition, machine learning, and hardware implementations which are either relevant to the development of pattern analysis as a research area or detail novel pattern analysis applications. Papers proposing new classifier systems or their development, pattern analysis systems for real-time applications, fuzzy and temporal pattern recognition and uncertainty management in applied pattern recognition are particularly solicited.