Deep learning-based modified-EAST scene text detector: insights from a novel multiscript dataset

IF 2.5 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal on Document Analysis and Recognition Pub Date : 2024-07-31 DOI:10.1007/s10032-024-00491-w

Shilpa Mahajan, Rajneesh Rani, Aman Kamboj

{"title":"Deep learning-based modified-EAST scene text detector: insights from a novel multiscript dataset","authors":"Shilpa Mahajan, Rajneesh Rani, Aman Kamboj","doi":"10.1007/s10032-024-00491-w","DOIUrl":null,"url":null,"abstract":"<p>The field of computer vision has seen significant transformation with the emergence and advancement of deep learning models. Deep learning waves have a significant impact on scene text detection, a vital and active area in computer vision. Numerous scientific, industrial, and academic procedures make use of text analysis. Natural scene text detection is more difficult than document image text detection owing to variations in font, size, style, brightness, etc. The National Institute of Technology Jalandhar-Text Detection dataset (NITJ-TD) is a new dataset that we have put forward in this study for various text analysis tasks including text detection, text segmentation, script identification, text recognition, etc. a deep learning model that seeks to identify the text’s location within the image,which are gathered in an unrestricted setting. The system consists of an NMS to choose the best match and prevent repeated predictions, and a modified EAST to pinpoint the exact ROI in the image. To improve the model’s performance, an enhancement module is added to the fundamental Efficient and Accurate Scene Text detector (EAST). The suggested approach is contrasted in terms of text word detection in the image. Several pre-trained models are used to assign the text word to various intersections over Union (IoU) values. We made use of our NITJ-TD dataset, which is made up of 1500 photos that were gathered from various North Indian sites. Punjabi, English, and Hindi scripts can be seen on the images. We also examined the outcomes of the ICDAR-2013 benchmark dataset. On both the suggested dataset and the benchmarked dataset, our approach performed better.\n</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"50 1","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal on Document Analysis and Recognition","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10032-024-00491-w","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

The field of computer vision has seen significant transformation with the emergence and advancement of deep learning models. Deep learning waves have a significant impact on scene text detection, a vital and active area in computer vision. Numerous scientific, industrial, and academic procedures make use of text analysis. Natural scene text detection is more difficult than document image text detection owing to variations in font, size, style, brightness, etc. The National Institute of Technology Jalandhar-Text Detection dataset (NITJ-TD) is a new dataset that we have put forward in this study for various text analysis tasks including text detection, text segmentation, script identification, text recognition, etc. a deep learning model that seeks to identify the text’s location within the image,which are gathered in an unrestricted setting. The system consists of an NMS to choose the best match and prevent repeated predictions, and a modified EAST to pinpoint the exact ROI in the image. To improve the model’s performance, an enhancement module is added to the fundamental Efficient and Accurate Scene Text detector (EAST). The suggested approach is contrasted in terms of text word detection in the image. Several pre-trained models are used to assign the text word to various intersections over Union (IoU) values. We made use of our NITJ-TD dataset, which is made up of 1500 photos that were gathered from various North Indian sites. Punjabi, English, and Hindi scripts can be seen on the images. We also examined the outcomes of the ICDAR-2013 benchmark dataset. On both the suggested dataset and the benchmarked dataset, our approach performed better.

Abstract Image

查看原文本刊更多论文

基于深度学习的修改后 EAST 场景文本检测器：从新型多脚本数据集中获得的启示

随着深度学习模型的出现和发展，计算机视觉领域发生了重大变革。深度学习浪潮对场景文本检测产生了重大影响，而场景文本检测是计算机视觉中一个重要而活跃的领域。许多科学、工业和学术程序都会用到文本分析。由于字体、大小、风格、亮度等的变化，自然场景文本检测比文档图像文本检测更加困难。国立贾朗达尔理工学院-文本检测数据集（NITJ-TD）是一个新的数据集，我们在本研究中将其用于各种文本分析任务，包括文本检测、文本分割、脚本识别、文本识别等。该系统由一个 NMS 和一个修改后的 EAST 组成，前者用于选择最佳匹配并防止重复预测，后者用于精确定位图像中的 ROI。为了提高模型的性能，在基本的高效精确场景文本检测器（EAST）中添加了一个增强模块。建议的方法在图像中的文本字词检测方面进行了对比。我们使用了几个预先训练好的模型，将文本词分配到不同的交叉联合（IoU）值上。我们使用了 NITJ-TD 数据集，该数据集由从北印度多个网站收集的 1500 张照片组成。图片上可以看到旁遮普语、英语和印地语脚本。我们还检查了 ICDAR-2013 基准数据集的结果。在建议数据集和基准数据集上，我们的方法都表现得更好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Journal on Document Analysis and Recognition 工程技术-计算机：人工智能

CiteScore

6.20

自引率

4.30%

发文量

审稿时长

7.5 months

期刊介绍： The large number of existing documents and the production of a multitude of new ones every year raise important issues in efficient handling, retrieval and storage of these documents and the information which they contain. This has led to the emergence of new research domains dealing with the recognition by computers of the constituent elements of documents - including characters, symbols, text, lines, graphics, images, handwriting, signatures, etc. In addition, these new domains deal with automatic analyses of the overall physical and logical structures of documents, with the ultimate objective of a high-level understanding of their semantic content. We have also seen renewed interest in optical character recognition (OCR) and handwriting recognition during the last decade. Document analysis and recognition are obviously the next stage. Automatic, intelligent processing of documents is at the intersections of many fields of research, especially of computer vision, image analysis, pattern recognition and artificial intelligence, as well as studies on reading, handwriting and linguistics. Although quality document related publications continue to appear in journals dedicated to these domains, the community will benefit from having this journal as a focal point for archival literature dedicated to document analysis and recognition.