Arbitrary shape text detection fusing InceptionNeXt and multi-scale attention mechanism

Xianguo Li, Yu Zhang, Yi Liu, Xingchen Yao, Xinyi Zhou
Journal: The Journal of Supercomputing
DOI: 10.1007/s11227-024-06418-w
Published: 2024-08-12 (Journal Article)
Citations: 0

Abstract

Existing segmentation-based text detection methods generally face the problems of insufficient receptive fields, insufficient text information filtering, and difficulty in balancing detection accuracy and speed, limiting their ability to detect arbitrary-shaped text in complex backgrounds. To address these problems, we propose a new text detection method fusing the pure ConvNet model InceptionNeXt and the multi-scale attention mechanism. Firstly, we propose a text information reinforcement module to fully extract effective text information from features of different scales while preserving spatial position information. Secondly, we construct the InceptionNeXt Block module to compensate for insufficient receptive fields without significantly reducing speed. Finally, the INA-DBNet network structure is designed to fuse local and global features and achieve the balance of accuracy and speed. Experimental results demonstrate the efficacy of our method. Particularly, on the MSRA-TD500 and Total-text datasets, INA-DBNet achieves 91.3% and 86.7% F-measure while maintaining real-time inference speed. Code is available at: https://github.com/yuyu678/INANET.
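The InceptionNeXt Block mentioned above is built on the InceptionNeXt idea of replacing one large-kernel depthwise convolution with several cheap parallel branches over a channel split, which is how the method enlarges the receptive field without a large speed penalty. The NumPy sketch below illustrates that general channel-split token mixer; the function names, the split ratio, the band size, and the averaging kernels are illustrative assumptions, not the authors' INA-DBNet implementation.

```python
import numpy as np

def depthwise_conv2d(x, kernel):
    """Per-channel 2D convolution with 'same' padding.
    x: (C, H, W) feature map, kernel: (kh, kw) shared across channels."""
    C, H, W = x.shape
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            # (C, kh, kw) window times (kh, kw) kernel, summed per channel
            out[:, i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * kernel,
                                  axis=(1, 2))
    return out

def inceptionnext_token_mixer(x, ratio=0.125, band=11):
    """Channel-split parallel depthwise mixing in the InceptionNeXt style:
    a small square branch, two orthogonal band branches, and an identity
    branch on the remaining channels (ratio and band are assumed values)."""
    C = x.shape[0]
    g = max(1, int(C * ratio))              # channels per conv branch
    xs = np.split(x, [g, 2 * g, 3 * g], axis=0)
    k_sq = np.full((3, 3), 1 / 9)           # 3x3 square kernel
    k_h = np.full((1, band), 1 / band)      # 1 x band horizontal kernel
    k_v = np.full((band, 1), 1 / band)      # band x 1 vertical kernel
    return np.concatenate([
        depthwise_conv2d(xs[0], k_sq),
        depthwise_conv2d(xs[1], k_h),
        depthwise_conv2d(xs[2], k_v),
        xs[3],                              # identity branch: zero cost
    ], axis=0)
```

Because only a fraction of the channels pass through the band convolutions, the branch structure approximates a large receptive field at a fraction of the FLOPs of a full large-kernel depthwise convolution, which matches the accuracy/speed trade-off the abstract emphasizes.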

