Arbitrary shape text detection fusing InceptionNeXt and multi-scale attention mechanism
Xianguo Li, Yu Zhang, Yi Liu, Xingchen Yao, Xinyi Zhou
The Journal of Supercomputing, published 2024-08-12
DOI: https://doi.org/10.1007/s11227-024-06418-w
Abstract
Existing segmentation-based text detection methods generally suffer from insufficient receptive fields, inadequate filtering of text information, and difficulty balancing detection accuracy and speed, which limits their ability to detect arbitrary-shaped text in complex backgrounds. To address these problems, we propose a new text detection method that fuses the pure ConvNet model InceptionNeXt with a multi-scale attention mechanism. First, we propose a text information reinforcement module that extracts effective text information from features at different scales while preserving spatial position information. Second, we construct an InceptionNeXt Block module to compensate for insufficient receptive fields without significantly reducing speed. Finally, we design the INA-DBNet network structure to fuse local and global features and to balance accuracy and speed. Experimental results demonstrate the efficacy of our method. In particular, on the MSRA-TD500 and Total-text datasets, INA-DBNet achieves F-measures of 91.3% and 86.7%, respectively, while maintaining real-time inference speed. Code is available at: https://github.com/yuyu678/INANET.
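For readers unfamiliar with the metric quoted in the abstract, the F-measure in text detection is conventionally the harmonic mean of precision and recall. The abstract does not describe the evaluation protocol, so the following is only a minimal sketch assuming that standard definition. The function names `f_measure` and `detection_scores` and the counts in the usage lines are hypothetical illustrations, not taken from the paper or its code.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the usual F1 definition)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        # Define the score as 0 when nothing was detected or matched.
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(true_positives: int, num_detections: int, num_ground_truth: int):
    """Return (precision, recall, F-measure) from matched detection counts.

    How a detection is matched to a ground-truth text region (for example,
    by an overlap threshold) is dataset-specific and is NOT modeled here;
    this helper only does the arithmetic on the resulting counts.
    """
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from the paper.
    p, r, f = detection_scores(true_positives=8, num_detections=10, num_ground_truth=16)
    print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F-measure by trading away recall for precision or the reverse, which is why it is commonly reported as a single summary number.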