EF2Net: Better Extracting, Fusing and Focusing Text Features for Scene Text Detection

Xiangyang Qu, Chongyang Zhang
{"title":"EF2Net: Better Extracting, Fusing and Focusing Text Features for Scene Text Detection","authors":"Xiangyang Qu, Chongyang Zhang","doi":"10.1109/AINIT59027.2023.10212684","DOIUrl":null,"url":null,"abstract":"Text detection in natural scene images is a chal-lenging task that requires localization and fitting of text regions. Currently, existing natural scene methods use fixed-size convolutional kernels to extract text instance features and have achieved good results. However, due to the extremely large aspect ratio of text regions in natural scenes, extracting features using fixed-size convolutional kernels introduces background noise, which affects the accuracy of text detection. In addition, complex backgrounds in natural scenes may cause text features in existing methods to be incorrectly detected as text, while small and ambiguous text may be missed in the detection. To address these challenges, first, we use a new backbone with multi-branch depth band convolution to better capture text features in large aspect ratios and multi-scale backgrounds. Then, we propose a novel FPN that can obtain detailed information and scale sequence features to enhance the feature information of small texts. Finally, we design a dynamic text detection head that combines a text detection head with three attention mechanisms. We perceive from three dimensions: scale, space, and channel, enhance multi-scale text region features, focus on foreground targets, and accurately locate text regions, finally achieving the effect of reducing false and missed detections. In conclusion, the method proposed in this paper achieves good performance in text detection tasks in natural scenes and solves some problems in existing methods. Experimental results show that our proposed model achieves a comprehensive surpass compared with the text detection baseline.","PeriodicalId":276778,"journal":{"name":"2023 4th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT)","volume":"49 5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 4th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AINIT59027.2023.10212684","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Text detection in natural scene images is a challenging task that requires localizing and fitting text regions. Existing natural-scene methods extract text instance features with fixed-size convolutional kernels and have achieved good results. However, because text regions in natural scenes can have extremely large aspect ratios, extracting features with fixed-size kernels introduces background noise, which degrades detection accuracy. In addition, complex backgrounds may cause non-text regions to be incorrectly detected as text, while small or ambiguous text may be missed. To address these challenges, we first use a new backbone with multi-branch depth-wise band (strip) convolutions to better capture text features under large aspect ratios and multi-scale backgrounds. We then propose a novel FPN that obtains detailed information and scale sequence features to enhance the feature information of small text. Finally, we design a dynamic text detection head that combines a text detection head with three attention mechanisms, perceiving text along three dimensions (scale, space, and channel) to enhance multi-scale text-region features, focus on foreground targets, and accurately locate text regions, thereby reducing false and missed detections. In summary, the proposed method performs well on natural-scene text detection tasks and addresses several problems of existing methods. Experimental results show that the proposed model comprehensively surpasses the text detection baseline.
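No code accompanies this abstract, so the sketch below is only a minimal, hypothetical PyTorch illustration of what a multi-branch depth-wise band (strip) convolution block could look like. The module name `MultiBranchStripConv`, the kernel sizes, and the branch layout are assumptions, not the authors' implementation; the point it illustrates is that paired 1xk and kx1 depth-wise kernels cover long, thin text regions while pulling in less background than a single large square kernel.

```python
import torch
import torch.nn as nn


class MultiBranchStripConv(nn.Module):
    """Hypothetical multi-branch depth-wise strip ("band") convolution block.

    Each branch pairs a 1xK and a Kx1 depth-wise convolution so the receptive
    field follows horizontal/vertical text lines. Kernel sizes and the number
    of branches are illustrative assumptions, not the paper's configuration.
    """

    def __init__(self, channels: int, kernel_sizes=(7, 11, 21)):
        super().__init__()
        self.branches = nn.ModuleList()
        for k in kernel_sizes:
            pad = k // 2
            self.branches.append(nn.Sequential(
                # 1 x k horizontal strip, depth-wise (groups=channels)
                nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels),
                # k x 1 vertical strip, depth-wise
                nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels),
            ))
        # 1x1 point-wise convolution to fuse the summed branch outputs
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = x  # identity path keeps local detail
        for branch in self.branches:
            out = out + branch(x)
        return self.fuse(out) * x  # attention-style reweighting of the input


if __name__ == "__main__":
    feat = torch.randn(1, 64, 40, 40)
    print(MultiBranchStripConv(64)(feat).shape)  # torch.Size([1, 64, 40, 40])
```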
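The abstract also describes a dynamic detection head that reweights features along scale, spatial, and channel dimensions. The snippet below is likewise a guess at how such a three-way attention head could be composed (a DyHead-style decomposition); `TriAttentionHead` and every layer choice in it are illustrative assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn


class TriAttentionHead(nn.Module):
    """Hypothetical head block applying scale, spatial, and channel attention
    in sequence over a feature pyramid. All layer choices are assumptions."""

    def __init__(self, channels: int):
        super().__init__()
        # Scale attention: one weight per pyramid level from a pooled descriptor.
        self.scale_fc = nn.Linear(channels, 1)
        # Spatial attention: single-channel mask highlighting foreground text.
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        # Channel attention: squeeze-and-excitation style gating.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels), nn.Sigmoid())

    def forward(self, feats):
        # feats: list of (N, C, H_i, W_i) pyramid features with equal channels.
        pooled = [f.mean(dim=(2, 3)) for f in feats]                      # (N, C) per level
        scale_w = torch.softmax(
            torch.stack([self.scale_fc(p) for p in pooled], dim=1), dim=1)  # (N, L, 1)
        out = []
        for i, f in enumerate(feats):
            # 1) scale attention: weight each level against the others.
            f = f * scale_w[:, i].unsqueeze(-1).unsqueeze(-1)
            # 2) spatial attention: focus on text-like foreground locations.
            f = f * torch.sigmoid(self.spatial_conv(f))
            # 3) channel attention: emphasise channels tied to text patterns.
            c = self.channel_fc(f.mean(dim=(2, 3))).unsqueeze(-1).unsqueeze(-1)
            out.append(f * c)
        return out


if __name__ == "__main__":
    head = TriAttentionHead(64)
    pyramid = [torch.randn(2, 64, s, s) for s in (80, 40, 20)]
    print([t.shape for t in head(pyramid)])
```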