ESRNet: an exploring sample relationships network for arbitrary-shaped scene text detection

IF 3.4 · Region 2 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Applied Intelligence · Pub Date: 2024-09-09 · DOI: 10.1007/s10489-024-05773-8

Huageng Fan, Tongwei Lu

Applied Intelligence, volume 54, issue 22, pages 11995-12008. https://link.springer.com/article/10.1007/s10489-024-05773-8

Citations: 0

Abstract

Transformer-based scene text detection methods have recently been increasingly investigated. However, these methods usually use attention to model visual content relationships within a single sample and ignore the relationships between samples. Exploring sample relationships enables feature propagation between samples, which helps the detector handle scene text images with more complex features. To address this limitation, this paper proposes the exploring sample relationships network (ESRNet) for detecting arbitrary-shaped text. Specifically, we construct the exploring sample relationships module (ESRM) to model sample relationships in the encoder, capturing interactions between all samples in each batch and propagating features across samples. Because batch sizes differ between training and testing, the sample relationships explored in the two phases also differ, so a two-stream encoder method is used to solve this problem. Moreover, we propose location-aware factorized self-attention (LAFSA), which incorporates the sequential information of text polygon control points into the modeling and effectively improves the accuracy of the label reading order in terms of visual features. Experimental results on multiple datasets demonstrate that ESRNet achieves superior performance compared to other methods. Notably, ESRNet achieves F-measures of 88.9%, 88.4%, and 77.4% on the Total-Text, CTW1500, and ArT datasets, respectively.
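
The idea of modeling relationships between samples can be illustrated with a small, generic sketch. The code below is not the paper's ESRM. It is a minimal illustration that assumes plain scaled dot-product attention applied across the batch axis, with no learned projections. The function names `softmax` and `cross_sample_attention` and the nested-list layout are hypothetical choices made for this example only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_sample_attention(batch):
    """Attend across samples instead of within a sample.

    batch: list of B samples, each a list of N tokens, each a list of D floats.
    For every token position, each sample's feature is replaced by a weighted
    mix of the features at the same position in all B samples.
    Assumption: queries, keys and values are the raw features themselves
    (no learned weights), which is a simplification for illustration.
    """
    num_samples = len(batch)
    num_tokens = len(batch[0])
    dim = len(batch[0][0])
    scale = 1.0 / math.sqrt(dim)

    out = [[None] * num_tokens for _ in range(num_samples)]
    for n in range(num_tokens):
        # Features at position n, one per sample in the batch.
        column = [batch[b][n] for b in range(num_samples)]
        for b in range(num_samples):
            query = column[b]
            scores = [scale * sum(q * k for q, k in zip(query, key))
                      for key in column]
            weights = softmax(scores)
            out[b][n] = [sum(w * value[d] for w, value in zip(weights, column))
                         for d in range(dim)]
    return out


# With a batch of one sample there is nothing to attend to across samples,
# so the output equals the input.
single = [[[1.0, 2.0], [3.0, 4.0]]]
print(cross_sample_attention(single) == single)
```

In this sketch, a batch containing a single sample passes through unchanged, because the softmax over one score is always 1. This is one way to see why a cross-sample operation can behave differently when training and testing use different batch sizes, which is the inconsistency the abstract says the two-stream encoder is meant to address.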

Source journal

Applied Intelligence (Engineering and Technology - Computer Science: Artificial Intelligence)

CiteScore

6.60

Self-citation rate

20.80%

Articles published

1361

Review time

5.9 months

Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.