ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting

Zepeng Huang, Qi Wan, Junliang Chen, Xiaodong Zhao, Kai Ye, Linlin Shen
{"title":"ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting","authors":"Zepeng Huang, Qi Wan, Junliang Chen, Xiaodong Zhao, Kai Ye, Linlin Shen","doi":"10.1109/ICME55011.2023.00243","DOIUrl":null,"url":null,"abstract":"Scene text spotting has attracted great attention in recent years. Compared with two-stage approaches that locate scene texts in the first stage and recognize them in the second stage, the advantages of joint location and recognition training are not fully explored. In this paper, we present an ADaptive RoI-Align based transformer for end-to-end Text Spotting (ADATS), which simultaneously locates and recognizes text with a single forward pass. By employing an Adaptive RoI-Align, the text features are extracted from the feature extraction network with the original aspect ratio, such that less information is lost during the alignment of arbitrarily-shaped scene text. Attention-based segmentation and recognition heads allow us to simultaneously optimize detection and recognition. Experiments on ICDAR 2015, MSRA-TD500, Total-Text, and CTW1500 demonstrate the effectiveness of our method.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME55011.2023.00243","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Scene text spotting has attracted great attention in recent years. Compared with two-stage approaches that locate scene texts in the first stage and recognize them in the second stage, the advantages of joint location and recognition training are not fully explored. In this paper, we present an ADaptive RoI-Align based transformer for end-to-end Text Spotting (ADATS), which simultaneously locates and recognizes text with a single forward pass. By employing an Adaptive RoI-Align, the text features are extracted from the feature extraction network with the original aspect ratio, such that less information is lost during the alignment of arbitrarily-shaped scene text. Attention-based segmentation and recognition heads allow us to simultaneously optimize detection and recognition. Experiments on ICDAR 2015, MSRA-TD500, Total-Text, and CTW1500 demonstrate the effectiveness of our method.
ADATS:基于自适应RoI-Align的端到端文本定位转换器
场景文本识别是近年来备受关注的问题。与第一阶段定位场景文本,第二阶段识别场景文本的两阶段方法相比,联合定位和识别训练的优势没有得到充分挖掘。在本文中,我们提出了一种基于自适应RoI-Align的端到端文本定位(ADATS)变压器,它可以通过单个前向传递同时定位和识别文本。采用自适应RoI-Align方法,以原始宽高比从特征提取网络中提取文本特征,从而减少了对任意形状的场景文本进行对齐时丢失的信息。基于注意力的分割和识别头允许我们同时优化检测和识别。在ICDAR 2015、MSRA-TD500、Total-Text和CTW1500上的实验证明了该方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信