Zepeng Huang, Qi Wan, Junliang Chen, Xiaodong Zhao, Kai Ye, Linlin Shen
ADATS: Adaptive RoI-Align based Transformer for End-to-End Text Spotting
2023 IEEE International Conference on Multimedia and Expo (ICME), published 2023-07-01
DOI: 10.1109/ICME55011.2023.00243 (https://doi.org/10.1109/ICME55011.2023.00243)
Abstract
Scene text spotting has attracted great attention in recent years. Most existing approaches are two-stage: they locate scene text in the first stage and recognize it in the second, so the advantages of training localization and recognition jointly have not been fully explored. In this paper, we present an ADaptive RoI-Align based transformer for end-to-end Text Spotting (ADATS), which locates and recognizes text simultaneously in a single forward pass. By employing an Adaptive RoI-Align, text features are extracted from the feature extraction network at their original aspect ratio, so that less information is lost when aligning arbitrarily shaped scene text. Attention-based segmentation and recognition heads allow detection and recognition to be optimized simultaneously. Experiments on ICDAR 2015, MSRA-TD500, Total-Text, and CTW1500 demonstrate the effectiveness of our method.
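To make the aspect-ratio idea concrete, here is a minimal sketch of RoI pooling that keeps a text box's width-to-height ratio instead of squeezing every box into one fixed grid. It is not the authors' implementation: the abstract does not specify how Adaptive RoI-Align chooses its output size, so the sizing rule, the function names (adaptive_output_size, bilinear, roi_align_keep_ratio), the parameters out_h and max_w, and the single sample per bin are all assumptions made for illustration. It also handles only axis-aligned boxes on a single-channel feature map.

```python
import math


def adaptive_output_size(box_w, box_h, out_h=8, max_w=64):
    """Choose an output grid whose aspect ratio follows the box.

    Assumed rule (not from the paper): fix the output height and scale
    the output width by the box's width/height ratio, clamped to [1, max_w].
    """
    if box_w <= 0 or box_h <= 0:
        raise ValueError("box must have positive width and height")
    out_w = int(round(out_h * box_w / box_h))
    return out_h, max(1, min(max_w, out_w))


def bilinear(feature, x, y):
    """Bilinearly interpolate a 2D feature map (list of rows) at (x, y)."""
    h, w = len(feature), len(feature[0])
    # Clamp the sample point to the valid index range of the map.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_keep_ratio(feature, box, out_h=8, max_w=64):
    """Pool an axis-aligned box (x_min, y_min, x_max, y_max) into a grid
    whose shape follows the box's aspect ratio, one sample per bin center."""
    x_min, y_min, x_max, y_max = box
    out_h, out_w = adaptive_output_size(x_max - x_min, y_max - y_min, out_h, max_w)
    bin_w = (x_max - x_min) / out_w
    bin_h = (y_max - y_min) / out_h
    return [
        [
            bilinear(feature, x_min + (j + 0.5) * bin_w, y_min + (i + 0.5) * bin_h)
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy feature map whose value equals its column index.
feature = [[float(x) for x in range(40)] for _ in range(10)]
# A wide box (32 x 8) pooled with out_h=4 yields a wide grid, not a square one.
pooled = roi_align_keep_ratio(feature, (0, 0, 32, 8), out_h=4)
print(len(pooled), len(pooled[0]))
```

In this sketch a long text line produces a proportionally wide grid, whereas a fixed square grid would compress it horizontally. The max_w cap is one possible way to bound the cost of very long boxes; the paper may handle this differently.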