{"title":"Batch-transformer for scene text image super-resolution","authors":"Yaqi Sun, Xiaolan Xie, Zhi Li, Kai Yang","doi":"10.1007/s00371-024-03598-7","DOIUrl":null,"url":null,"abstract":"<p>Recognizing low-resolution text images is challenging as they often lose their detailed information, leading to poor recognition accuracy. Moreover, the traditional methods, based on deep convolutional neural networks (CNNs), are not effective enough for some low-resolution text images with dense characters. In this paper, a novel CNN-based batch-transformer network for scene text image super-resolution (BT-STISR) method is proposed to address this problem. In order to obtain the text information for text reconstruction, a pre-trained text prior module is employed to extract text information. Then a novel two pipeline batch-transformer-based module is proposed, leveraging self-attention and global attention mechanisms to exert the guidance of text prior to the text reconstruction process. Experimental study on a benchmark dataset TextZoom shows that the proposed method BT-STISR achieves the best state-of-the-art performance in terms of structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) metrics compared to some latest methods.\n</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"7 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03598-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Recognizing low-resolution text images is challenging as they often lose their detailed information, leading to poor recognition accuracy. Moreover, the traditional methods, based on deep convolutional neural networks (CNNs), are not effective enough for some low-resolution text images with dense characters. In this paper, a novel CNN-based batch-transformer network for scene text image super-resolution (BT-STISR) method is proposed to address this problem. In order to obtain the text information for text reconstruction, a pre-trained text prior module is employed to extract text information. Then a novel two pipeline batch-transformer-based module is proposed, leveraging self-attention and global attention mechanisms to exert the guidance of text prior to the text reconstruction process. Experimental study on a benchmark dataset TextZoom shows that the proposed method BT-STISR achieves the best state-of-the-art performance in terms of structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) metrics compared to some latest methods.