Gege Zhang , Zhiyong Gan , Ling Deng , Shuaicheng Niu , Zhenghua Peng , Gang Dai , Shuangping Huang , Xiangmin Xu
{"title":"基于文本到多边形生成器的文本识别弱监督学习框架","authors":"Gege Zhang , Zhiyong Gan , Ling Deng , Shuaicheng Niu , Zhenghua Peng , Gang Dai , Shuangping Huang , Xiangmin Xu","doi":"10.1016/j.patcog.2025.112408","DOIUrl":null,"url":null,"abstract":"<div><div>Advanced text spotting methods typically rely on large-scale, meticulously labeled datasets to achieve satisfactory performance. However, annotating fine-grained positional information of texts in real-world scene images is extremely costly and time-consuming. Although some weakly supervised methods have been developed to reduce annotation costs, they face two major challenges: 1) their performance significantly lags behind the fully supervised counterparts, and 2) They are tightly coupled with specific text spotting models, meaning that switching to a different model would require retraining and incur substantial computational costs. To address these limitations, we propose a novel text-only weakly supervised learning framework for text spotting via text-to-polygon generator. In the first stage, we pretrain a text-to-polygon generator on an auxiliary dataset, e.g., synthetic or public datasets, where full annotations are readily accessible. In the second stage, given real-world target datasets annotated with text-only labels, we employ the pretrained generator to produce pseudo polygon labels, thereby constructing a pseudo-labeled supervised dataset for training text spotting models. To ensure high-quality pseudo polygon labels, the text-to-polygon generator first identifies all candidate text regions, then filters those that are relevant to the target text, and finally predicts their precise spatial locations. Notably, this generator requires only a single pretraining session and can subsequently be applied to any text spotting model and target text-only dataset without incurring additional costs. Extensive experiments on public benchmarks demonstrate that our method can significantly reduce labeling costs while maintaining competitive performance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112408"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A text-only weakly supervised learning framework for text spotting via text-to-polygon generator\",\"authors\":\"Gege Zhang , Zhiyong Gan , Ling Deng , Shuaicheng Niu , Zhenghua Peng , Gang Dai , Shuangping Huang , Xiangmin Xu\",\"doi\":\"10.1016/j.patcog.2025.112408\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Advanced text spotting methods typically rely on large-scale, meticulously labeled datasets to achieve satisfactory performance. However, annotating fine-grained positional information of texts in real-world scene images is extremely costly and time-consuming. Although some weakly supervised methods have been developed to reduce annotation costs, they face two major challenges: 1) their performance significantly lags behind the fully supervised counterparts, and 2) They are tightly coupled with specific text spotting models, meaning that switching to a different model would require retraining and incur substantial computational costs. To address these limitations, we propose a novel text-only weakly supervised learning framework for text spotting via text-to-polygon generator. In the first stage, we pretrain a text-to-polygon generator on an auxiliary dataset, e.g., synthetic or public datasets, where full annotations are readily accessible. In the second stage, given real-world target datasets annotated with text-only labels, we employ the pretrained generator to produce pseudo polygon labels, thereby constructing a pseudo-labeled supervised dataset for training text spotting models. To ensure high-quality pseudo polygon labels, the text-to-polygon generator first identifies all candidate text regions, then filters those that are relevant to the target text, and finally predicts their precise spatial locations. Notably, this generator requires only a single pretraining session and can subsequently be applied to any text spotting model and target text-only dataset without incurring additional costs. Extensive experiments on public benchmarks demonstrate that our method can significantly reduce labeling costs while maintaining competitive performance.</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"172 \",\"pages\":\"Article 112408\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-09-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320325010696\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325010696","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A text-only weakly supervised learning framework for text spotting via text-to-polygon generator
Advanced text spotting methods typically rely on large-scale, meticulously labeled datasets to achieve satisfactory performance. However, annotating fine-grained positional information of texts in real-world scene images is extremely costly and time-consuming. Although some weakly supervised methods have been developed to reduce annotation costs, they face two major challenges: 1) their performance significantly lags behind the fully supervised counterparts, and 2) They are tightly coupled with specific text spotting models, meaning that switching to a different model would require retraining and incur substantial computational costs. To address these limitations, we propose a novel text-only weakly supervised learning framework for text spotting via text-to-polygon generator. In the first stage, we pretrain a text-to-polygon generator on an auxiliary dataset, e.g., synthetic or public datasets, where full annotations are readily accessible. In the second stage, given real-world target datasets annotated with text-only labels, we employ the pretrained generator to produce pseudo polygon labels, thereby constructing a pseudo-labeled supervised dataset for training text spotting models. To ensure high-quality pseudo polygon labels, the text-to-polygon generator first identifies all candidate text regions, then filters those that are relevant to the target text, and finally predicts their precise spatial locations. Notably, this generator requires only a single pretraining session and can subsequently be applied to any text spotting model and target text-only dataset without incurring additional costs. Extensive experiments on public benchmarks demonstrate that our method can significantly reduce labeling costs while maintaining competitive performance.
期刊介绍:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.