Ali Afkari-Fahandari, Elham Shabaninia, Fatemeh Asadi-Zeydabadi, Hossein Nezamabadi-Pour
{"title":"文本识别中变形的综合研究:技术、挑战和未来方向","authors":"Ali Afkari-Fahandari, Elham Shabaninia, Fatemeh Asadi-Zeydabadi, Hossein Nezamabadi-Pour","doi":"10.1145/3771273","DOIUrl":null,"url":null,"abstract":"Optical character recognition is a rapidly evolving field within pattern recognition, enabling the automatic conversion of printed or handwritten text images into machine-readable formats. This technology plays a critical role across various sectors, including banking, healthcare, government, and education. While Optical character recognition systems encompass multiple stages such as text detection, segmentation, and post-processing, this paper focuses on text recognition as a core and technically challenging component. In particular, we provide an in-depth review of recent advances driven by Transformer-based models, which have significantly pushed the state-of-the-art. To contextualize these advancements, a detailed comparative analysis of Transformer-based techniques is presented against earlier deep learning approaches, highlighting their respective limitations and the improvements introduced by Transformers, including parallel sequence processing, global context modeling, better handling of long-range dependencies, and enhanced robustness to irregular or noisy text layouts. We also examine widely used benchmark datasets in the literature and provide a detailed discussion of the performance achieved by recent state-of-the-art methods. Finally, this survey outlines open research challenges and potential future directions. It aims to serve as a comprehensive reference for both novice and experienced researchers by summarizing the latest developments in text recognition, including architectures, datasets, evaluation metrics, and practical considerations in model performance trade-offs and deployment.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"11 1","pages":""},"PeriodicalIF":28.0000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Comprehensive Survey of Transformers in Text Recognition: Techniques, Challenges, and Future Directions\",\"authors\":\"Ali Afkari-Fahandari, Elham Shabaninia, Fatemeh Asadi-Zeydabadi, Hossein Nezamabadi-Pour\",\"doi\":\"10.1145/3771273\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Optical character recognition is a rapidly evolving field within pattern recognition, enabling the automatic conversion of printed or handwritten text images into machine-readable formats. This technology plays a critical role across various sectors, including banking, healthcare, government, and education. While Optical character recognition systems encompass multiple stages such as text detection, segmentation, and post-processing, this paper focuses on text recognition as a core and technically challenging component. In particular, we provide an in-depth review of recent advances driven by Transformer-based models, which have significantly pushed the state-of-the-art. To contextualize these advancements, a detailed comparative analysis of Transformer-based techniques is presented against earlier deep learning approaches, highlighting their respective limitations and the improvements introduced by Transformers, including parallel sequence processing, global context modeling, better handling of long-range dependencies, and enhanced robustness to irregular or noisy text layouts. 
We also examine widely used benchmark datasets in the literature and provide a detailed discussion of the performance achieved by recent state-of-the-art methods. Finally, this survey outlines open research challenges and potential future directions. It aims to serve as a comprehensive reference for both novice and experienced researchers by summarizing the latest developments in text recognition, including architectures, datasets, evaluation metrics, and practical considerations in model performance trade-offs and deployment.\",\"PeriodicalId\":50926,\"journal\":{\"name\":\"ACM Computing Surveys\",\"volume\":\"11 1\",\"pages\":\"\"},\"PeriodicalIF\":28.0000,\"publicationDate\":\"2025-10-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Computing Surveys\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3771273\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Computing Surveys","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3771273","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
A Comprehensive Survey of Transformers in Text Recognition: Techniques, Challenges, and Future Directions
Optical character recognition (OCR) is a rapidly evolving field within pattern recognition, enabling the automatic conversion of printed or handwritten text images into machine-readable formats. This technology plays a critical role across sectors including banking, healthcare, government, and education. While OCR systems encompass multiple stages such as text detection, segmentation, and post-processing, this paper focuses on text recognition as a core and technically challenging component. In particular, we provide an in-depth review of recent advances driven by Transformer-based models, which have significantly pushed the state of the art. To contextualize these advances, we present a detailed comparative analysis of Transformer-based techniques against earlier deep learning approaches, highlighting the limitations of the latter and the improvements introduced by Transformers, including parallel sequence processing, global context modeling, better handling of long-range dependencies, and enhanced robustness to irregular or noisy text layouts. We also examine benchmark datasets widely used in the literature and discuss in detail the performance achieved by recent state-of-the-art methods. Finally, the survey outlines open research challenges and potential future directions. It aims to serve as a comprehensive reference for both novice and experienced researchers by summarizing the latest developments in text recognition, including architectures, datasets, evaluation metrics, and practical considerations in model performance trade-offs and deployment.
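To make the architectural points in the abstract concrete, below is a minimal, illustrative PyTorch sketch of a Transformer-based text recognizer: a patch-embedding encoder over a text-line image feeding an autoregressive character decoder. It is not the architecture proposed in the surveyed paper; all layer sizes, the patch size, the image resolution, and the character-vocabulary size are assumptions chosen for illustration.

```python
# Minimal illustrative sketch (assumed sizes and charset; not the surveyed
# paper's model) of a Transformer-based text recognizer: patch-embedding
# encoder over a text-line image plus an autoregressive character decoder.
import torch
import torch.nn as nn

class TransformerTextRecognizer(nn.Module):
    def __init__(self, num_chars=100, d_model=256, nhead=8, num_layers=4,
                 img_height=32, img_width=128, patch=8):
        super().__init__()
        self.patch = patch
        num_patches = (img_height // patch) * (img_width // patch)
        # Each grayscale patch is flattened and projected to a d_model vector.
        self.patch_embed = nn.Linear(patch * patch, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))
        self.char_embed = nn.Embedding(num_chars, d_model)
        # Self-attention gives every patch a view of every other patch (global
        # context modeling), and the whole target sequence is processed in
        # parallel during training (no recurrence).
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, num_chars)

    def forward(self, images, tgt_tokens):
        # images: (B, 1, H, W) grayscale text-line crops
        # tgt_tokens: (B, T) previous character ids (teacher forcing)
        B, _, H, W = images.shape
        p = self.patch
        patches = images.unfold(2, p, p).unfold(3, p, p)        # (B, 1, H/p, W/p, p, p)
        patches = patches.contiguous().view(B, -1, p * p)       # (B, N, p*p)
        memory_in = self.patch_embed(patches) + self.pos_embed  # (B, N, d_model)
        tgt = self.char_embed(tgt_tokens)                       # (B, T, d_model)
        # Causal mask so each output position only attends to earlier characters.
        causal = self.transformer.generate_square_subsequent_mask(tgt_tokens.size(1))
        decoded = self.transformer(memory_in, tgt, tgt_mask=causal)
        return self.out(decoded)                                # (B, T, num_chars) logits

# Dummy forward pass
model = TransformerTextRecognizer()
images = torch.randn(2, 1, 32, 128)
tokens = torch.randint(0, 100, (2, 20))
print(model(images, tokens).shape)  # torch.Size([2, 20, 100])
```

At inference time, decoding would start from a start token and feed the predicted character back in at each step; the sketch only aims to show how image patches and the character sequence both flow through self-attention, which is the source of the parallelism and long-range context the abstract contrasts with earlier recurrent approaches.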
About the journal:
ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.