Ali Afkari-Fahandari, Elham Shabaninia, Fatemeh Asadi-Zeydabadi, Hossein Nezamabadi-Pour
{"title":"文本识别中变形的综合研究:技术、挑战和未来方向","authors":"Ali Afkari-Fahandari, Elham Shabaninia, Fatemeh Asadi-Zeydabadi, Hossein Nezamabadi-Pour","doi":"10.1145/3771273","DOIUrl":null,"url":null,"abstract":"Optical character recognition is a rapidly evolving field within pattern recognition, enabling the automatic conversion of printed or handwritten text images into machine-readable formats. This technology plays a critical role across various sectors, including banking, healthcare, government, and education. While Optical character recognition systems encompass multiple stages such as text detection, segmentation, and post-processing, this paper focuses on text recognition as a core and technically challenging component. In particular, we provide an in-depth review of recent advances driven by Transformer-based models, which have significantly pushed the state-of-the-art. To contextualize these advancements, a detailed comparative analysis of Transformer-based techniques is presented against earlier deep learning approaches, highlighting their respective limitations and the improvements introduced by Transformers, including parallel sequence processing, global context modeling, better handling of long-range dependencies, and enhanced robustness to irregular or noisy text layouts. We also examine widely used benchmark datasets in the literature and provide a detailed discussion of the performance achieved by recent state-of-the-art methods. Finally, this survey outlines open research challenges and potential future directions. It aims to serve as a comprehensive reference for both novice and experienced researchers by summarizing the latest developments in text recognition, including architectures, datasets, evaluation metrics, and practical considerations in model performance trade-offs and deployment.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"11 1","pages":""},"PeriodicalIF":28.0000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Comprehensive Survey of Transformers in Text Recognition: Techniques, Challenges, and Future Directions\",\"authors\":\"Ali Afkari-Fahandari, Elham Shabaninia, Fatemeh Asadi-Zeydabadi, Hossein Nezamabadi-Pour\",\"doi\":\"10.1145/3771273\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Optical character recognition is a rapidly evolving field within pattern recognition, enabling the automatic conversion of printed or handwritten text images into machine-readable formats. This technology plays a critical role across various sectors, including banking, healthcare, government, and education. While Optical character recognition systems encompass multiple stages such as text detection, segmentation, and post-processing, this paper focuses on text recognition as a core and technically challenging component. In particular, we provide an in-depth review of recent advances driven by Transformer-based models, which have significantly pushed the state-of-the-art. To contextualize these advancements, a detailed comparative analysis of Transformer-based techniques is presented against earlier deep learning approaches, highlighting their respective limitations and the improvements introduced by Transformers, including parallel sequence processing, global context modeling, better handling of long-range dependencies, and enhanced robustness to irregular or noisy text layouts. 
We also examine widely used benchmark datasets in the literature and provide a detailed discussion of the performance achieved by recent state-of-the-art methods. Finally, this survey outlines open research challenges and potential future directions. It aims to serve as a comprehensive reference for both novice and experienced researchers by summarizing the latest developments in text recognition, including architectures, datasets, evaluation metrics, and practical considerations in model performance trade-offs and deployment.\",\"PeriodicalId\":50926,\"journal\":{\"name\":\"ACM Computing Surveys\",\"volume\":\"11 1\",\"pages\":\"\"},\"PeriodicalIF\":28.0000,\"publicationDate\":\"2025-10-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Computing Surveys\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3771273\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Computing Surveys","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3771273","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
A Comprehensive Survey of Transformers in Text Recognition: Techniques, Challenges, and Future Directions
Optical character recognition (OCR) is a rapidly evolving field within pattern recognition, enabling the automatic conversion of printed or handwritten text images into machine-readable formats. This technology plays a critical role across sectors including banking, healthcare, government, and education. While OCR systems encompass multiple stages such as text detection, segmentation, and post-processing, this paper focuses on text recognition as a core and technically challenging component. In particular, we provide an in-depth review of recent advances driven by Transformer-based models, which have significantly pushed the state of the art. To contextualize these advances, we present a detailed comparative analysis of Transformer-based techniques against earlier deep learning approaches, highlighting the limitations of the latter and the improvements introduced by Transformers, including parallel sequence processing, global context modeling, better handling of long-range dependencies, and enhanced robustness to irregular or noisy text layouts. We also examine benchmark datasets widely used in the literature and discuss in detail the performance achieved by recent state-of-the-art methods. Finally, the survey outlines open research challenges and potential future directions. It aims to serve as a comprehensive reference for both novice and experienced researchers by summarizing the latest developments in text recognition, including architectures, datasets, evaluation metrics, and practical considerations in model performance trade-offs and deployment.
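To make the architectural points in the abstract concrete, below is a minimal, illustrative PyTorch sketch of a Transformer-based text recognizer: a patch-embedding encoder over a text-line image feeding an autoregressive character decoder. It is not the architecture proposed in the surveyed paper; all layer sizes, the patch size, the image resolution, and the character-vocabulary size are assumptions chosen for illustration.

```python
# Minimal illustrative sketch (assumed sizes and charset; not the surveyed
# paper's model) of a Transformer-based text recognizer: patch-embedding
# encoder over a text-line image plus an autoregressive character decoder.
import torch
import torch.nn as nn

class TransformerTextRecognizer(nn.Module):
    def __init__(self, num_chars=100, d_model=256, nhead=8, num_layers=4,
                 img_height=32, img_width=128, patch=8):
        super().__init__()
        self.patch = patch
        num_patches = (img_height // patch) * (img_width // patch)
        # Each grayscale patch is flattened and projected to a d_model vector.
        self.patch_embed = nn.Linear(patch * patch, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))
        self.char_embed = nn.Embedding(num_chars, d_model)
        # Self-attention gives every patch a view of every other patch (global
        # context modeling), and the whole target sequence is processed in
        # parallel during training (no recurrence).
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, num_chars)

    def forward(self, images, tgt_tokens):
        # images: (B, 1, H, W) grayscale text-line crops
        # tgt_tokens: (B, T) previous character ids (teacher forcing)
        B, _, H, W = images.shape
        p = self.patch
        patches = images.unfold(2, p, p).unfold(3, p, p)        # (B, 1, H/p, W/p, p, p)
        patches = patches.contiguous().view(B, -1, p * p)       # (B, N, p*p)
        memory_in = self.patch_embed(patches) + self.pos_embed  # (B, N, d_model)
        tgt = self.char_embed(tgt_tokens)                       # (B, T, d_model)
        # Causal mask so each output position only attends to earlier characters.
        causal = self.transformer.generate_square_subsequent_mask(tgt_tokens.size(1))
        decoded = self.transformer(memory_in, tgt, tgt_mask=causal)
        return self.out(decoded)                                # (B, T, num_chars) logits

# Dummy forward pass
model = TransformerTextRecognizer()
images = torch.randn(2, 1, 32, 128)
tokens = torch.randint(0, 100, (2, 20))
print(model(images, tokens).shape)  # torch.Size([2, 20, 100])
```

At inference time, decoding would start from a start token and feed the predicted character back in at each step; the sketch only aims to show how image patches and the character sequence both flow through self-attention, which is the source of the parallelism and long-range context the abstract contrasts with earlier recurrent approaches.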
About the journal:
ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.