LFEN: A language feature enhanced network for scene text recognition
Hui Chen, Runming Jiang, Fang Hu, Min Chen, Yin Zhang
Cognitive Robotics, vol. 5, pp. 276-285, 2025. DOI: 10.1016/j.cogr.2025.08.001
https://www.sciencedirect.com/science/article/pii/S2667241325000199
Citations: 0
Abstract
In natural scenes, traditional text recognition methods struggle with the substantial differences in character sets and context across languages. To address this challenge, we propose LFEN, a language feature enhanced network for text recognition and correction in natural scenes. By embedding language features directly into the text recognition model, LFEN improves scene text recognition accuracy and reduces the risk of error accumulation inherent in traditional pipelines that run language identification and text recognition in series. Through a detailed analysis of global and local language features, the method distinguishes more accurately between languages that share similar characters, further improving recognition accuracy. In addition, to exploit the intrinsic semantic relationships within text content, we employ a sequence-to-sequence (Seq2Seq) model based on convolutional neural networks for text correction. By integrating language information, multiple feature embeddings, and global residual connections, this model provides a robust solution for text correction in scene text recognition. Experimental results show that LFEN outperforms the baselines on most evaluation metrics; in particular, it improves recall by about 2% over BERT. This research supports the advancement of natural scene text recognition and correction.
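The abstract names the ingredients of the correction model (language information, several feature embeddings, global residual connections) but not the exact architecture. A minimal pure-Python sketch of how those pieces could fit together in a convolutional Seq2Seq encoder appears below; it rests on our own assumptions and is not the LFEN implementation. Token, position, and language embeddings are summed at every position. The result passes through 1-D convolutions with gated linear units, a choice borrowed from ConvS2S-style models, and the input embeddings are added back to the final output as a global residual. The function names, dimensions, the single per-sequence language ID, and the random initialisation are all hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # one row per sequence position (or per table entry)


def rand_table(rows: int, dim: int, rng: random.Random) -> Matrix:
    """Small random embedding/weight table (hypothetical initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed(tokens: List[int], lang_id: int, tok_table: Matrix,
          pos_table: Matrix, lang_table: Matrix) -> Matrix:
    """Sum token, position and (assumed per-sequence) language embeddings."""
    lang = lang_table[lang_id]
    return [[t + p + l for t, p, l in zip(tok_table[tok], pos_table[i], lang)]
            for i, tok in enumerate(tokens)]


def conv1d_same(x: Matrix, weights: List[Matrix], bias: Vector) -> Matrix:
    """1-D convolution over positions with zero padding (odd kernel size).

    weights[j][o] is the weight vector for kernel tap j and output channel o.
    """
    half = len(weights) // 2
    out = []
    for i in range(len(x)):
        acc = list(bias)
        for j, tap in enumerate(weights):
            src = i + j - half
            if 0 <= src < len(x):  # positions outside the sequence are zero
                for o, w in enumerate(tap):
                    acc[o] += sum(wi * xi for wi, xi in zip(w, x[src]))
        out.append(acc)
    return out


def glu(h: Matrix) -> Matrix:
    """Gated linear unit: first half of channels times sigmoid(second half)."""
    d = len(h[0]) // 2
    return [[a / (1.0 + math.exp(-b)) for a, b in zip(row[:d], row[d:])]
            for row in h]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode(tokens: List[int], lang_id: int,
           tables: Tuple[Matrix, Matrix, Matrix],
           layers: List[Tuple[List[Matrix], Vector]]) -> Matrix:
    """Convolutional encoder with per-layer and global residual connections."""
    x0 = embed(tokens, lang_id, *tables)
    x = x0
    for weights, bias in layers:
        x = add(glu(conv1d_same(x, weights, bias)), x)  # per-layer residual
    return add(x, x0)  # global residual: re-inject the input embeddings


if __name__ == "__main__":
    rng = random.Random(0)
    dim, vocab, max_len, n_langs, kernel = 4, 10, 8, 3, 3
    tables = (rand_table(vocab, dim, rng), rand_table(max_len, dim, rng),
              rand_table(n_langs, dim, rng))
    # Each conv layer maps dim -> 2*dim channels so the GLU returns dim channels.
    layers = [([rand_table(2 * dim, dim, rng) for _ in range(kernel)],
               [0.0] * (2 * dim)) for _ in range(2)]
    h_a = encode([1, 2, 3], lang_id=0, tables=tables, layers=layers)
    h_b = encode([1, 2, 3], lang_id=1, tables=tables, layers=layers)
    print(len(h_a), len(h_a[0]), h_a != h_b)
```

Because the language embedding enters at every position, changing the language ID changes every encoded vector for the same character sequence. This is one way a single model can condition on language without committing to the output of a separate, upstream language-identification step.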