LFEN: A language feature enhanced network for scene text recognition

Cognitive Robotics | Pub Date: 2025-01-01 | DOI: 10.1016/j.cogr.2025.08.001

Hui Chen, Runming Jiang, Fang Hu, Min Chen, Yin Zhang

Volume 5, Pages 276-285 | Journal Article | https://www.sciencedirect.com/science/article/pii/S2667241325000199


Abstract

Traditional text recognition methods struggle in natural scenes because characters and context differ substantially across languages. To address this challenge, we propose LFEN, an approach for text recognition and correction in natural scenes. By embedding language features directly into the text recognition model, LFEN improves recognition accuracy and reduces the risk of error accumulation inherent in traditional pipelines that run language recognition and text recognition in series. A detailed analysis of global and local language features allows more accurate differentiation between languages with similar characters, which further improves recognition accuracy. For text correction, LFEN exploits the intrinsic semantic relationships of the text content through a sequence-to-sequence (Seq2Seq) model based on convolutional neural networks. The correction model integrates language information, different feature embeddings, and global residual connections. Experimental results show that LFEN outperforms the baselines on most evaluation metrics; specifically, it improves recall by around 2% compared to BERT. This research supports the advancement of natural scene text recognition and correction.
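The abstract names three ingredients of the correction model (language information, several feature embeddings, and a global residual connection around a convolutional encoder) without giving architectural details. The following is a minimal pure-Python sketch of how such ingredients could fit together, not the authors' implementation. All names, dimensions, the choice of token and position embeddings, the fixed smoothing kernels, and the ReLU activation are assumptions made for illustration; nothing is trained.

```python
import random

def make_table(rows, dim, rng):
    """Random embedding table: `rows` vectors of length `dim` (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

def embed(token_ids, lang_id, tok_table, pos_table, lang_table):
    """Sum token, position, and language embeddings at each position.

    Assumption: the language feature is a single vector per language,
    added to every position of the sequence.
    """
    lang_vec = lang_table[lang_id]
    return [
        [t + p + l for t, p, l in zip(tok_table[tok], pos_table[i], lang_vec)]
        for i, tok in enumerate(token_ids)
    ]

def conv1d_same(x, kernel):
    """Per-channel 1D convolution along the sequence axis, zero padded
    so the output has the same length as the input."""
    pad = len(kernel) // 2
    dim = len(x[0])
    out = []
    for i in range(len(x)):
        row = [0.0] * dim
        for j, w in enumerate(kernel):
            src = i + j - pad
            if 0 <= src < len(x):
                for d in range(dim):
                    row[d] += w * x[src][d]
        out.append(row)
    return out

def encode(x, kernels):
    """Apply stacked conv + ReLU layers, then add the input embeddings
    back to the final output (the global residual connection)."""
    h = x
    for kernel in kernels:
        h = [[max(0.0, v) for v in row] for row in conv1d_same(h, kernel)]
    return [[a + b for a, b in zip(h_row, x_row)] for h_row, x_row in zip(h, x)]

# Hypothetical sizes, chosen only for the demonstration.
rng = random.Random(0)
DIM, VOCAB, MAX_LEN, N_LANGS = 8, 50, 16, 3
tok_table = make_table(VOCAB, DIM, rng)
pos_table = make_table(MAX_LEN, DIM, rng)
lang_table = make_table(N_LANGS, DIM, rng)

token_ids = [4, 17, 9, 23]
x = embed(token_ids, 1, tok_table, pos_table, lang_table)
y = encode(x, kernels=[[0.25, 0.5, 0.25], [0.25, 0.5, 0.25]])
print(len(y), len(y[0]))
```

Because the residual path adds the embeddings back at the end, the convolutional stack only has to model a change relative to the input. In this sketch, the same token sequence also yields different encoder inputs under different `lang_id` values, which is one simple way language information could help separate languages with similar characters.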

