EGO-LM: An efficient, generic, and out-of-the-box language model for handwritten text recognition
Authors: Hongliang Li, Dezhi Peng, Lianwen Jin
Journal: Pattern Recognition, vol. 159, Article 111130
Publication date: 2024-11-04
DOI: 10.1016/j.patcog.2024.111130
URL: https://www.sciencedirect.com/science/article/pii/S0031320324008811
Impact factor: 7.5; JCR: Q1 (Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
The language model (LM) plays a crucial role in post-processing handwritten text recognition (HTR) by capturing linguistic patterns. However, traditional rule-based LMs are inefficient, and recent end-to-end LMs require customized training for each HTR model. To address these limitations, we propose an Efficient, Generic, and Out-of-the-box Language Model (EGO-LM) for HTR. To unlock the out-of-the-box capability of the end-to-end LM, we introduce a vision-limited proxy task that focuses on visual-pattern-agnostic linguistic dependencies during training, enhancing the robustness and generality of the LM. The enhanced capabilities also enable EGO-LM to iteratively refine its output for a further accuracy boost without additional tuning. Moreover, we introduce a Diverse-Corpus Online Handwriting dataset (DCOH-120K) with more diverse corpus types and more samples than existing datasets, including 83,142 Chinese and 39,398 English text lines. Extensive experiments demonstrate that EGO-LM can attain state-of-the-art performance while achieving up to 613× acceleration. The DCOH-120K dataset is available at .
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.