Khmer printed character recognition using attention-based Seq2Seq network

R. Buoy, Nguonly Taing, Sovisal Chenda, Sokchea Kor
Journal: Ho Chi Minh City Open University Journal of Science Engineering and Technology
Publication date: 2022-04-20
Publication type: Journal Article
DOI: 10.46223/hcmcoujs.tech.en.12.1.2217.2022
URL: https://doi.org/10.46223/hcmcoujs.tech.en.12.1.2217.2022
Citations: 1

Abstract

This paper presents an end-to-end deep convolutional recurrent neural network solution for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image via layers of convolutional blocks and a layer of gated recurrent units (GRU). The features are encoded in a single context vector and a sequence of hidden states, which are fed to the decoder; the decoder emits one character at a time until a special end-of-sentence (EOS) token is reached. The attention mechanism allows the decoder network to adaptively select relevant parts of the input image while predicting a target character. The Seq2Seq Khmer OCR network is trained on a large collection of computer-generated text-line images in multiple common Khmer fonts. Complex data augmentation is applied to both the training and validation datasets. On the validation set of 6400 augmented images, the proposed model outperforms the state-of-the-art Tesseract OCR engine for the Khmer language, achieving a character error rate (CER) of 0.7% vs 35.9%.
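The reported metric, character error rate, can be illustrated with a minimal sketch. It assumes the common definition of CER: total Levenshtein edit distance between predicted and reference text, divided by the total number of reference characters. The abstract does not state the exact definition the authors used (for example, whether characters are Unicode code points or grapheme clusters, or whether the rate is averaged per line), so the function names `edit_distance` and `character_error_rate` and the code-point counting below are assumptions for illustration only.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

Under this definition, one wrong character in a four-character reference line gives a CER of 0.25, so a CER of 0.7% corresponds to roughly seven character-level edits per thousand reference characters.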
Source journal statistics: self-citation rate 0.00%; articles published: 6; review time: 8 weeks