R. Buoy, Nguonly Taing, Sovisal Chenda, Sokchea Kor
Ho Chi Minh City Open University Journal of Science Engineering and Technology
DOI: 10.46223/hcmcoujs.tech.en.12.1.2217.2022
Published: 2022-04-20 (Journal Article)
Khmer printed character recognition using attention-based Seq2Seq network
This paper presents an end-to-end deep convolutional recurrent neural network for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image via layers of convolutional blocks and a layer of gated recurrent units (GRUs). The features are encoded into a single context vector and a sequence of hidden states, which are fed to the decoder; the decoder emits one character at a time until a special end-of-sentence (EOS) token is produced. The attention mechanism allows the decoder network to adaptively select the relevant parts of the input image while predicting each target character. The Seq2Seq Khmer OCR network is trained on a large collection of computer-generated text-line images covering multiple common Khmer fonts, and complex data augmentation is applied to both the training and validation datasets. On a validation set of 6,400 augmented images, the proposed model outperforms the state-of-the-art Tesseract OCR engine for the Khmer language, achieving a character error rate (CER) of 0.7% versus 35.9%.
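The attention step the abstract describes (the decoder adaptively weighting encoder states to form a context vector) can be sketched as additive attention in NumPy. This is a minimal illustration, not the paper's implementation; the weight matrices `W_q`, `W_k` and vector `v` are assumed learned parameters, and the dimensions are placeholders.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(dec_state, enc_states, W_q, W_k, v):
    """Additive (Bahdanau-style) attention, as a sketch.

    dec_state:  (H,)   current decoder hidden state
    enc_states: (T, H) encoder hidden states, one per time step
    W_q, W_k:   (A, H) learned projections to attention space
    v:          (A,)   learned scoring vector
    """
    # Score each encoder time step against the decoder state.
    scores = np.tanh(enc_states @ W_k.T + dec_state @ W_q.T) @ v  # (T,)
    # Normalise scores into attention weights over time steps.
    weights = softmax(scores)                                     # (T,)
    # Context vector: attention-weighted sum of encoder states.
    context = weights @ enc_states                                # (H,)
    return context, weights
```

At each decoding step the context vector is concatenated with the previous character embedding (in typical Seq2Seq OCR setups) so the prediction can focus on the image region holding the current character.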
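The reported metric, character error rate, is conventionally the Levenshtein edit distance between the predicted and reference strings, normalised by the reference length. A minimal implementation (assuming that standard definition; the paper may normalise slightly differently):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance via single-row dynamic programming.
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                           # deletion
                        dp[j - 1] + 1,                       # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))   # substitution
            prev = cur
    return dp[n]

def cer(reference, hypothesis):
    # Character error rate: edits needed to turn the hypothesis
    # into the reference, per reference character.
    return edit_distance(reference, hypothesis) / len(reference)
```

A CER of 0.7% thus means roughly 7 character-level edits per 1,000 reference characters, against Tesseract's reported 359.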