Ancient Sanskrit Line-level OCR using OpenNMT Architecture

Ronak Shah, M. Gupta, Ajai Kumar
2021 Sixth International Conference on Image Information Processing (ICIIP), published 2021-11-26
DOI: 10.1109/ICIIP53038.2021.9702666
Citations: 2

Abstract

Several Optical Character Recognition (OCR) systems have been developed for Indian languages such as Hindi, Marathi, and Bangla, but very little OCR work has been done for Sanskrit written in the Devanagari script. Sanskrit is a highly complex language, and its long words and old, degraded documents add further challenges to Sanskrit OCR research. Because of these challenges, the word accuracy of available OCR systems on such documents is not very high. Most existing work addresses only Sanskrit character recognition; there has been only one attempt to recognize whole Sanskrit lines, covering 10 fonts. This paper studies different hyperparameters of the OpenNMT architecture for Sanskrit OCR of synthetically generated color line images. We present a neural encoder-decoder model with attention that converts line images into editable text. An attention-based approach can handle this problem better than other neural techniques that use CTC (Connectionist Temporal Classification)-based models. The main aim of this paper is to give a detailed analysis of data preparation, of various encoder-decoder hyperparameters in OpenNMT (the number of LSTM layers, LSTM direction, character embedding size, batch size, number of iterations, and hidden unit size), and of the accuracy of various combinations of them. The paper also identifies the most accurate model for Sanskrit OCR using OpenNMT: the proposed method achieves a text recognition accuracy of 99.44% on the test set. Our major contribution is to demonstrate text recognition of degraded line images in a variety of fonts using the OpenNMT architecture, which helps the research community choose hyperparameters of encoder-decoder architectures for Sanskrit OCR.
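The abstract describes a neural encoder-decoder with attention but does not say which attention scoring function was used; OpenNMT supports more than one. As a minimal, hypothetical sketch, the pure-Python code below shows one decoder step of dot-product (Luong-style) attention: the current decoder state is scored against each encoder state (assumed here to correspond to one horizontal position along the line image), the scores are normalized with softmax, and the context vector is the weighted sum of encoder states. The function names and toy vectors are illustrative only and are not taken from OpenNMT or the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """One decoder step of dot-product attention (hypothetical helper, not OpenNMT code).

    Returns (weights, context): one weight per encoder position, summing to 1,
    and the attention-weighted sum of the encoder states.
    """
    # Score each encoder position by its dot product with the current decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    # Context vector: weighted sum of encoder states, computed dimension by dimension.
    dim = len(encoder_states[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Made-up encoder states for three positions along a line image.
    encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    decoder_state = [2.0, 0.0]
    weights, context = dot_attention(decoder_state, encoder_states)
    print("weights:", [round(w, 3) for w in weights])
    print("context:", [round(c, 3) for c in context])
```

In this toy run, positions 0 and 2 receive equal and larger weights because their dot product with the decoder state is highest. In a trained model, such weights roughly indicate which region of the line image the decoder focuses on when emitting each character, which is the property the abstract contrasts with CTC-based models.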
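The abstract reports 99.44% text recognition accuracy on the test set but does not state how accuracy is computed. One common convention for line-level OCR, assumed here rather than taken from the paper, is accuracy = 1 - (total edit distance / total reference length), where edit distance is the Levenshtein distance between predicted and ground-truth text. The sketch below implements this at character and word level; the function names are hypothetical and the Sanskrit example line is made up. Note that Python strings index Unicode code points, so a Devanagari conjunct such as क्ष counts as three units (क, ्, ष); counting grapheme clusters instead would give different numbers.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b.

    Works on strings (code-point level) or on lists of tokens such as words.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (no cost if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Corpus-level accuracy: 1 - total edit distance / total reference code points."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return 1.0 - edits / sum(len(r) for r in references)


def word_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Same formula at word level, splitting lines on whitespace (an assumption)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    edits = sum(levenshtein(p.split(), r.split()) for p, r in zip(predictions, references))
    return 1.0 - edits / sum(len(r.split()) for r in references)


if __name__ == "__main__":
    refs = ["रामः वनं गच्छति"]
    preds = ["रामः वनं गच्छत"]  # final vowel sign ि dropped
    print(f"character accuracy: {character_accuracy(preds, refs):.4f}")
    print(f"word accuracy: {word_accuracy(preds, refs):.4f}")
```

The example shows why the two metrics diverge for Sanskrit: dropping a single vowel sign costs one code point out of fifteen but makes one of only three words wrong, which is consistent with the abstract's remark that word accuracy is the harder target for long Sanskrit words.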