Ancient Sanskrit Line-level OCR using OpenNMT Architecture

2021 Sixth International Conference on Image Information Processing (ICIIP) Pub Date : 2021-11-26 DOI:10.1109/ICIIP53038.2021.9702666

Ronak Shah, M. Gupta, Ajai Kumar


Citations: 2

Abstract

Optical Character Recognition (OCR) has been studied for several Indian languages such as Hindi, Marathi, and Bangla, but very little OCR work exists for Sanskrit in the Devanagari script. Sanskrit is a very complex language, and its long words and old, degraded documents pose additional challenges for OCR research. Because of these challenges, the word accuracy of available OCR systems is not very high for such documents. Most prior work addresses Sanskrit character recognition only; there has been only one attempt to recognize whole Sanskrit lines, covering 10 fonts. This paper studies different hyperparameters of the OpenNMT architecture for Sanskrit OCR on synthetically generated color line images. A neural encoder-decoder model with attention is presented for converting line images into editable text. An attention-based approach can tackle this problem better than other neural techniques that use CTC-based models. The main aim of this paper is to give a detailed analysis of data preparation, of various encoder-decoder hyperparameters in OpenNMT (such as the number of LSTM layers, LSTM direction, size of the character embedding vector, batch size, number of iterations, and hidden unit size), and of the accuracy of various combinations. The paper also identifies the most accurate model for Sanskrit OCR using OpenNMT. The proposed method achieves a text recognition performance of 99.44% on the test set. Our major contribution is to demonstrate text recognition of degraded line images in a variety of fonts using the OpenNMT architecture. This contribution helps the research community choose hyperparameters of the encoder-decoder architecture for Sanskrit OCR.
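The hyperparameter study described in the abstract can be pictured as a sweep over combinations of settings. The following is a minimal sketch of such a sweep, not the authors' procedure: the key names mirror the hyperparameters listed in the abstract, but every value, the `SEARCH_SPACE` dictionary, and the `enumerate_configs` helper are hypothetical and chosen only for illustration. No OpenNMT calls are shown, since the abstract does not give the training setup.

```python
from itertools import product

# Hypothetical search space: the names follow the abstract, but all values
# are illustrative assumptions and are NOT taken from the paper.
SEARCH_SPACE = {
    "lstm_layers": [1, 2],
    "bidirectional": [False, True],   # LSTM direction
    "char_embedding_size": [64, 128],
    "batch_size": [16, 32],
    "iterations": [50_000],
    "hidden_units": [256, 512],
}


def enumerate_configs(space):
    """Return one dict per combination of hyperparameter values."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


configs = enumerate_configs(SEARCH_SPACE)
print(len(configs))  # 2 * 2 * 2 * 2 * 1 * 2 = 32 combinations
```

Each resulting dictionary would correspond to one training run whose test-set accuracy is recorded, which is how a best-performing combination could be selected.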

