Ancient Sanskrit Line-level OCR using OpenNMT Architecture
Ronak Shah, M. Gupta, Ajai Kumar
2021 Sixth International Conference on Image Information Processing (ICIIP), published 2021-11-26
DOI: 10.1109/ICIIP53038.2021.9702666
Citation count: 2
Abstract
Many Optical Character Recognition (OCR) works exist for Indian languages such as Hindi, Marathi, and Bangla, but very little OCR work has been done for Sanskrit, which is written in the Devanagari script. Sanskrit is a very complex language, and its long words and old, degraded documents add further challenges to Sanskrit OCR research. Because of these challenges, the word accuracy of available OCR systems on such documents is not very high. Most prior work addresses only Sanskrit character recognition, and there has been only one attempt to recognize whole Sanskrit lines, covering 10 fonts. This paper studies different hyperparameters of the OpenNMT architecture for Sanskrit OCR of synthetically generated color line images. A neural encoder-decoder model with attention is presented to convert line images into editable text. An attention-based approach can handle this problem better than other neural techniques that use CTC-based models. The main aim of this paper is to give a detailed analysis of data preparation, of various encoder-decoder hyperparameters in OpenNMT (the number of LSTM layers, LSTM direction, size of the character embedding vector, batch size, number of iterations, and hidden unit size), and of the accuracy of their combinations. The paper also identifies the model with the best accuracy for Sanskrit OCR using OpenNMT. The proposed method achieves a text recognition performance of 99.44% on the test set. Our major contribution is to demonstrate text recognition of degraded line images in a variety of fonts using the OpenNMT architecture. This contribution helps the research community choose hyperparameters of encoder-decoder architectures for Sanskrit OCR.
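The abstract does not say which attention variant the model uses. To illustrate how an attention-based decoder differs from a CTC model, here is a minimal pure-Python sketch of one decoding step of Luong-style "dot" global attention. This scoring function is an assumption; OpenNMT supports several, and the configuration used in the paper is not stated here. A CTC model assumes a monotonic, frame-by-frame alignment between image columns and output characters. In contrast, an attention decoder recomputes a soft alignment over all encoder positions each time it emits a character. The function names (`global_attention`, `softmax`, `dot`) and the toy vectors are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def global_attention(decoder_state: Vector,
                     encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """One step of Luong-style 'dot' global attention (illustrative sketch).

    decoder_state:  current decoder LSTM hidden state h_t (size d)
    encoder_states: one feature vector per position of the encoded
                    line image, flattened into a sequence (each size d)
    Returns (attention weights over positions, context vector).
    """
    # Score every encoder position against the current decoder state.
    scores = [dot(decoder_state, h_s) for h_s in encoder_states]
    # Normalize the scores into a distribution over image positions.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    d = len(decoder_state)
    context = [sum(w * h_s[i] for w, h_s in zip(weights, encoder_states))
               for i in range(d)]
    return weights, context


if __name__ == "__main__":
    # Toy example: three encoder positions, hidden size 2.
    enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w, c = global_attention([2.0, 0.0], enc)
    print([round(x, 3) for x in w])  # [0.468, 0.063, 0.468]
    print([round(x, 3) for x in c])  # [0.937, 0.532]
```

In a full decoder, the context vector would be combined with the decoder state to predict the next character. The positions that best match the current decoder state receive most of the weight, which lets the model focus on the relevant part of a long Sanskrit line. A CTC model would have to emit labels in strict left-to-right frame order instead.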