Post-processing methodology for word level Telugu character recognition systems using Unicode Approximation Models
N. Rani, T. Vasudev
2015 International Conference on Trends in Automation, Communications and Computing Technology (I-TACT-15), 2015-12-01
DOI: https://doi.org/10.1109/ITACT.2015.7492681
Citations: 3
Abstract
Digitizing document images and automatically converting them into an editable format is the primary purpose of optical character recognition (OCR) systems. This paper proposes a novel technique for resolving post-processing errors in Telugu OCR using word-level Unicode Approximation Models (UAMs) through a mapper module. The mapper module performs a word-level one-to-one mapping, assigning each sequence of recognized class labels to the appropriate UAM. Each sequence of recognized class labels corresponds to one particular word and is generated as the classifier's output. The proposed algorithm effectively resolves problems caused by segmentation errors, preprocessing errors such as cuts and merges in characters, noise, occlusions, semantic ordering, and confusable character classes. The proposed UAMs provide consistent accuracies of around 96.2% for printed words and 91.7% for handwritten words.
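The abstract does not say how the mapper module selects a UAM, so the following is only a minimal sketch of one plausible reading: each word-level model is stored as an expected sequence of class labels, and a recognized sequence is mapped to the model with the smallest edit distance. The label names, the `MODELS` table, and the functions `edit_distance` and `map_to_uam` are all hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical word-level models: Unicode word -> expected class-label sequence.
# The label names are invented for illustration only.
MODELS: Dict[str, List[str]] = {
    "తెలుగు": ["te", "lu", "gu"],
    "తలుపు": ["ta", "lu", "pu"],
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def map_to_uam(labels: Sequence[str],
               models: Dict[str, List[str]]) -> Tuple[str, int]:
    """Return the Unicode word whose model is closest to `labels`, plus the distance.

    Assumption: nearest-match by edit distance stands in for the paper's mapper.
    """
    if not models:
        raise ValueError("no word-level models supplied")
    best = min(models, key=lambda word: edit_distance(labels, models[word]))
    return best, edit_distance(labels, models[best])
```

Under this assumption, a label sequence with a single confused class, such as `["te", "lu", "ga"]`, would still be assigned to the word whose model differs from it by one label. A cut or merge in a character changes the sequence length, which the insert and delete costs absorb.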