An OCR for separation and identification of mixed English-Gujarati digits using kNN classifier
S. Chaudhari, R. Gulati
2013 International Conference on Intelligent Systems and Signal Processing (ISSP), published 2013-03-01
DOI: 10.1109/ISSP.2013.6526900 (https://doi.org/10.1109/ISSP.2013.6526900)
Citations: 14
Abstract
This paper addresses the script identification problem for bilingual printed document images. We propose an OCR system that separates and identifies mixed English-Gujarati digits. The system is first trained on standard data samples. For testing, samples are collected from printed sources such as newspapers, books, and magazines. Each pre-processed image, which may be of arbitrary size, is normalized to a uniform size. Features are extracted using a statistical approach, and a kNN classifier is used for classification. The model achieves an average accuracy of 99.26% for Gujarati digits, 99.20% for English digits, and an overall accuracy of 99.23%.
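The pipeline outlined in the abstract (size normalization, statistical features, kNN classification) can be illustrated with a minimal sketch. The abstract does not say which statistical features, distance metric, or value of k the authors use, so everything specific here is an assumption made for illustration: the 8x8 target size, the nearest-neighbour resampling, the 4x4 zoning-density features, the Euclidean distance, and the function names `normalize`, `zone_features`, and `knn_predict`. It is not the authors' implementation.

```python
import math
from collections import Counter


def normalize(img, size=8):
    """Resize a binary image (list of rows of 0/1) to size x size.

    Uses nearest-neighbour sampling; the target size is an assumption.
    """
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def zone_features(img, zones=4):
    """Return the foreground-pixel density of each cell in a zones x zones grid.

    Zoning density is one common statistical feature; it is an assumed
    stand-in here. The image side must be divisible by `zones`.
    """
    step = len(img) // zones
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            cell = [img[r][c]
                    for r in range(zr * step, (zr + 1) * step)
                    for c in range(zc * step, (zc + 1) * step)]
            feats.append(sum(cell) / len(cell))
    return feats


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors.

    `train` is a list of (feature_vector, label) pairs. Distance is
    Euclidean (an assumption). On a tied vote, the label of the closest
    tied neighbour wins, because Counter keeps first-seen order.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

If each training label is a (script, digit) pair such as ("Gujarati", 3), a single kNN vote yields both the script and the digit at once. Whether the paper separates scripts in a distinct step before recognizing the digit is not stated in the abstract.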