An absolute Optical Character Recognition system for Bangla script Utilizing a captured image
Md. Ruhulamin Siddique, Md. Ashiq Mahmood
2021 Emerging Technology in Computing, Communication and Electronics (ETCCE), published 2021-12-21
DOI: 10.1109/ETCCE54784.2021.9689855 (https://doi.org/10.1109/ETCCE54784.2021.9689855)
Abstract
Recognizing Bangla characters from captured images is a significant field of research because Bangla has about 230 million native speakers in Bangladesh and India. In addition, signboards, billboards, and many other image sources contain Bangla script. Since the mid-1980s, researchers have worked on recognizing Bangla characters from scanned images, trying different methods to identify characters and evaluating their recognition performance. This paper focuses on developing an eclectic OCR system that can recognize and extract Bangla text. The recognition process operates on images containing Bangla script that were captured by a digital camera or scanner. Preprocessing steps include binarization, segmentation, noise cleaning, scaling of characters by font size, and skew detection and correction. After feature extraction from the scaled character, each character in the image is represented by a Freeman chain code. A multilayer feedforward neural network-based recognition scheme is constructed to recognize and classify unknown characters and samples. Experimental results indicate a success rate of approximately 99% in identifying characters and producing the corresponding Unicode text.
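To make the chain-code step concrete, here is a minimal Python sketch of 8-direction Freeman chain coding. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the direction numbering, the contour tracing method, or how the code is presented to the network. The sketch assumes an already-traced, ordered contour of (row, col) pixels, uses the common convention 0 = east with codes increasing counter-clockwise, and the names `freeman_chain_code` and `direction_histogram` are hypothetical. The histogram is one possible fixed-length feature vector for a feedforward network.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (row, col) in image coordinates; rows grow downward

# Assumed convention: 0 = east, then counter-clockwise in 45-degree steps.
# Keys are (row_step, col_step) between two consecutive contour pixels.
DIRECTIONS = {
    (0, 1): 0,    # east
    (-1, 1): 1,   # north-east
    (-1, 0): 2,   # north
    (-1, -1): 3,  # north-west
    (0, -1): 4,   # west
    (1, -1): 5,   # south-west
    (1, 0): 6,    # south
    (1, 1): 7,    # south-east
}


def freeman_chain_code(contour: Sequence[Point]) -> List[int]:
    """Encode an ordered contour as one Freeman code per step between pixels."""
    codes = []
    for (r0, c0), (r1, c1) in zip(contour, contour[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-adjacent")
        codes.append(DIRECTIONS[step])
    return codes


def direction_histogram(codes: Sequence[int]) -> List[float]:
    """Hypothetical feature vector: relative frequency of each of the 8 codes."""
    counts = [0] * 8
    for code in codes:
        counts[code] += 1
    total = len(codes)
    return [n / total for n in counts] if total else [0.0] * 8


# A closed one-pixel square, traced clockwise as seen on screen.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
codes = freeman_chain_code(square)
print(codes)                       # [0, 6, 4, 2]
print(direction_histogram(codes))  # [0.25, 0.0, 0.25, 0.0, 0.25, 0.0, 0.25, 0.0]
```

The histogram discards the order of the steps, so a real system might instead feed the network a resampled, fixed-length version of the code sequence; which representation the paper uses is not stated in the abstract.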