Bangla Optical Character Recognition (OCR) Using Deep Learning Based Image Classification Algorithms
Nadim Mahmud Dipu, Sifatul Alam Shohan, K. Salam
2021 24th International Conference on Computer and Information Technology (ICCIT), published 2021-12-18
DOI: 10.1109/ICCIT54785.2021.9689864
Citations: 2
Abstract
Optical Character Recognition (OCR) is the process of converting images of printed, typed, or handwritten text into machine-readable text. OCR is one of the most widely researched topics in computer vision, and highly accurate, sophisticated OCR systems have been built for most of the world's major languages, such as English, French, German, and Mandarin. However, despite having 300 million native speakers (4.00% of the world population) and being the fifth most spoken language in the world, Bengali still lacks a state-of-the-art OCR system. Moreover, most existing systems cannot recognize compound letters. This study addresses these gaps by proposing three neural-network-based image classification models for Bangla OCR: Inception V3, VGG16, and Vision Transformer. The models were trained on the BanglaLekha-Isolated dataset, which contains 98,950 images of Bengali characters (vowels, consonants, digits, and compound letters). On the test set, VGG16, Inception V3, and Vision Transformer achieve accuracies of 98.65%, 97.82%, and 96.88%, respectively. Each of these models is much more accurate than the existing systems. Real-time implementation of these three models will be instrumental in building a state-of-the-art Bangla OCR system.
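The test-set accuracies quoted in the abstract are, for an image classifier, typically top-1 accuracy: the fraction of test images whose highest-scoring class matches the true label. The sketch below illustrates that calculation in plain Python. It is not the authors' evaluation code; the function names (predict_class, top1_accuracy) and the toy scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def predict_class(scores: Sequence[float]) -> int:
    """Return the index of the highest score (argmax) for one image."""
    return max(range(len(scores)), key=lambda i: scores[i])


def top1_accuracy(all_scores: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Fraction of images whose predicted class equals the true label."""
    if len(all_scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(predict_class(s) == y for s, y in zip(all_scores, labels))
    return correct / len(labels)


# Toy data (hypothetical): 4 images scored over 3 character classes.
toy_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.6, 0.3, 0.1],  # predicted class 0
]
toy_labels = [1, 0, 2, 1]  # the last image is misclassified
toy_accuracy = top1_accuracy(toy_scores, toy_labels)
print(f"top-1 accuracy: {toy_accuracy:.2%}")
```

In a real evaluation, each row of scores would be the output a trained model produces for one test image, with one entry per character class in the dataset.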