Bangla Optical Character Recognition (OCR) Using Deep Learning Based Image Classification Algorithms

2021 24th International Conference on Computer and Information Technology (ICCIT). Pub Date: 2021-12-18. DOI: 10.1109/ICCIT54785.2021.9689864

Nadim Mahmud Dipu, Sifatul Alam Shohan, K. Salam


Citations: 2

Abstract

Optical Character Recognition (OCR) is the process of converting images of printed, typed, or handwritten text into machine-readable text. It is one of the most widely researched topics in computer vision, and highly accurate, sophisticated OCR systems have been built for most of the world's major languages, such as English, French, German, and Mandarin. However, despite having 300 million native speakers (4.00% of the world population) and being the 5th most spoken language in the world, Bengali still does not have a state-of-the-art OCR system. Moreover, most existing systems cannot recognize compound letters. This study addresses this gap by proposing three neural-network-based image classification models for Bangla OCR: Inception V3, VGG16, and Vision Transformer. The models were trained on the BanglaLekha-Isolated dataset that contains 98,950 images of Bengali characters (vowels, consonants, digits, and compound letters). On the test set, VGG16, Inception V3, and Vision Transformer achieve accuracies of 98.65%, 97.82%, and 96.88%, respectively. Each of these models is much more accurate than the existing systems. Real-time implementation of these three models will be instrumental in building a state-of-the-art Bangla OCR system.
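The abstract reports test-set accuracy for each classifier but does not spell out how it is computed. A minimal sketch, assuming the usual top-1 definition (a sample counts as correct when the highest-scoring class equals the true label), might look like the following. The function names and the toy scores are hypothetical and invented for illustration only; they are not taken from the paper, and they do not reproduce its results.

```python
from typing import Sequence


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (the first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def top1_accuracy(score_rows: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(score_rows) != len(labels):
        raise ValueError("score_rows and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(
        1 for scores, label in zip(score_rows, labels) if argmax(scores) == label
    )
    return correct / len(labels)


# Toy data (assumption, not from the paper): 4 samples, 3 character classes.
toy_scores = [
    [0.7, 0.2, 0.1],  # predicted class 0
    [0.1, 0.8, 0.1],  # predicted class 1
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.6, 0.3, 0.1],  # predicted class 0, but the true label is 1 (an error)
]
toy_labels = [0, 1, 2, 1]

toy_accuracy = top1_accuracy(toy_scores, toy_labels)
print(f"top-1 accuracy: {toy_accuracy:.2%}")  # prints "top-1 accuracy: 75.00%"
```

In a real evaluation, each row of scores would be the per-class output of one of the three models for a single character image, and the labels would come from the held-out test split.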
