Handwritten Devanagari Word Recognition using Customized Convolution Neural Network

2021 International Conference on Computing, Communication and Green Engineering (CCGE). Pub Date: 2021-09-23. DOI: 10.1109/CCGE50943.2021.9776351

D. Salunke, Pooja Sabne, Hitesh Saini, Vivekanand Shivanagi, Pradnya Jadhav


Citations: 0

Abstract

The Devanagari script comprises 47 primary characters: 14 vowels and 33 consonants. It is used to write nearly 120 languages, including Marathi and Hindi. This paper presents a handwritten word recognition system for Marathi, an Indian language spoken primarily in the state of Maharashtra, where most government paperwork is done in it. The system is intended mainly to recognize handwritten words in the titles of municipal documents, so that these documents can be stored digitally. Existing work segments words and uses a hidden Markov model to recognize pseudo-characters, which leaves room for better accuracy. To address this limitation, we collected a dataset of 2,000 handwritten Marathi word samples and expanded it to 12,000 images by applying several augmentation techniques. The proposed system is a deep-learning-based Customized Convolution Neural Network trained on this manually collected dataset, with 75% of the data used for training and 25% for testing. The system gives better results on the augmented data, reaching 94% accuracy.
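The dataset arithmetic in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `augment_and_split`, the choice of five augmented variants per original (so that 2,000 × 6 = 12,000), the random shuffle, and the seed are all assumptions. The abstract does not say which augmentations were used or whether the split happened before or after augmentation.

```python
import random


def augment_and_split(n_original=2000, variants_per_sample=5,
                      train_fraction=0.75, seed=0):
    """Return (train, test) lists of (original_id, variant_id) pairs.

    Hypothetical bookkeeping only: variant 0 stands for the original
    image and variants 1..variants_per_sample stand for augmented
    copies (the actual augmentations are not named in the abstract).
    """
    samples = [(i, v)
               for i in range(n_original)
               for v in range(variants_per_sample + 1)]
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(samples)
    n_train = int(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]


train, test = augment_and_split()
print(len(train), len(test))  # 9000 3000
```

Note that a split made after augmentation, as in this sketch, can place variants of the same original word image in both the training and the test set. Splitting the 2,000 originals first and augmenting only the training part would avoid that overlap; the abstract does not state which order was used.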
