D. Salunke, Pooja Sabne, Hitesh Saini, Vivekanand Shivanagi, Pradnya Jadhav
2021 International Conference on Computing, Communication and Green Engineering (CCGE), published 2021-09-23. DOI: 10.1109/CCGE50943.2021.9776351
Handwritten Devanagari Word Recognition using Customized Convolution Neural Network
The Devanagari script comprises 47 primary characters: 14 vowels and 33 consonants. It is used to write nearly 120 languages, including Marathi and Hindi. This paper presents a handwritten word recognition system for Marathi, an Indian language spoken primarily in the state of Maharashtra, where most government paperwork is done in Marathi alone. The system is intended mainly to recognize handwritten words in the titles of municipal documents, for which digital storage has been the main concern. Existing works segment the words and use a hidden Markov model to recognize pseudo characters, but their accuracy leaves room for improvement. We have devised a system that addresses this limitation. A dataset of 2000 samples is created, and augmentation techniques increase its size to 12000 images. The proposed system is implemented as a deep learning based Customized Convolution Neural Network trained on a manually collected dataset of Marathi handwritten words, with 75% of the dataset used for training and 25% for testing. The system gives better results on the augmented data, reaching 94% accuracy.
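The dataset figures in the abstract (2000 samples grown to 12000 images, then divided 75% / 25%) can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the augmentation techniques, so small horizontal shifts stand in for them here. The 8x8 random images, the ten placeholder labels, the six shift values, and the function names `shift_horizontal`, `build_augmented_dataset` and `split_dataset` are all assumptions made for illustration.

```python
import random


def shift_horizontal(image, shift):
    """Shift a 2D image (list of rows) by `shift` pixels, padding with zeros.

    Positive values move content right, negative values move it left.
    Assumes abs(shift) is smaller than the image width.
    """
    shifted = []
    for row in image:
        width = len(row)
        if shift >= 0:
            shifted.append([0] * shift + row[:width - shift])
        else:
            shifted.append(row[-shift:] + [0] * (-shift))
    return shifted


def build_augmented_dataset(samples, shifts):
    """Return one (image, label) pair per sample per shift value."""
    return [
        (shift_horizontal(image, shift), label)
        for image, label in samples
        for shift in shifts
    ]


def split_dataset(data, train_fraction=0.75, seed=0):
    """Shuffle a copy of `data` and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def make_placeholder_samples(count=2000, size=8, num_labels=10, seed=0):
    """Random binary images standing in for scanned handwritten words."""
    rng = random.Random(seed)
    return [
        ([[rng.randint(0, 1) for _ in range(size)] for _ in range(size)],
         index % num_labels)  # hypothetical label; the real classes are words
        for index in range(count)
    ]


if __name__ == "__main__":
    samples = make_placeholder_samples()
    # Shift 0 keeps the original; the other five are assumed augmentations.
    augmented = build_augmented_dataset(samples, shifts=(0, 1, -1, 2, -2, 3))
    train, test = split_dataset(augmented, train_fraction=0.75)
    print(len(samples), len(augmented), len(train), len(test))
```

With six variants per sample, the 2000 originals become 12000 images, and a 75% / 25% split gives 9000 training and 3000 test images. The abstract does not say whether the split was made before or after augmentation. This sketch splits afterwards, which places shifted copies of the same original on both sides of the split.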