Kannada Handwritten Character Recognition and Classification Through OCR Using Hybrid Machine Learning Techniques
Deekshitha Gowda, V. Kanchana
2022 IEEE International Conference on Data Science and Information System (ICDSIS), 2022-07-29
DOI: 10.1109/ICDSIS55133.2022.9915906
In many workplaces in Karnataka, documents are handwritten in the regional language. Consequently, a computer-based framework is required to bridge the gap between machines and people. Converting these handwritten documents into a computer-editable format poses many challenges. One of them is classifying confusable characters, of which Kannada has many, and which may be recognized wrongly because of the way they are written. The scanned handwritten document was pre-processed and then segmented into lines, words and characters using edge-based segmentation. Features, based mostly on the curviness of the characters, were extracted using Convolutional Neural Networks. The segmented characters, represented by their extracted features, were then classified using Support Vector Machine, K Nearest Neighbors and Random Forest algorithms. The accuracy rates obtained on 2000 handwritten documents were 95% for Random Forest, 96% for Support Vector Machine and 92% for K Nearest Neighbors.
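The classification stage can be illustrated with a minimal sketch of one of the three named classifiers, K Nearest Neighbors, written in plain Python. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `knn_predict`, the toy 3-dimensional feature vectors, the labels "ka" and "ga", the Euclidean distance, and the choice of k=3. In the described pipeline the vectors would instead come from the CNN feature extractor.

```python
from collections import Counter
import math


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    if len(train_features) != len(train_labels):
        raise ValueError("features and labels must have the same length")
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training samples")

    # Euclidean distance from the query to every training vector.
    distances = [
        (math.dist(vec, query), label)
        for vec, label in zip(train_features, train_labels)
    ]

    # Keep the k closest samples; sorted() is stable, so equal
    # distances keep their original training order.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]

    # Majority vote over the labels of the k nearest samples.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors standing in for CNN outputs (assumption).
TRAIN_FEATURES = [
    (0.90, 0.10, 0.20), (0.80, 0.20, 0.10), (0.85, 0.15, 0.25),
    (0.10, 0.90, 0.70), (0.20, 0.80, 0.80), (0.15, 0.85, 0.75),
]
# Hypothetical character class labels (assumption).
TRAIN_LABELS = ["ka", "ka", "ka", "ga", "ga", "ga"]

if __name__ == "__main__":
    print(knn_predict(TRAIN_FEATURES, TRAIN_LABELS, (0.82, 0.18, 0.20)))
```

A Support Vector Machine or Random Forest would take the same feature vectors and labels as input; only the decision rule differs, which is what allows the three classifiers to be compared on a common set of extracted features.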