Computationally efficient handwritten Telugu text recognition
Authors: Buddaraju Revathi, M. V. D. Prasad, N. K. Gattim
Journal: Indonesian Journal of Electrical Engineering and Computer Science
Publication date: 2024-06-01
Publication type: Journal Article
DOI: 10.11591/ijeecs.v34.i3.pp1618-1626 (https://doi.org/10.11591/ijeecs.v34.i3.pp1618-1626)
Citation count: 0
Abstract
Optical character recognition (OCR) for regional languages is difficult because of their complex orthographic structure, scarce dataset resources, large character sets, and structural similarity between characters. Telugu is a widely spoken language in the states of Andhra Pradesh and Telangana. Telugu exhibits distinct separation between characters within a word, so a character-level dataset is sufficient: with a smaller dataset, we can effectively recognize more words. However, challenges arise when training on compound characters, which are combinations of vowels and consonants. Depending on the vattus and dheerghams attached to the base character, these are treated as two or more characters. To address this challenge, each compound character is encoded as a numerical value and used as input during training, then mapped back to the character during recognition. Segmentation at the character level is also problematic, because varying handwriting styles cause characters to overlap; to handle this, we propose an algorithm based on the language's features. To enhance word-level accuracy, a dictionary-based model was devised. A neural network utilizing the inception module is employed for feature extraction at various scales, achieving a word-level accuracy of 78% with fewer trainable parameters.
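The abstract does not specify the encoding scheme or the dictionary model, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes that each compound character (a base consonant plus a dheergham or vattu) is one class mapped to an integer label, and that word-level correction snaps a recognized word to its closest dictionary entry. The character list, the two-word dictionary, the function names, and the use of `difflib` similarity matching are all illustrative choices, not the authors' method.

```python
from difflib import get_close_matches

# Telugu code points used in this illustration (hypothetical sample set).
KA = "\u0c15"      # consonant "ka"
AA = "\u0c3e"      # vowel sign "aa" (a dheergham)
I = "\u0c3f"       # vowel sign "i"
VIRAMA = "\u0c4d"  # joins consonants to form a vattu

# Each compound character spans several code points but is one class.
COMPOUNDS = [KA, KA + AA, KA + I, KA + VIRAMA + KA]

# Integer label for each compound character, and the reverse lookup.
char_to_id = {ch: i for i, ch in enumerate(COMPOUNDS)}
id_to_char = {i: ch for ch, i in char_to_id.items()}


def encode(chars):
    """Map a sequence of compound characters to integer labels."""
    return [char_to_id[ch] for ch in chars]


def decode(ids):
    """Map predicted integer labels back to a Telugu string."""
    return "".join(id_to_char[i] for i in ids)


def correct_word(word, dictionary, cutoff=0.6):
    """Return the closest dictionary word, or the input if none is close.

    Assumption: similarity is difflib's ratio over code points, standing
    in for whatever matching rule the paper's dictionary model uses.
    """
    matches = get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Illustrative two-word dictionary: "kaaki" and "kalamu".
DICTIONARY = [KA + AA + KA + I, "\u0c15\u0c32\u0c2e\u0c41"]

labels = encode([KA + AA, KA + I])   # two compound characters
word = decode(labels)                # round trip back to text
fixed = correct_word(KA + KA + I, DICTIONARY)  # recognizer dropped a dheergham
```

Treating a compound character as a single label keeps the classifier output one symbol per segmented glyph, and the dictionary step can then repair a word in which one symbol was misrecognized.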
About the journal:
The aim of the Indonesian Journal of Electrical Engineering and Computer Science (formerly TELKOMNIKA Indonesian Journal of Electrical Engineering) is to publish high-quality articles dedicated to all aspects of the latest outstanding developments in the field of electrical engineering. Its scope encompasses the applications of Telecommunication and Information Technology, Applied Computing and Computer, Instrumentation and Control, Electrical (Power), Electronics Engineering and Informatics, and covers, but is not limited to, the following areas: Signal Processing[...] Electronics[...] Electrical[...] Telecommunication[...] Instrumentation & Control[...] Computing and Informatics[...]