Recognition of Vowels, Consonants and Compound Character Sequences from Ancient Tamil Stone Inscriptions using Deep Neural Networks

P. Uma Maheswari, A. Aswathy, S. Ezhilarasi, M. Revathi Priya
Published in: 2022 IEEE International Power and Renewable Energy Conference (IPRECON)
Publication date: 2022-12-16
DOI: 10.1109/IPRECON55716.2022.10059489
URL: https://doi.org/10.1109/IPRECON55716.2022.10059489

Abstract

Tamil stone inscriptions are ancient handwritten documents engraved on stone, and they hold a wealth of information and traditional knowledge. According to authenticated sources, around 65% of Indian inscriptions have been found in the Tamil language. Although many character recognition studies have been published for inscriptions in other languages, no significant effort has been made for Tamil. A few high-performance online handwritten Tamil OCR systems exist for the modern Tamil alphabet, but they achieve less than 20% accuracy on stone inscription characters, because the letters are in an ancient form and the contrast between foreground and background in stone inscriptions is very low. Most existing character recognition studies for handwritten Tamil scripts rely on the widely used Hidden Markov Model (HMM), despite its well-known shortcomings, and only a few report results with artificial neural networks (ANN). Character recognition from images of inscribed documents is challenging because of the complex character structure of the Tamil script and because of artifacts such as aging and degradation. Even so, modern techniques need to be developed to digitize such inscriptions and preserve them as electronic documents. This paper proposes a novel approach for recognizing characters in ancient Tamil stone inscriptions, based on two recently developed models: a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). Camera-captured images of stone inscription script are taken as input and enhanced for clarity through image enhancement techniques such as filtering, luminance adjustment, erosion, dilation, and blurring. Projection-profile-based character segmentation extracts individual characters from the script image. An over-segmentation reduction step then handles touching and broken characters. Character recognition is performed in two stages: (i) recognition of single characters (vowels and consonants) using a Convolutional Neural Network, and (ii) recognition of compound characters using a Recurrent Neural Network with a BLSTM model. The proposed approach was tested extensively on large datasets to evaluate its performance. Experimental results show that the proposed CNN-based system achieved a training accuracy of 88% and a validation accuracy of 94% (including the RNN accuracy). System performance was evaluated on various test cases in each phase, and the limitations have been identified.
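The profile-based segmentation step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the image has already been binarized (1 for inscription strokes, 0 for background) and contains a single text line, and the function names `vertical_profile` and `segment_columns` as well as the `min_gap` merging heuristic for broken strokes are hypothetical choices made for illustration. It uses plain Python with no external dependencies.

```python
def vertical_profile(binary):
    """Count foreground pixels (value 1) in each column of a binary image."""
    return [sum(col) for col in zip(*binary)]


def segment_columns(binary, min_gap=1):
    """Return (start, end) column ranges, end exclusive, one per candidate character.

    A run of columns with a non-zero profile is one candidate character.
    Runs separated by fewer than `min_gap` empty columns are merged; this is
    a hypothetical heuristic for rejoining broken strokes, not the paper's
    over-segmentation reduction method.
    """
    profile = vertical_profile(binary)
    segments = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a character run begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # the run ended at column x
            start = None
    if start is not None:
        segments.append((start, len(profile)))

    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < min_gap:
            # Gap is too small: treat as one broken character.
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged


# Toy 3x9 "image": two glyphs, the second broken by a one-column gap.
image = [
    [1, 1, 0, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 0, 1, 0],
]
print(segment_columns(image))             # [(0, 2), (5, 6), (7, 8)]
print(segment_columns(image, min_gap=2))  # [(0, 2), (5, 8)]
```

With the default `min_gap=1` every empty column splits the line, so the broken glyph is over-segmented into two pieces; raising `min_gap` to 2 rejoins them. Touching characters would need a different treatment, which this sketch does not attempt.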