An Ancient Number Recognition using Freeman Chain Code with Deep Learning Approach

Aditi M Joshi, Sanjay G Patel
DOI: 10.5121/cseij.2022.12102
Journal: Computer Science & Engineering: An International Journal, Vol. 17(1)
Published: 2022-02-28 (Journal Article)
Citations: 0

Abstract

Sanskrit character and number documents contain many errors. Correcting them with conventional spell-checking approaches breaks down because of the limited vocabulary: Sanskrit is highly inflected, and words are formed dynamically by Sandhi rules, Samasa rules, Taddhita affixes, and so on. As a result, correcting OCR output requires considerable effort. Here, we present different machine learning approaches and several ways to improve features for better error correction in Sanskrit documents. A Sanskrit dictionary can be simulated to synthesize an off-the-shelf dictionary. Most of the proposed methods also apply to general Sanskrit and Hindi word correction. Handwriting recognition in Indic scripts such as Devanagari is very challenging because of the subtleties of the scripts, variations in rendering, and the cursive nature of handwriting. The lack of public handwriting datasets in Indic scripts has long stymied the development of offline handwritten word recognizers and made comparison across methods a tedious task. To alleviate some of these issues, a new handwritten word dataset for Devanagari, IIIT-HW-Dev, will be released. Large amounts of training data are crucial for successfully training deep learning architectures, since a typical architecture contains millions of parameters. A new method is presented for classifying Freeman chain codes under four-connectivity and eight-connectivity with a deep learning approach. The CNN LeNet-5 is found to be suitable in these cases because the numbers are formed with curved lines. In contrast with existing FCC event data analysis techniques, sampled grey images of the existing events are not used; instead, image files of the three-phase power quality (PQ) event data are analysed, taking advantage of the success of deep learning in image-file classification. The novelty of the proposed approach is therefore that image files of the voltage waveforms of the three phases of the power grid are classified. The test data are shown to be classified with 100% accuracy. The proposed work is believed to serve the needs of future smart grid applications, which must be fast and take automatic countermeasures against potential PQ events.
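The abstract classifies numbers through their Freeman chain codes under four- and eight-connectivity. As a hedged illustration (not the authors' code), the sketch below encodes an ordered list of boundary pixels as a chain code and decodes it back. It assumes the boundary has already been traced by some contour-following step that is not shown, uses (row, col) coordinates with rows increasing downward, and numbers directions counter-clockwise from east; the names `encode`, `decode`, `first_difference` and `square` are hypothetical.

```python
"""Freeman chain code for 4- and 8-connected pixel paths (illustrative sketch).

Assumptions: the boundary is already traced into an ordered list of
(row, col) pixels; rows grow downward as in image arrays; codes count
counter-clockwise starting from east.
"""

# (d_row, d_col) offsets; "north" is d_row = -1 because rows grow downward.
DIRS_8 = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
DIRS_4 = [(0, 1), (-1, 0), (0, -1), (1, 0)]


def _dirs(connectivity):
    """Pick the direction table for 4- or 8-connectivity."""
    if connectivity == 8:
        return DIRS_8
    if connectivity == 4:
        return DIRS_4
    raise ValueError("connectivity must be 4 or 8")


def encode(path, connectivity=8):
    """Chain code of an ordered list of neighbouring (row, col) pixels."""
    lookup = {step: code for code, step in enumerate(_dirs(connectivity))}
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in lookup:
            raise ValueError(
                f"{(r0, c0)} -> {(r1, c1)} is not a {connectivity}-connected step"
            )
        codes.append(lookup[step])
    return codes


def decode(start, codes, connectivity=8):
    """Rebuild the pixel path from its start pixel and chain code."""
    dirs = _dirs(connectivity)
    path = [start]
    for code in codes:
        dr, dc = dirs[code]
        r, c = path[-1]
        path.append((r + dr, c + dc))
    return path


def first_difference(codes, connectivity=8):
    """Turns between consecutive codes; unchanged when the whole path is
    rotated by a multiple of 90 degrees (a constant shift of every code)."""
    return [(b - a) % connectivity for a, b in zip(codes, codes[1:])]


# Boundary of a 3x3 block, traced clockwise (as displayed) from the top-left.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0)]
print(encode(square, 8))  # [0, 0, 6, 6, 4, 4, 2, 2]
print(encode(square, 4))  # [0, 0, 3, 3, 2, 2, 1, 1]
```

The same straight-edged boundary is valid under both connectivities, but a diagonal step such as `(2, 0) -> (1, 1)` exists only in the eight-direction table, so `encode` raises `ValueError` for it when `connectivity=4`. Curved strokes therefore tend to give shorter codes under eight-connectivity.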
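The abstract applies the CNN LeNet-5 to images of the numbers. The sketch below only walks through the spatial arithmetic of a LeNet-5-style stack, using the layout commonly cited for LeNet-5 (32×32 single-channel input, 5×5 convolutions, 2×2 subsampling with stride 2, then 120 and 84 units before the output). Each window shrinks a side of length n to floor((n + 2p − k) / s) + 1 for kernel k, stride s and padding p. The 10-class output and the helper names `conv_out` and `lenet5_shapes` are assumptions for illustration; the paper's exact configuration is not stated here.

```python
"""Feature-map sizes through a LeNet-5-style stack (illustrative sketch).

Assumed layout: 32x32 single-channel input, 5x5 convolutions without
padding, 2x2 subsampling with stride 2, and a hypothetical 10-class output.
"""


def conv_out(size, kernel, stride=1, padding=0):
    """Side length after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(input_size=32, num_classes=10):
    """Return (layer name, output shape) pairs, shapes as (maps, h, w)."""
    shapes = []
    s = conv_out(input_size, 5)        # C1: 6 feature maps, 5x5 kernels
    shapes.append(("C1 conv 5x5", (6, s, s)))
    s = conv_out(s, 2, stride=2)       # S2: 2x2 subsampling
    shapes.append(("S2 pool 2x2", (6, s, s)))
    s = conv_out(s, 5)                 # C3: 16 feature maps
    shapes.append(("C3 conv 5x5", (16, s, s)))
    s = conv_out(s, 2, stride=2)       # S4: 2x2 subsampling
    shapes.append(("S4 pool 2x2", (16, s, s)))
    s = conv_out(s, 5)                 # C5: 120 maps, each reduced to 1x1
    shapes.append(("C5 conv 5x5", (120, s, s)))
    shapes.append(("F6 dense", (84,)))
    shapes.append(("output", (num_classes,)))
    return shapes


for name, shape in lenet5_shapes():
    print(f"{name:12s} {shape}")
```

With these defaults the side length shrinks 32 → 28 → 14 → 10 → 5 → 1, which is why C5 acts as a fully connected layer. Many re-implementations feed 28×28 digit images and use padding 2 in the first convolution (`conv_out(28, 5, padding=2)` is 28) to reach the same C1 size.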