Script Identification from Handwritten Document

K. Roy, S. K. Das, S. Obaidullah
2011 Third National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics
Published: 2011-12-15
DOI: 10.1109/NCVPRIPG.2011.22 (https://doi.org/10.1109/NCVPRIPG.2011.22)
Citations: 28

Abstract

Every country has its own language and script, which may or may not be shared with other countries. To communicate with one another we need a common language, and English currently plays that role. As a result, most countries that do not use the Roman script produce bi-script documents. For a country like India, which has a total of 12 official scripts (and 22 languages), the situation is more complex. To build an OCR system, we first need to identify the script in which a document is written, even when the document itself is not multi-script. Postal documents and pre-printed forms are good examples of such documents. Identifying the script of a document that may be written in any of these 13 scripts is therefore a very challenging task. In this paper we attempt to identify the scripts used to write any of 6 official languages of India. We use very simple and efficient features at the component level for this purpose. A series of classifiers is applied using fractal-based features, component-based features, and topological features. The overall accuracy of the proposed system is at present 89.48% on the test set without rejection.
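
The abstract names fractal-based features computed at the component level but does not say which estimator is used. As an illustration only, the sketch below computes a box-counting dimension for a single binary component, one common way to obtain a fractal feature. The function names `box_count` and `box_counting_dimension`, the box sizes, and the list-of-lists image representation are assumptions made for this example and are not taken from the paper.

```python
import math


def box_count(pixels, size):
    """Count the boxes of side `size` that contain at least one foreground pixel."""
    return len({(r // size, c // size) for r, c in pixels})


def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Estimate the box-counting dimension of a binary component.

    `image` is a list of rows holding 0 (background) or 1 (foreground).
    The estimate is the least-squares slope of log N(s) against log(1/s),
    where N(s) is the number of occupied boxes of side s.
    """
    pixels = [(r, c)
              for r, row in enumerate(image)
              for c, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("component has no foreground pixels")

    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]

    # Ordinary least-squares slope, written out to avoid extra dependencies.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy components: a filled square and a single horizontal stroke.
filled_square = [[1] * 16 for _ in range(16)]
stroke = [[1] * 16] + [[0] * 16 for _ in range(15)]

print(box_counting_dimension(filled_square))
print(box_counting_dimension(stroke))
```

A filled region gives an estimate near 2 and a straight stroke gives an estimate near 1; handwritten components would typically fall somewhere in between, which is what makes such a value usable as one entry in a feature vector passed to a classifier.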