An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents

S. Mohanty, Himadri Nandini Dasbebartta, Tarun Kumar Behera
Published in: 2009 Seventh International Conference on Advances in Pattern Recognition
Publication date: 2009-02-04
DOI: 10.1109/ICAPR.2009.49 (https://doi.org/10.1109/ICAPR.2009.49)
Citations: 22

Abstract

Recognizing documents that contain multiple scripts is a challenging task that demands additional effort from OCR (Optical Character Recognition) designers to improve accuracy. Earlier OCR systems were developed for single-script documents only, mainly for English and regional languages. Old documents need to be preserved for future use, whether they contain a single script or several. This paper describes the character recognition process for printed documents containing English and Oriya text. Although the languages of India differ, some common features can be found among them. For this work, the Roman script must be distinguished from the Oriya script. Most English (that is, Roman script) characters are linear as well as circular in nature, whereas Oriya characters are circular in nature. The scripts therefore need to be separated on the basis of these features, paragraph by paragraph or line by line.
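
The abstract does not say how the line-wise separation is carried out, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a binarized page (1 = ink, 0 = background), splits it into text lines with a horizontal projection profile, and computes one hypothetical "linearity" feature: the share of ink that sits in long vertical runs. The function names, the `min_run` parameter, and the feature itself are illustrative choices that do not come from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary page: 1 = ink, 0 = background (assumed input format)


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using a horizontal projection profile.

    Returns (top_row, bottom_row_exclusive) for every maximal run of rows
    that contain at least one ink pixel.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # a blank row closes the line
            start = None
    if start is not None:                  # line that touches the bottom edge
        lines.append((start, len(image)))
    return lines


def vertical_stroke_ratio(image: Image, top: int, bottom: int, min_run: int = 3) -> float:
    """Hypothetical linearity feature for one text line.

    Fraction of ink pixels in rows [top, bottom) that belong to vertical runs
    of at least `min_run` pixels. Straight stems push the value up; rounded
    shapes tend to keep it low.
    """
    total = 0
    in_long_runs = 0
    width = len(image[0]) if image else 0
    for x in range(width):
        run = 0
        for y in range(top, bottom):
            if image[y][x]:
                run += 1
                total += 1
            else:
                if run >= min_run:
                    in_long_runs += run
                run = 0
        if run >= min_run:                 # run that reaches the bottom of the band
            in_long_runs += run
    return in_long_runs / total if total else 0.0
```

In use, each band returned by `segment_lines` would be passed to `vertical_stroke_ratio`, and a threshold on the result (which would have to be tuned on real English and Oriya samples) could serve as a first guess at the script of that line. A practical system would likely combine several such features rather than rely on one.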