An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents

2009 Seventh International Conference on Advances in Pattern Recognition. Pub Date: 2009-02-04. DOI: 10.1109/ICAPR.2009.49

S. Mohanty, Himadri Nandini Dasbebartta, Tarun Kumar Behera

Citations: 22

Abstract

Recognizing documents that contain multiple scripts is a challenging task that demands additional effort from OCR (Optical Character Recognition) designers to improve accuracy. Previously, OCR systems were developed only for single-script documents, mainly in English and regional languages. Old documents, multi-script as well as single-script, need to be preserved for future use. This paper describes the character recognition process for printed documents containing English and Oriya text. Although the languages of India differ, they share some common features. For this work, we need to distinguish between the Roman script and the Oriya script. Most English (that is, Roman script) characters are linear as well as circular in nature, whereas Oriya characters are circular in nature. We therefore separate the two scripts paragraph-wise or line-wise by taking their features into account.
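The abstract does not say which features the authors use, so the following is only a minimal sketch of how line-wise script separation could work, not the paper's method. It assumes a binarized page (1 = ink, 0 = background), splits it into text lines with a horizontal projection profile, and labels each line by how many inked columns contain a long straight vertical stroke. The function names, the `min_run_frac` parameter, and the `threshold` value are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using a horizontal projection profile.

    Returns (top, bottom) row ranges, with bottom exclusive.
    """
    lines = []
    start = None
    for y, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(img)))
    return lines


def vertical_stroke_ratio(img: Image, top: int, bottom: int,
                          min_run_frac: float = 0.8) -> float:
    """Fraction of inked columns whose longest vertical ink run covers
    at least min_run_frac of the line height (assumed 'linear' cue)."""
    height = bottom - top
    inked = 0
    straight = 0
    for x in range(len(img[0])):
        longest = run = 0
        for y in range(top, bottom):
            if img[y][x]:
                run += 1
                longest = max(longest, run)
            else:
                run = 0
        if longest > 0:
            inked += 1
            if longest >= min_run_frac * height:
                straight += 1
    return straight / inked if inked else 0.0


def classify_line(img: Image, top: int, bottom: int,
                  threshold: float = 0.15) -> str:
    """Label a line 'roman' if it has enough straight vertical strokes,
    otherwise 'oriya'. The threshold is an illustrative assumption."""
    ratio = vertical_stroke_ratio(img, top, bottom)
    return "roman" if ratio >= threshold else "oriya"
```

A real system would tune the threshold on labeled lines and would probably combine several features, since Roman text also contains rounded letters and a single cue can mislabel short lines.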
