S. Mohanty, Himadri Nandini Dasbebartta, Tarun Kumar Behera
{"title":"一种高效的印刷文件双语光学字符识别系统","authors":"S. Mohanty, Himadri Nandini Dasbebartta, Tarun Kumar Behera","doi":"10.1109/ICAPR.2009.49","DOIUrl":null,"url":null,"abstract":"Recognition of documents containing multiscripts is really a challenging task, which needs more effort of the OCR (Optical Character Recognition) designers for improving the accuracy rate. Previously OCR was developed for documents with single scripts only mainly for English and regional languages. Old documents of not only uniscripts but also multiscripts is needed to be preserved for future use. This paper describes the character recognition process for printed documents containing English and Oriya texts. Though the languages in India are different but still we can find some common features among them. In consideration to our paper we need to distinguish between the Roman Script and the Oriya Script. Most of the English that is. Roman Script are linear as well as circular in nature and the Oriya characters are circular in nature. So we need to separate these scripts by taking into consideration of their features paragraph wise or line wise.","PeriodicalId":443926,"journal":{"name":"2009 Seventh International Conference on Advances in Pattern Recognition","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents\",\"authors\":\"S. Mohanty, Himadri Nandini Dasbebartta, Tarun Kumar Behera\",\"doi\":\"10.1109/ICAPR.2009.49\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recognition of documents containing multiscripts is really a challenging task, which needs more effort of the OCR (Optical Character Recognition) designers for improving the accuracy rate. Previously OCR was developed for documents with single scripts only mainly for English and regional languages. Old documents of not only uniscripts but also multiscripts is needed to be preserved for future use. This paper describes the character recognition process for printed documents containing English and Oriya texts. Though the languages in India are different but still we can find some common features among them. In consideration to our paper we need to distinguish between the Roman Script and the Oriya Script. Most of the English that is. Roman Script are linear as well as circular in nature and the Oriya characters are circular in nature. So we need to separate these scripts by taking into consideration of their features paragraph wise or line wise.\",\"PeriodicalId\":443926,\"journal\":{\"name\":\"2009 Seventh International Conference on Advances in Pattern Recognition\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-02-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 Seventh International Conference on Advances in Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAPR.2009.49\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 Seventh International Conference on Advances in Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAPR.2009.49","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents
Recognition of documents containing multiscripts is really a challenging task, which needs more effort of the OCR (Optical Character Recognition) designers for improving the accuracy rate. Previously OCR was developed for documents with single scripts only mainly for English and regional languages. Old documents of not only uniscripts but also multiscripts is needed to be preserved for future use. This paper describes the character recognition process for printed documents containing English and Oriya texts. Though the languages in India are different but still we can find some common features among them. In consideration to our paper we need to distinguish between the Roman Script and the Oriya Script. Most of the English that is. Roman Script are linear as well as circular in nature and the Oriya characters are circular in nature. So we need to separate these scripts by taking into consideration of their features paragraph wise or line wise.