基于字符校正和特征词分类的OCR纠错

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI:10.1109/DAS.2016.44

Ido Kissos, N. Dershowitz

{"title":"基于字符校正和特征词分类的OCR纠错","authors":"Ido Kissos, N. Dershowitz","doi":"10.1109/DAS.2016.44","DOIUrl":null,"url":null,"abstract":"This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast majority of segmentation and recognition errors, the most frequent types of error on our dataset.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"67","resultStr":"{\"title\":\"OCR Error Correction Using Character Correction and Feature-Based Word Classification\",\"authors\":\"Ido Kissos, N. Dershowitz\",\"doi\":\"10.1109/DAS.2016.44\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast majority of segmentation and recognition errors, the most frequent types of error on our dataset.\",\"PeriodicalId\":197359,\"journal\":{\"name\":\"2016 12th IAPR Workshop on Document Analysis Systems (DAS)\",\"volume\":\"47 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-04-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"67\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 12th IAPR Workshop on Document Analysis Systems (DAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DAS.2016.44\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DAS.2016.44","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 67

摘要

本文探讨了使用学习分类器进行后ocr文本校正。阿拉伯语的实验表明，这种方法集成了加权混淆矩阵和浅语言模型，改善了绝大多数分割和识别错误，这是我们数据集中最常见的错误类型。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

OCR Error Correction Using Character Correction and Feature-Based Word Classification

This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast majority of segmentation and recognition errors, the most frequent types of error on our dataset.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 12th IAPR Workshop on Document Analysis Systems (DAS)

自引率

0.00%

发文量