{"title":"基于深度CNN-RNN混合网络的乌尔都语无约束OCR","authors":"Mohit Jain, Minesh Mathew, C. V. Jawahar","doi":"10.1109/ACPR.2017.5","DOIUrl":null,"url":null,"abstract":"Building robust text recognition systems for languages with cursive scripts like Urdu has always been challenging. Intricacies of the script and the absence of ample annotated data further act as adversaries to this task. We demonstrate the effectiveness of an end-to-end trainable hybrid CNN-RNN architecture in recognizing Urdu text from printed documents, typically known as Urdu OCR. The solution proposed is not bounded by any language specific lexicon with the model following a segmentation-free, sequence-tosequence transcription approach. The network transcribes a sequence of convolutional features from an input image to a sequence of target labels. This discards the need to segment the input image into its constituent characters/glyphs, which is often arduous for scripts like Urdu. Furthermore, past and future contexts modelled by bidirectional recurrent layers aids the transcription. We outperform previous state-of-theart techniques on the synthetic UPTI dataset. Additionally, we publish a new dataset curated by scanning printed Urdu publications in various writing styles and fonts, annotated at the line level. We also provide benchmark results of our model on this dataset","PeriodicalId":426561,"journal":{"name":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","volume":"1106 ","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Unconstrained OCR for Urdu Using Deep CNN-RNN Hybrid Networks\",\"authors\":\"Mohit Jain, Minesh Mathew, C. V. Jawahar\",\"doi\":\"10.1109/ACPR.2017.5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Building robust text recognition systems for languages with cursive scripts like Urdu has always been challenging. Intricacies of the script and the absence of ample annotated data further act as adversaries to this task. We demonstrate the effectiveness of an end-to-end trainable hybrid CNN-RNN architecture in recognizing Urdu text from printed documents, typically known as Urdu OCR. The solution proposed is not bounded by any language specific lexicon with the model following a segmentation-free, sequence-tosequence transcription approach. The network transcribes a sequence of convolutional features from an input image to a sequence of target labels. This discards the need to segment the input image into its constituent characters/glyphs, which is often arduous for scripts like Urdu. Furthermore, past and future contexts modelled by bidirectional recurrent layers aids the transcription. We outperform previous state-of-theart techniques on the synthetic UPTI dataset. Additionally, we publish a new dataset curated by scanning printed Urdu publications in various writing styles and fonts, annotated at the line level. 
We also provide benchmark results of our model on this dataset\",\"PeriodicalId\":426561,\"journal\":{\"name\":\"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)\",\"volume\":\"1106 \",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ACPR.2017.5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACPR.2017.5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Unconstrained OCR for Urdu Using Deep CNN-RNN Hybrid Networks
Building robust text recognition systems for languages with cursive scripts like Urdu has always been challenging. The intricacies of the script and the absence of ample annotated data further complicate the task. We demonstrate the effectiveness of an end-to-end trainable hybrid CNN-RNN architecture in recognizing Urdu text from printed documents, a task typically known as Urdu OCR. The proposed solution is not bound to any language-specific lexicon; the model follows a segmentation-free, sequence-to-sequence transcription approach. The network transcribes a sequence of convolutional features extracted from an input image into a sequence of target labels. This removes the need to segment the input image into its constituent characters/glyphs, which is often arduous for scripts like Urdu. Furthermore, past and future contexts modelled by bidirectional recurrent layers aid the transcription. We outperform previous state-of-the-art techniques on the synthetic UPTI dataset. Additionally, we publish a new dataset curated by scanning printed Urdu publications in various writing styles and fonts, annotated at the line level. We also provide benchmark results of our model on this dataset.
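The abstract outlines a hybrid CNN-RNN pipeline: a convolutional feature extractor that turns a line image into a sequence of per-column features, bidirectional recurrent layers that model past and future context along that sequence, and a segmentation-free transcription onto target labels. The PyTorch sketch below illustrates one common way to realize such an architecture; the layer sizes, the two-layer bidirectional LSTM, the use of CTC loss for the segmentation-free transcription, and the label-set size are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch of a hybrid CNN-RNN line recognizer, assuming CTC-based
# segmentation-free transcription; hyperparameters are placeholders.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        # Convolutional feature extractor: collapses image height while
        # preserving width as the "time" axis of the feature sequence.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),    # H/2, W/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # H/4, W/4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),                                     # H/8, W/4
        )
        feat_height = img_height // 8
        # Bidirectional recurrent layers model past and future context.
        self.rnn = nn.LSTM(256 * feat_height, 256, num_layers=2,
                           bidirectional=True, batch_first=False)
        self.fc = nn.Linear(2 * 256, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                  # x: (batch, 1, H, W) grayscale line image
        f = self.cnn(x)                    # (batch, C, H', W')
        b, c, h, w = f.shape
        f = f.permute(3, 0, 1, 2).reshape(w, b, c * h)  # per-column feature sequence
        seq, _ = self.rnn(f)               # (W', batch, 2 * hidden)
        return self.fc(seq)                # per-timestep class scores for CTC

# Example training step with CTC loss (hypothetical label-set size and dummy data).
model = CRNN(num_classes=100)
criterion = nn.CTCLoss(blank=0, zero_infinity=True)
images = torch.randn(4, 1, 32, 256)        # dummy batch of line images
logits = model(images).log_softmax(2)      # (T, N, C)
targets = torch.randint(1, 100, (4, 20))   # dummy target label sequences
input_lengths = torch.full((4,), logits.size(0), dtype=torch.long)
target_lengths = torch.full((4,), 20, dtype=torch.long)
loss = criterion(logits, targets, input_lengths, target_lengths)
loss.backward()
```

At inference time, a greedy or beam-search CTC decode over the per-timestep scores yields the predicted label sequence directly, without any character- or glyph-level segmentation of the input line.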