Unconstrained OCR for Urdu Using Deep CNN-RNN Hybrid Networks

Mohit Jain, Minesh Mathew, C. V. Jawahar
Published in: 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), November 2017
DOI: 10.1109/ACPR.2017.5
Citations: 19

Abstract

Building robust text recognition systems for languages with cursive scripts like Urdu has always been challenging. The intricacies of the script and the absence of ample annotated data further complicate this task. We demonstrate the effectiveness of an end-to-end trainable hybrid CNN-RNN architecture in recognizing Urdu text from printed documents, a task typically known as Urdu OCR. The proposed solution is not bound by any language-specific lexicon; the model follows a segmentation-free, sequence-to-sequence transcription approach. The network transcribes a sequence of convolutional features from an input image into a sequence of target labels. This removes the need to segment the input image into its constituent characters/glyphs, which is often arduous for scripts like Urdu. Furthermore, past and future contexts modelled by bidirectional recurrent layers aid the transcription. We outperform previous state-of-the-art techniques on the synthetic UPTI dataset. Additionally, we publish a new dataset curated by scanning printed Urdu publications in various writing styles and fonts, annotated at the line level. We also provide benchmark results of our model on this dataset.
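The segmentation-free transcription the abstract describes — mapping a sequence of per-timestep recurrent outputs to a shorter label sequence without character segmentation — is commonly realized with a CTC-style output layer. The sketch below shows only the greedy decoding step under that assumption (the abstract itself does not name CTC, and the blank index and toy alphabet are hypothetical):

```python
import numpy as np

# Hypothetical setup: a CTC-style blank symbol at index 0, followed by
# glyph-label ids. The recurrent layers would emit per-timestep scores;
# greedy decoding collapses consecutive repeats and removes blanks.
BLANK = 0

def ctc_greedy_decode(logits, blank=BLANK):
    """Collapse per-timestep argmax predictions into a label sequence:
    merge consecutive repeated labels, then drop the blank symbol."""
    best = np.argmax(logits, axis=1)  # most likely label at each timestep
    decoded, prev = [], None
    for t in best:
        if t != prev and t != blank:
            decoded.append(int(t))
        prev = t
    return decoded

# Toy example: 6 timesteps over a 4-symbol alphabet (blank + 3 glyph ids).
frames = np.eye(4)[[1, 1, 0, 2, 2, 3]]  # one-hot "scores" per timestep
print(ctc_greedy_decode(frames))  # [1, 2, 3]
```

In a full recognizer, the decoded ids would then be mapped to Urdu glyphs; because decoding needs no lexicon lookup, this matches the paper's claim of being unbounded by any language-specific lexicon.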