Random Projected Convolutional Feature for Scene Text Recognition

Rui Wu, Shuli Yang, Dawei Leng, Zhenbo Luo, Yunhong Wang
Published in: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)
Publication date: 2016-10-01
DOI: 10.1109/ICFHR.2016.0036 (https://doi.org/10.1109/ICFHR.2016.0036)
Citations: 11

Abstract

Text recognition in natural scene images is an important yet challenging problem because of the irregular nature of scene text. This article proposes a novel method based on random projection and deep neural networks (DNNs). First, the word image is converted into a sequence of multi-layer convolutional neural network (CNN) features using a sliding window. Then, random projection (RP) is used to embed the original high-dimensional features into a low-dimensional space. Finally, a recurrent neural network (RNN) model is trained to recognize the text in the word image from the RP-CNN features. The benefits of using RP are two-fold. It preserves geometric relationships during dimension reduction, while effectively reducing the computation and storage burden of the subsequent RNN training without much information loss. Moreover, the randomness of RP brings information diversity, which can improve the generalization ability of the original features. Experiments show that the recognition performance of RP-CNN features, with an 85% dimension reduction, is similar to that of the original high-dimensional features. By ensembling several RNN models based on different RP-CNN features, we obtain higher performance than a single RNN based on the original CNN features. The proposed method shows competitive performance on public datasets such as SVT, ICDAR03, and ICDAR13.
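
The random projection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which kind of random matrix is used, so a dense Gaussian matrix is assumed here. The function names (`make_projection`, `pairwise_distances`), the sequence length, and the input dimension are all hypothetical, and random vectors stand in for real sliding-window CNN features. The sketch keeps 15% of the dimensions (an 85% reduction) and checks how well pairwise distances between frames survive.

```python
import numpy as np


def make_projection(d_in, d_out, seed):
    """Dense Gaussian random projection matrix (an assumed choice).

    Entries are drawn from N(0, 1/d_out), so squared vector norms are
    preserved in expectation after projection.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(d_out), size=(d_in, d_out))


def pairwise_distances(x):
    """Euclidean distance between every pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Stand-in for a CNN feature sequence: T sliding-window frames, D dims each.
# Both sizes are illustrative assumptions, not values from the paper.
T, D = 24, 2000
features = np.random.default_rng(0).normal(size=(T, D))

# Keep 15% of the dimensions, i.e. an 85% dimension reduction.
d_out = int(round(D * 0.15))
projection = make_projection(D, d_out, seed=1)
rp_features = features @ projection

# Ratio of projected to original distance for each distinct pair of frames.
off_diagonal = ~np.eye(T, dtype=bool)
ratio = (pairwise_distances(rp_features)[off_diagonal]
         / pairwise_distances(features)[off_diagonal])
print(rp_features.shape, float(ratio.min()), float(ratio.max()))

# Different seeds give different projections of the same features, which is
# one way to obtain the varied inputs that an ensemble of models could use.
projections = [make_projection(D, d_out, seed=s) for s in range(3)]
```

In this sketch the distance ratios stay close to 1, which is the geometric-preservation property the abstract relies on. Each matrix in `projections` could feed a separately trained recognizer. How the authors combine the resulting RNN outputs is not stated in the abstract.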