Random Projected Convolutional Feature for Scene Text Recognition

Venue: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)
Publication date: 2016-10-01
DOI: 10.1109/ICFHR.2016.0036

Rui Wu, Shuli Yang, Dawei Leng, Zhenbo Luo, Yunhong Wang


Citations: 11

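Abstract

Text recognition in natural scene images is an important yet challenging problem because of the irregular nature of scene text. This paper proposes a novel method based on random projection (RP) and deep neural networks (DNNs). First, the word image is converted into a sequence of multi-layer convolutional neural network (CNN) features using a sliding window. Then, random projection is used to embed the original high-dimensional features into a low-dimensional space. Finally, a recurrent neural network (RNN) model is trained on the RP-CNN features to recognize the text in the word image. Using RP has two benefits. First, it approximately preserves the geometric relationships between features during dimension reduction while effectively reducing the computation and storage burden of the subsequent RNN training, without much information loss. Second, the randomness of RP introduces information diversity, which can improve the generalization ability of the original features. Experiments show that RP-CNN features with an 85% reduction in dimensionality achieve recognition performance similar to that of the original high-dimensional features. An ensemble of several RNN models trained on different RP-CNN features outperforms a single RNN trained on the original CNN features. The proposed method achieves competitive performance on public datasets such as SVT, ICDAR03, and ICDAR13.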
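The random-projection step lends itself to a short sketch. The plain-Python code here is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a dense Gaussian projection matrix with entries drawn from N(0, 1/k) (the abstract does not say which RP distribution was used; sparse ±1 matrices are another common choice), a hypothetical 200-dimensional CNN feature per sliding window, and random numbers standing in for real CNN activations. The function names `gaussian_rp_matrix`, `project`, `project_sequence`, and `euclidean` are illustrative only. Keeping 30 of 200 dimensions corresponds to the 85% reduction reported in the abstract.

```python
import math
import random


def gaussian_rp_matrix(d_in, d_out, seed):
    """Return a d_out x d_in random projection matrix (assumed Gaussian RP).

    Entries are i.i.d. N(0, 1/d_out), so E[||R x||^2] = ||x||^2 and pairwise
    distances are preserved approximately (Johnson-Lindenstrauss).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)]
            for _ in range(d_out)]


def project(vec, R):
    """Map one d_in-dimensional frame feature to d_out dimensions (R @ vec)."""
    return [sum(r * v for r, v in zip(row, vec)) for row in R]


def project_sequence(seq, R):
    """Project every sliding-window frame; the sequence length is unchanged."""
    return [project(frame, R) for frame in seq]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    D_IN = 200                 # hypothetical per-window CNN feature size
    D_OUT = D_IN * 15 // 100   # keep 15% of the dimensions: an 85% reduction
    data_rng = random.Random(0)
    # Random stand-in for one word image's CNN feature sequence (8 windows).
    seq = [[data_rng.gauss(0.0, 1.0) for _ in range(D_IN)] for _ in range(8)]

    # A different seed gives a different projection of the same features;
    # each projected sequence could feed its own RNN in an ensemble.
    for seed in (1, 2, 3):
        R = gaussian_rp_matrix(D_IN, D_OUT, seed)
        low = project_sequence(seq, R)
        ratio = euclidean(low[0], low[1]) / euclidean(seq[0], seq[1])
        print(f"seed {seed}: {len(low)} frames x {len(low[0])} dims, "
              f"frame-distance ratio {ratio:.2f}")
```

Because the projection is linear and applied frame by frame, the sequence length seen by the RNN is unchanged; only the per-frame dimension shrinks. Drawing several matrices with different seeds is one plausible way to obtain the "various RP-CNN features" used for the ensemble; the abstract does not specify how the outputs of the individual RNN models are combined.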



