Rui Wu, Shuli Yang, Dawei Leng, Zhenbo Luo, Yunhong Wang
2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), published 2016-10-01
DOI: 10.1109/ICFHR.2016.0036 (https://doi.org/10.1109/ICFHR.2016.0036)
Random Projected Convolutional Feature for Scene Text Recognition
Text recognition in natural scene images is an important yet challenging problem owing to its irregular nature. This article proposes a novel method based on random projection (RP) and deep neural networks (DNN). First, the word image is converted into a sequence of multi-layer convolutional neural network (CNN) features using a sliding window. Then, random projection is used to embed the original high-dimensional features into a low-dimensional space. Finally, a recurrent neural network (RNN) model is trained to recognize the text in the word image from the RP-CNN features. The benefits of using RP are twofold. First, it preserves geometric relationships during dimension reduction, and it effectively reduces the computation and storage burden of the subsequent RNN training without much information loss. Second, the randomness of RP introduces information diversity, which can improve the generalization ability of the original features. Experiments show that the recognition performance of RP-CNN features, with 85% dimension reduction, is similar to that of the original high-dimensional features. An ensemble of several RNN models trained on different RP-CNN features achieves higher performance than a single RNN trained on the original CNN features. The proposed method shows competitive performance on public datasets such as SVT, ICDAR03, and ICDAR13.
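The random projection step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the projection matrix is built, so a Gaussian matrix is assumed here, and the sequence length, the feature dimension, the random stand-in features, and the helper names (`make_projection`, `pairwise_distances`) are all hypothetical. The sketch reduces the dimension by 85% and checks how far pairwise distances between sliding-window features drift after projection.

```python
import numpy as np


def make_projection(d_in, d_out, seed):
    """Gaussian random projection matrix (an assumed choice, not from the paper).

    Entries are drawn from N(0, 1/d_out), so squared norms are preserved
    in expectation after projection.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(d_out), size=(d_in, d_out))


def pairwise_distances(x):
    """Euclidean distance between every pair of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical sizes: 20 sliding-window positions, 2000-dimensional CNN features.
T, D = 20, 2000
k = int(round(D * 0.15))  # keep 15% of the dimensions, i.e. an 85% reduction

# Random stand-in for a real CNN feature sequence.
cnn_features = np.random.default_rng(0).normal(size=(T, D))

R = make_projection(D, k, seed=1)
rp_features = cnn_features @ R  # shape (T, k)

# Ratio of projected to original distance for every distinct pair of windows.
off_diagonal = ~np.eye(T, dtype=bool)
ratio = (pairwise_distances(rp_features)[off_diagonal]
         / pairwise_distances(cnn_features)[off_diagonal])
print(rp_features.shape, ratio.min(), ratio.max())

# A different seed gives a different projection of the same features, which is
# one way to obtain the varied RP-CNN features an ensemble could be trained on.
R_other = make_projection(D, k, seed=2)
```

In this sketch the distance ratios stay close to 1, which is the sense in which random projection preserves geometric relationships. Training one RNN per seed and combining their outputs would correspond to the ensemble described in the abstract, though the combination rule is not specified there.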