Learning Spatially Embedded Discriminative Part Detectors for Scene Character Recognition

Yanna Wang, Cunzhao Shi, Baihua Xiao, Chunheng Wang

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), November 2017. DOI: 10.1109/ICDAR.2017.67
Recognizing scene characters is extremely challenging due to interference factors such as character translation, blur, and uneven illumination. Since characters are composed of a series of parts, and different parts attract different degrees of attention when people observe a character, each part should be assigned a different importance for recognition. In this paper, we propose a discriminative character representation built by aggregating the responses of spatially embedded salient part detectors. Specifically, we first extract convolutional activations from a pre-trained convolutional neural network (CNN); these activations are treated as local descriptors of character parts. We then learn a set of part detectors and pick the distinctive convolutional activations that respond to salient parts. Moreover, to alleviate the effects of character translation, rotation, and deformation, we assign a response region to each part detector and search for the maximal response within that region. Finally, we aggregate the maximal outputs of all the salient part detectors to represent the character. Experiments on three datasets show the effectiveness of the proposed method for scene character recognition.
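The representation described above can be sketched in a few lines: treat each spatial position of a conv feature map as a local descriptor, score it against each learned part detector, and max-pool each detector's responses within its assigned region. This is a minimal NumPy sketch, not the authors' implementation; dot-product detectors, axis-aligned regions, and all shapes and names are illustrative assumptions.

```python
import numpy as np

def part_detector_representation(feat_map, detectors, regions):
    """Aggregate maximal part-detector responses into a character vector.

    feat_map:  (H, W, C) conv activations (local descriptors of parts)
    detectors: (K, C) learned part-detector weight vectors (assumed linear)
    regions:   K tuples (r0, r1, c0, c1), one response region per detector
    Returns a length-K vector of maximal responses.
    """
    H, W, C = feat_map.shape
    # response of every detector at every spatial location: (H, W, K)
    responses = (feat_map.reshape(H * W, C) @ detectors.T).reshape(H, W, -1)
    rep = np.empty(len(regions))
    for k, (r0, r1, c0, c1) in enumerate(regions):
        # search only within detector k's region, so the part may shift
        # slightly (translation/deformation) without changing the feature
        rep[k] = responses[r0:r1, c0:c1, k].max()
    return rep

# toy example with random features and detectors (hypothetical sizes)
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 64))     # 8x8 feature map, 64 channels
dets = rng.standard_normal((4, 64))        # 4 part detectors
regs = [(0, 4, 0, 4), (0, 4, 4, 8), (4, 8, 0, 4), (4, 8, 4, 8)]
vec = part_detector_representation(feat, dets, regs)
print(vec.shape)  # (4,)
```

Restricting each detector's max-pool to a region, rather than pooling over the whole map, is what makes the detectors "spatially embedded": a stroke detector for the top of a character cannot fire on the bottom.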