Low-resolution Convolutional Neural Networks for video face recognition
Authors: C. Herrmann, D. Willersinn, J. Beyerer
Published in: 2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2016-08-01
DOI: https://doi.org/10.1109/AVSS.2016.7738017
Security and safety applications such as surveillance and forensics demand face recognition in low-resolution video data. We propose a method for this setting that combines a Convolutional Neural Network (CNN) with a manifold-based track comparison strategy. The low-resolution domain is addressed by adjusting the network architecture to prevent bottlenecks or significant upscaling of face images. The CNN is trained on a combination of a large-scale self-collected video face dataset and large-scale public image face datasets, about 1.4M training images in total. To handle large amounts of video data, the CNN face descriptors are compared efficiently and effectively at track level by local patch means. Our setup achieves 80.3 percent accuracy on a 32×32 pixel low-resolution version of the YouTube Faces Database and outperforms local image descriptors as well as the state-of-the-art VGG-Face network [20] in this domain. The superior performance of the proposed method is confirmed on a self-collected in-the-wild surveillance dataset.
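
The abstract does not spell out how the track-level comparison by local patch means is computed, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes that each track is a non-empty list of per-frame CNN descriptors, that local patches are formed by a small k-means clustering of those descriptors, and that two tracks are compared by the smallest Euclidean distance between any pair of patch means. The function names `local_patch_means` and `track_distance` and the parameter `n_patches` are hypothetical.

```python
import math


def local_patch_means(descriptors, n_patches=4, n_iter=10):
    """Summarize one track by the means of its descriptor clusters.

    Assumption: "local patches" are approximated by plain k-means with a
    deterministic initialization (evenly spaced frames of the track).
    """
    if not descriptors:
        raise ValueError("a track needs at least one descriptor")
    k = min(n_patches, len(descriptors))
    step = len(descriptors) / k
    means = [list(descriptors[int(i * step)]) for i in range(k)]
    for _ in range(n_iter):
        # Assign every frame descriptor to its nearest current mean.
        groups = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda j: math.dist(d, means[j]))
            groups[nearest].append(d)
        # Recompute each mean; an empty cluster keeps its previous mean.
        for j, group in enumerate(groups):
            if group:
                means[j] = [sum(col) / len(group) for col in zip(*group)]
    return means


def track_distance(track_a, track_b, n_patches=4):
    """Distance between two tracks: closest pair of local patch means.

    Assumption: minimum Euclidean distance is used as the comparison rule.
    """
    means_a = local_patch_means(track_a, n_patches)
    means_b = local_patch_means(track_b, n_patches)
    return min(math.dist(a, b) for a in means_a for b in means_b)
```

Under these assumptions, comparing two tracks costs at most `n_patches * n_patches` distance evaluations regardless of track length, which is one way a track-level summary can keep comparison cheap on large video collections.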