Multi-path Region-Based Convolutional Neural Network for Accurate Detection of Unconstrained "Hard Faces"

2017 14th Conference on Computer and Robot Vision (CRV), Pub Date: 2017-03-27, DOI: 10.1109/CRV.2017.20

Yuguang Liu, M. Levine

Citations: 10

Abstract

Large-scale variations still pose a challenge in unconstrained face detection. To the best of our knowledge, no current face detection algorithm can detect a face as large as 800 x 800 pixels while simultaneously detecting another one as small as 8 x 8 pixels within a single image with equally high accuracy. We propose a two-stage cascaded face detection framework, Multi-Path Region-based Convolutional Neural Network (MP-RCNN), that seamlessly combines a deep neural network with a classic learning strategy, to tackle this challenge. The first stage is a Multi-Path Region Proposal Network (MP-RPN) that proposes faces at three different scales. It simultaneously utilizes three parallel outputs of the convolutional feature maps to predict multi-scale candidate face regions. The "atrous" convolution trick (convolution with up-sampled filters) and a newly proposed sampling layer for "hard" examples are embedded in MP-RPN to further boost its performance. The second stage is a Boosted Forests classifier, which utilizes deep facial features pooled from inside the candidate face regions as well as deep contextual features pooled from a larger region surrounding the candidate face regions. This step is included to further remove hard negative samples. Experiments show that this approach achieves state-of-the-art face detection performance on the WIDER FACE dataset "hard" partition, outperforming the former best result by 9.6% for the Average Precision.
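
The abstract describes "atrous" convolution as convolution with up-sampled filters. The following is a minimal 1-D sketch of that equivalence in plain Python, not the authors' implementation: the function names (`upsample_filter`, `correlate_valid`, `atrous_conv1d`), the example signal, and the filter values are all hypothetical and chosen only for illustration. It applies the filter taps `rate` samples apart, and separately slides a zero-stuffed (up-sampled) copy of the filter densely over the same signal, so the two results can be compared.

```python
def upsample_filter(w, rate):
    """Insert (rate - 1) zeros between neighbouring taps of a 1-D filter."""
    up = [0.0] * ((len(w) - 1) * rate + 1)
    for k, tap in enumerate(w):
        up[k * rate] = float(tap)
    return up


def correlate_valid(x, w):
    """Ordinary dense 'valid' correlation (no padding, stride 1)."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


def atrous_conv1d(x, w, rate):
    """'Valid' atrous correlation: filter taps are applied `rate` samples apart."""
    span = (len(w) - 1) * rate + 1  # input samples covered by one output value
    n_out = len(x) - span + 1
    return [sum(x[i + k * rate] * w[k] for k in range(len(w))) for i in range(n_out)]


# Hypothetical toy data, chosen only to make the comparison easy to follow.
x = [float(v) for v in range(10)]
w = [1.0, 2.0, 1.0]

dilated = atrous_conv1d(x, w, rate=2)
dense = correlate_valid(x, upsample_filter(w, 2))
print(dilated == dense)
```

With `rate=2`, the three-tap filter covers five input samples per output while still using only three weights, which is the sense in which atrous convolution enlarges the receptive field without adding parameters. Deep-learning frameworks typically expose the same idea through a dilation argument on their convolution layers.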
