Localized region context and object feature fusion for people head detection
Authors: Yule Li, Y. Dou, Xinwang Liu, Teng Li
Venue: 2016 IEEE International Conference on Image Processing (ICIP), pp. 594-598
Published: 2016-09-01
DOI: 10.1109/ICIP.2016.7532426 (https://doi.org/10.1109/ICIP.2016.7532426)
Citations: 11
Abstract
Detecting people's heads in crowded scenes is challenging because of large variability in clothing and appearance, the small scale at which people appear, and strong partial occlusion. Traditional bottom-up proposal methods and existing region proposal network approaches suffer from either poor recall or low precision. In this paper, we improve both the recall and the precision of region proposal models for head detection by integrating local head information. Specifically, we first use a region proposal network to predict the bounding boxes and corresponding scores of multiple instances in a region. A local head classifier network is then trained to score each bounding box generated by the region proposal model. Finally, we propose an adaptive fusion method that combines the region and local scores to obtain the final score of each candidate bounding box; the fusion model learns its hyper-parameters automatically from data. On a crowded-scenes data set, our algorithm significantly outperforms several recent state-of-the-art baselines in head detection.
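The abstract does not give the functional form of the fusion step, so the following is only an illustrative sketch of the general idea: each candidate box carries a region-level score and a local head-classifier score, and a small model learned from labeled boxes combines them into one final score. The logistic combination, the function names `fuse` and `fit_fusion`, the learning-rate and epoch settings, and the toy scores are all assumptions made for this example, not the authors' method or data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fuse(region_score, local_score, params):
    """Combine the two scores of one candidate box into a final score in (0, 1).

    Assumed form: a logistic function of a weighted sum (hypothetical).
    """
    w_region, w_local, bias = params
    return sigmoid(w_region * region_score + w_local * local_score + bias)

def fit_fusion(samples, labels, lr=0.5, epochs=2000):
    """Learn the fusion weights by batch gradient descent on the log-loss.

    samples: list of (region_score, local_score) pairs, one per candidate box
    labels:  1 if the box is a true head, 0 otherwise
    """
    w_r = w_l = b = 0.0
    n = len(samples)
    for _ in range(epochs):
        g_r = g_l = g_b = 0.0
        for (r, l), y in zip(samples, labels):
            err = fuse(r, l, (w_r, w_l, b)) - y  # gradient of log-loss w.r.t. the logit
            g_r += err * r
            g_l += err * l
            g_b += err
        w_r -= lr * g_r / n
        w_l -= lr * g_l / n
        b -= lr * g_b / n
    return w_r, w_l, b

# Toy scores invented for illustration only (not from the paper).
samples = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.2), (0.3, 0.1), (0.6, 0.7), (0.2, 0.3)]
labels = [1, 1, 0, 0, 1, 0]

params = fit_fusion(samples, labels)
for (r, l), y in zip(samples, labels):
    print(f"region={r:.1f} local={l:.1f} label={y} fused={fuse(r, l, params):.3f}")
```

In this toy setup, the box with scores (0.7, 0.2) has a high region score but a low local score; after fitting, it is ranked below the true head at (0.6, 0.7), which is the kind of correction a learned fusion of the two scores is meant to provide.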