{"title":"Super-fast parallel eigenface implementation on GPU for face recognition","authors":"Urvesh Devani, V. Nikam, B. Meshram","doi":"10.1109/PDGC.2014.7030729","DOIUrl":null,"url":null,"abstract":"Eigenface is one of the most common appearance based approaches for face recognition. Eigenfaces are the principal components which represent the training faces. Using Principal Component Analysis, each face is represented by very few parameters called weight vectors or feature vectors. While this makes testing process easy, it also includes cumbersome process of generating eigenspace and projecting every training image onto it to extract weight vectors. This approach works well with small set of images. As number of images to train increases, time taken for generating eigenspace and weight vectors also increases rapidly and it will not be feasible to recognize face in big data or perform real time video analysis. In this paper, we propose a super-fast parallel solution which harnesses the power of GPU and utilizes benefits of the thousands of cores to compute accurate match in fraction of second. We have implemented Parallel Eigenface, the first complete super-fast Parallel Eigenface implementation for face recognition, using CUDA on NVIDIA K20 GPU. Focus of the research has been to gain maximum performance by implementing highly optimized kernels for complete approach and utilizing available fastest library functions. We have used dataset of different size for training and noted very high increase in speedup. We are able to achieve highest 460X speed up for weight vectors generation of 1000 training images. We also get 73X speedup for overall training process on the same dataset. Speedup tends to increase with respect to training data, proving the scalability of solution. Results prove that our parallel implementation is best fit for various video analytics applications and real time face recognition. It also shows strong promise for excessive use of GPUs in face recognition systems.","PeriodicalId":311953,"journal":{"name":"2014 International Conference on Parallel, Distributed and Grid Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Parallel, Distributed and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDGC.2014.7030729","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 11
Abstract
Eigenface is one of the most common appearance-based approaches for face recognition. Eigenfaces are the principal components that represent the training faces. Using Principal Component Analysis (PCA), each face is represented by very few parameters, called weight vectors or feature vectors. While this makes the testing process easy, it also involves the cumbersome process of generating the eigenspace and projecting every training image onto it to extract weight vectors. This approach works well with a small set of images, but as the number of training images increases, the time taken to generate the eigenspace and weight vectors grows rapidly, making it infeasible to recognize faces in big data or to perform real-time video analysis. In this paper, we propose a super-fast parallel solution that harnesses the power of the GPU, using its thousands of cores to compute an accurate match in a fraction of a second. We have implemented Parallel Eigenface, the first complete super-fast parallel eigenface implementation for face recognition, using CUDA on an NVIDIA K20 GPU. The focus of the research has been to gain maximum performance by implementing highly optimized kernels for the complete approach and by using the fastest available library functions. We trained on datasets of different sizes and observed very large speedups: up to 460X for weight-vector generation on 1000 training images, and 73X for the overall training process on the same dataset. Speedup tends to increase with the amount of training data, demonstrating the scalability of the solution. The results show that our parallel implementation is well suited to various video-analytics applications and real-time face recognition, and it holds strong promise for the extensive use of GPUs in face recognition systems.
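The costliest training step named in the abstract, projecting every training image onto the eigenspace to extract weight vectors, reduces to a single dense matrix product, which is why it maps so well to the GPU. Below is a minimal sketch (not the authors' code) of how that projection could be expressed with cuBLAS; the function name `project_faces`, the buffer names, and the column-major layout are our own illustrative assumptions.

```cuda
// Sketch: projecting mean-subtracted face images onto the eigenspace
// with one cuBLAS call to obtain all weight vectors at once.
// Assumptions (hypothetical, for illustration):
//   d = pixels per image, n = number of images, k = eigenfaces kept;
//   d_eigen (d x k) and d_faces (d x n) already reside on the GPU,
//   stored column-major as cuBLAS expects.
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Computes W = E^T * A, so each column of W (k x n) is the
// k-dimensional weight vector of one training face.
void project_faces(cublasHandle_t handle,
                   const float *d_eigen,   // d x k eigenfaces
                   const float *d_faces,   // d x n mean-subtracted images
                   float *d_weights,       // k x n output weight vectors
                   int d, int n, int k)
{
    const float alpha = 1.0f, beta = 0.0f;
    // cublasSgemm: C = alpha * op(A) * op(B) + beta * C (column-major).
    cublasSgemm(handle,
                CUBLAS_OP_T, CUBLAS_OP_N,  // transpose eigenfaces only
                k, n, d,                   // C is k x n; shared dim is d
                &alpha,
                d_eigen, d,                // lda = d
                d_faces, d,                // ldb = d
                &beta,
                d_weights, k);             // ldc = k
}
```

Because all n projections are batched into one GEMM rather than looped over image by image, the arithmetic saturates the GPU's cores, consistent with the large weight-vector-generation speedups the paper reports.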