{"title":"Super-fast parallel eigenface implementation on GPU for face recognition","authors":"Urvesh Devani, V. Nikam, B. Meshram","doi":"10.1109/PDGC.2014.7030729","DOIUrl":null,"url":null,"abstract":"Eigenface is one of the most common appearance based approaches for face recognition. Eigenfaces are the principal components which represent the training faces. Using Principal Component Analysis, each face is represented by very few parameters called weight vectors or feature vectors. While this makes testing process easy, it also includes cumbersome process of generating eigenspace and projecting every training image onto it to extract weight vectors. This approach works well with small set of images. As number of images to train increases, time taken for generating eigenspace and weight vectors also increases rapidly and it will not be feasible to recognize face in big data or perform real time video analysis. In this paper, we propose a super-fast parallel solution which harnesses the power of GPU and utilizes benefits of the thousands of cores to compute accurate match in fraction of second. We have implemented Parallel Eigenface, the first complete super-fast Parallel Eigenface implementation for face recognition, using CUDA on NVIDIA K20 GPU. Focus of the research has been to gain maximum performance by implementing highly optimized kernels for complete approach and utilizing available fastest library functions. We have used dataset of different size for training and noted very high increase in speedup. We are able to achieve highest 460X speed up for weight vectors generation of 1000 training images. We also get 73X speedup for overall training process on the same dataset. Speedup tends to increase with respect to training data, proving the scalability of solution. Results prove that our parallel implementation is best fit for various video analytics applications and real time face recognition. It also shows strong promise for excessive use of GPUs in face recognition systems.","PeriodicalId":311953,"journal":{"name":"2014 International Conference on Parallel, Distributed and Grid Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Parallel, Distributed and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDGC.2014.7030729","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 11
Abstract
Eigenface is one of the most common appearance-based approaches for face recognition. Eigenfaces are the principal components that represent the training faces. Using Principal Component Analysis (PCA), each face is represented by very few parameters, called weight vectors or feature vectors. While this makes the testing process easy, it also involves the cumbersome process of generating the eigenspace and projecting every training image onto it to extract weight vectors. This approach works well with a small set of images, but as the number of training images increases, the time taken to generate the eigenspace and weight vectors grows rapidly, making it infeasible to recognize faces in big data or to perform real-time video analysis. In this paper, we propose a super-fast parallel solution that harnesses the power of the GPU, using its thousands of cores to compute an accurate match in a fraction of a second. We have implemented Parallel Eigenface, the first complete super-fast parallel eigenface implementation for face recognition, using CUDA on an NVIDIA K20 GPU. The focus of the research has been to gain maximum performance by implementing highly optimized kernels for the complete approach and by using the fastest available library functions. We trained on datasets of different sizes and observed very large speedups: up to 460X for weight-vector generation on 1000 training images, and 73X for the overall training process on the same dataset. Speedup tends to increase with the amount of training data, demonstrating the scalability of the solution. The results show that our parallel implementation is well suited to various video-analytics applications and real-time face recognition, and it holds strong promise for the extensive use of GPUs in face recognition systems.
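The costliest training step named in the abstract, projecting every training image onto the eigenspace to extract weight vectors, reduces to a single dense matrix product, which is why it maps so well to the GPU. Below is a minimal sketch (not the authors' code) of how that projection could be expressed with cuBLAS; the function name `project_faces`, the buffer names, and the column-major layout are our own illustrative assumptions.

```cuda
// Sketch: projecting mean-subtracted face images onto the eigenspace
// with one cuBLAS call to obtain all weight vectors at once.
// Assumptions (hypothetical, for illustration):
//   d = pixels per image, n = number of images, k = eigenfaces kept;
//   d_eigen (d x k) and d_faces (d x n) already reside on the GPU,
//   stored column-major as cuBLAS expects.
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Computes W = E^T * A, so each column of W (k x n) is the
// k-dimensional weight vector of one training face.
void project_faces(cublasHandle_t handle,
                   const float *d_eigen,   // d x k eigenfaces
                   const float *d_faces,   // d x n mean-subtracted images
                   float *d_weights,       // k x n output weight vectors
                   int d, int n, int k)
{
    const float alpha = 1.0f, beta = 0.0f;
    // cublasSgemm: C = alpha * op(A) * op(B) + beta * C (column-major).
    cublasSgemm(handle,
                CUBLAS_OP_T, CUBLAS_OP_N,  // transpose eigenfaces only
                k, n, d,                   // C is k x n; shared dim is d
                &alpha,
                d_eigen, d,                // lda = d
                d_faces, d,                // ldb = d
                &beta,
                d_weights, k);             // ldc = k
}
```

Because all n projections are batched into one GEMM rather than looped over image by image, the arithmetic saturates the GPU's cores, consistent with the large weight-vector-generation speedups the paper reports.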