Fast face recognition on GPU
Zhiquan Guo, Jungang Han, Junyan Chen
2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS), published 2015-11-30
DOI: https://doi.org/10.1109/ICSESS.2015.7339173
Citations: 2
Abstract
In this paper, we propose a fast parallelized implementation of face recognition based on the local binary pattern (LBP) using the Open Computing Language (OpenCL), a novel open standard for heterogeneous computing. LBP and its modifications CLBP (Circle Local Binary Patterns) and ULB (Uniform Local Binary Patterns) have been implemented on both CPU and GPU using OpenCL. The paper also addresses several optimization and parallelization problems related to these algorithms, such as LBP feature extraction and Chi-dist computation, in order to make full use of the resources available on the GPU. The optimizations build on the OpenCL memory and execution models. Experimental results from the implementation on an AMD GPU show that the GPU parallel implementation is about 50 times faster than its CPU counterpart.
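The abstract does not give the kernels themselves, so the following is only a minimal sequential sketch of the two steps it names, LBP feature extraction and histogram comparison. It assumes the basic 3x3 LBP operator, a clockwise neighbour ordering starting at the top-left pixel, and that "Chi-dist" refers to the chi-square distance commonly used to compare LBP histograms. The function names `lbp_code`, `lbp_histogram`, and `chi_square_distance` are hypothetical, and this is plain Python rather than the authors' OpenCL code.

```python
from typing import List, Sequence

Image = List[List[int]]  # grayscale image as rows of integer intensities

# Neighbour offsets, clockwise from the top-left pixel (an assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, r: int, c: int) -> int:
    """Basic 3x3 LBP code for the interior pixel at row r, column c."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbour at least as bright as the centre sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: Image) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


def chi_square_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Chi-square distance: sum of (a - b)^2 / (a + b) over non-empty bins."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return total


if __name__ == "__main__":
    probe = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    gallery = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
    print(chi_square_distance(lbp_histogram(probe), lbp_histogram(gallery)))
```

Each pixel's code depends only on its own 3x3 neighbourhood, and each histogram bin contributes an independent term to the distance, which is why both steps lend themselves to one-work-item-per-element parallelization on a GPU.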