Parallelizing Non-Neural ML Algorithm for Edge-based Face Recognition on Parallel Ultra-Low Power (PULP) Cluster

2023 12th Mediterranean Conference on Embedded Computing (MECO). Pub Date: 2023-06-06. DOI: 10.1109/MECO58584.2023.10154955

M. S. Nagar, Rahul Kumar, Pinalkumar Engineer



Abstract

The multi-core parallel ultra-low power (PULP) cluster architecture allows IoT edge nodes to shift toward near-sensor computing. This paper examines non-neural, Eigenfaces-based face recognition (FR) on an octa-core PULP cluster. The Eigenfaces-based algorithm can achieve high accuracy without a large data model: it reaches 93% accuracy on the PULP platform with a model $4.55\times$ smaller than that of the state-of-the-art SqueezeNet1.1-based FR algorithm on the GAP8 platform. The algorithm is parallelized to maximize speed-up on the multi-core cluster and thereby reduce recognition time. Furthermore, DMA-based communication between the fabric controller and the multi-core cluster reduces recognition time by $50\times$, at the cost of a slight degradation in multi-core speed-up. With this technique, the octa-core PULP cluster recognizes 165 faces per second with 93% accuracy, which is $7.85\times$ faster than a single-core RISC-V with DMA. Compared to the ARM Cortex-M7 architecture, the multi-core PULP cluster reduces recognition time by 89.89%. These results make the multi-core PULP cluster an efficient choice for Eigenfaces-based face recognition on the edge.
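The headline figures in the abstract can be converted into per-face latency, parallel efficiency, and an equivalent speed-up factor. The short sketch below does only that arithmetic. The function names are illustrative and not from the paper, and the efficiency figure assumes that all eight cluster cores take part and that the $7.85\times$ baseline is the single core with DMA, as the abstract states.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Function names are hypothetical helpers, not part of the paper.

def latency_ms(faces_per_second: float) -> float:
    """Average time per recognized face, in milliseconds."""
    return 1000.0 / faces_per_second


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speed-up achieved (1.0 means perfect scaling)."""
    return speedup / cores


def speedup_from_reduction(reduction_percent: float) -> float:
    """Convert 'time reduced by X%' into an equivalent 'N times faster' factor."""
    return 1.0 / (1.0 - reduction_percent / 100.0)


if __name__ == "__main__":
    # 165 faces per second on the octa-core cluster
    print(f"latency per face:      {latency_ms(165):.2f} ms")
    # 7.85x over a single core; assumes all 8 cores are used
    print(f"parallel efficiency:   {parallel_efficiency(7.85, 8):.1%}")
    # Implied single-core throughput under the same assumption
    print(f"single-core rate:      {165 / 7.85:.1f} faces/s")
    # 89.89% shorter recognition time than the ARM Cortex-M7
    print(f"speed-up vs Cortex-M7: {speedup_from_reduction(89.89):.2f}x")
```

Under these assumptions, 165 faces per second corresponds to roughly 6.06 ms per face, the $7.85\times$ speed-up on eight cores to a parallel efficiency of about 98%, and the 89.89% reduction in recognition time to a speed-up of roughly $9.9\times$ over the Cortex-M7.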
