基于深度张量压缩LSTM神经网络的移动设备快速视频面部表情识别

IF 3.5 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Peining Zhen, Hai-Bao Chen, Yuan Cheng, Zhigang Ji, Bin Liu, Hao Yu
{"title":"基于深度张量压缩LSTM神经网络的移动设备快速视频面部表情识别","authors":"Peining Zhen, Hai-Bao Chen, Yuan Cheng, Zhigang Ji, Bin Liu, Hao Yu","doi":"10.1145/3464941","DOIUrl":null,"url":null,"abstract":"Mobile devices usually suffer from limited computation and storage resources, which seriously hinders them from deep neural network applications. In this article, we introduce a deeply tensor-compressed long short-term memory (LSTM) neural network for fast video-based facial expression recognition on mobile devices. First, a spatio-temporal facial expression recognition LSTM model is built by extracting time-series feature maps from facial clips. The LSTM-based spatio-temporal model is further deeply compressed by means of quantization and tensorization for mobile device implementation. Based on datasets of Extended Cohn-Kanade (CK+), MMI, and Acted Facial Expression in Wild 7.0, experimental results show that the proposed method achieves 97.96%, 97.33%, and 55.60% classification accuracy and significantly compresses the size of network model up to 221× with reduced training time per epoch by 60%. Our work is further implemented on the RK3399Pro mobile device with a Neural Process Engine. The latency of the feature extractor and LSTM predictor can be reduced 30.20× and 6.62× , respectively, on board with the leveraged compression methods. Furthermore, the spatio-temporal model costs only 57.19 MB of DRAM and 5.67W of power when running on the board.","PeriodicalId":29764,"journal":{"name":"ACM Transactions on Internet of Things","volume":"13 1","pages":"1 - 26"},"PeriodicalIF":3.5000,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Fast Video Facial Expression Recognition by a Deeply Tensor-Compressed LSTM Neural Network for Mobile Devices\",\"authors\":\"Peining Zhen, Hai-Bao Chen, Yuan Cheng, Zhigang Ji, Bin Liu, Hao Yu\",\"doi\":\"10.1145/3464941\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Mobile devices usually suffer from limited computation and storage resources, which seriously hinders them from deep neural network applications. In this article, we introduce a deeply tensor-compressed long short-term memory (LSTM) neural network for fast video-based facial expression recognition on mobile devices. First, a spatio-temporal facial expression recognition LSTM model is built by extracting time-series feature maps from facial clips. The LSTM-based spatio-temporal model is further deeply compressed by means of quantization and tensorization for mobile device implementation. Based on datasets of Extended Cohn-Kanade (CK+), MMI, and Acted Facial Expression in Wild 7.0, experimental results show that the proposed method achieves 97.96%, 97.33%, and 55.60% classification accuracy and significantly compresses the size of network model up to 221× with reduced training time per epoch by 60%. Our work is further implemented on the RK3399Pro mobile device with a Neural Process Engine. The latency of the feature extractor and LSTM predictor can be reduced 30.20× and 6.62× , respectively, on board with the leveraged compression methods. Furthermore, the spatio-temporal model costs only 57.19 MB of DRAM and 5.67W of power when running on the board.\",\"PeriodicalId\":29764,\"journal\":{\"name\":\"ACM Transactions on Internet of Things\",\"volume\":\"13 1\",\"pages\":\"1 - 26\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2021-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Internet of Things\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3464941\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Internet of Things","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3464941","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 8

摘要

移动设备通常受限于有限的计算和存储资源,这严重阻碍了深度神经网络的应用。在本文中,我们介绍了一种深度张量压缩的长短期记忆(LSTM)神经网络,用于移动设备上基于视频的快速面部表情识别。首先,通过提取人脸片段的时间序列特征映射,建立时空面部表情识别LSTM模型;基于lstm的时空模型通过量化和张张化进一步深度压缩,以便移动设备实现。基于Wild 7.0的扩展科恩-卡纳德(CK+)、MMI和动作面部表情数据集,实验结果表明,该方法的分类准确率分别达到97.96%、97.33%和55.60%,网络模型的大小显著压缩到221x,每个历元的训练时间减少了60%。我们的工作在带有神经处理引擎的RK3399Pro移动设备上进一步实现。使用杠杆压缩方法,特征提取器和LSTM预测器的延迟可以分别减少30.20倍和6.62倍。此外,时空模型在板上运行时仅消耗57.19 MB的DRAM和5.67W的功耗。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Fast Video Facial Expression Recognition by a Deeply Tensor-Compressed LSTM Neural Network for Mobile Devices
Mobile devices usually suffer from limited computation and storage resources, which seriously hinders them from deep neural network applications. In this article, we introduce a deeply tensor-compressed long short-term memory (LSTM) neural network for fast video-based facial expression recognition on mobile devices. First, a spatio-temporal facial expression recognition LSTM model is built by extracting time-series feature maps from facial clips. The LSTM-based spatio-temporal model is further deeply compressed by means of quantization and tensorization for mobile device implementation. Based on datasets of Extended Cohn-Kanade (CK+), MMI, and Acted Facial Expression in Wild 7.0, experimental results show that the proposed method achieves 97.96%, 97.33%, and 55.60% classification accuracy and significantly compresses the size of network model up to 221× with reduced training time per epoch by 60%. Our work is further implemented on the RK3399Pro mobile device with a Neural Process Engine. The latency of the feature extractor and LSTM predictor can be reduced 30.20× and 6.62× , respectively, on board with the leveraged compression methods. Furthermore, the spatio-temporal model costs only 57.19 MB of DRAM and 5.67W of power when running on the board.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
5.20
自引率
3.70%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信