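System-Level Design and Integration of a Prototype AR/VR Hardware Featuring a Custom Low-Power DNN Accelerator Chip in 7nm Technology for Codec Avatars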

2022 IEEE Custom Integrated Circuits Conference (CICC) Pub Date : 2022-04-01 DOI:10.1109/CICC53496.2022.9772810

H. Sumbul, Tony F. Wu, Yuecheng Li, Syed Shakib Sarwar, W. Koven, Eli Murphy-Trotzky, Xingxing Cai, E. Ansari, D. Morris, Huichu Liu, Doyun Kim, E. Beigné


Citations: 11

Abstract


System-Level Design and Integration of a Prototype AR/VR Hardware Featuring a Custom Low-Power DNN Accelerator Chip in 7nm Technology for Codec Avatars

Augmented Reality / Virtual Reality (AR/VR) devices aim to connect people in the Metaverse with photorealistic virtual avatars, referred to as “Codec Avatars”. Delivering high visual performance for Codec Avatar workloads, however, is challenging for mobile SoCs because AR/VR devices operate under tight power and form-factor constraints. On-device, local, near-sensor processing provides the best system-level energy efficiency and enables strong security and privacy features in the long run. In this work, we present a custom-built, small-scale prototype mobile SoC that achieves energy-efficient performance when running the eye gaze extraction of the Codec Avatar model. The test chip, fabricated in a 7nm technology node, features a Neural Network (NN) accelerator consisting of a 1024 Multiply-Accumulate (MAC) array, 2MB on-chip SRAM, and a 32-bit RISC-V CPU. The test chip is integrated into a prototype mobile VR headset to run the Codec Avatar application. This work aims to show the full-stack design considerations of system-level integration, hardware-aware model customization, and circuit-level acceleration needed to meet the challenging mobile AR/VR SoC specifications for a Codec Avatar demonstration. By re-architecting the Convolutional NN (CNN) based eye gaze extraction model and tailoring it to the hardware, the entire model fits on the chip, which mitigates the system-level energy and latency cost of off-chip memory accesses. By efficiently accelerating the convolution operation at the circuit level, the presented prototype SoC achieves 30 frames per second at low power consumption in a small form factor. With the full-stack design considerations presented in this work, the test chip consumes 22.7 mW to run inference on the entire CNN model in 16.5 ms from input to output for a single sensor image. As a result, the test chip achieves an energy efficiency of 375 µJ/frame/eye within a 2.56 mm² silicon area.
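The energy figure quoted in the abstract can be cross-checked from the reported power and latency. The sketch below is only a back-of-the-envelope calculation, not anything taken from the paper: it assumes the 22.7 mW is drawn for the full 16.5 ms of one inference and that one frame corresponds to one inference per eye. The function names `energy_per_frame_uj` and `frame_budget_ms` are hypothetical, introduced here for illustration.

```python
"""Back-of-the-envelope check of the figures quoted in the abstract.

Assumption (not stated in the source): the reported power is drawn for
the whole duration of one inference, and one frame is one inference.
"""

POWER_MW = 22.7     # reported power while running inference
LATENCY_MS = 16.5   # reported input-to-output latency for one sensor image
TARGET_FPS = 30     # reported frame rate


def energy_per_frame_uj(power_mw: float, latency_ms: float) -> float:
    """Energy of one inference in microjoules (mW x ms = uJ)."""
    return power_mw * latency_ms


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    energy = energy_per_frame_uj(POWER_MW, LATENCY_MS)
    budget = frame_budget_ms(TARGET_FPS)
    print(f"energy per frame per eye: {energy:.2f} uJ")
    print(f"frame budget at {TARGET_FPS} fps: {budget:.2f} ms")
    print(f"latency fits within budget: {LATENCY_MS <= budget}")
```

Under these assumptions the product is about 374.55 µJ, consistent with the rounded 375 µJ/frame/eye, and the 16.5 ms latency fits within the roughly 33.3 ms frame period at 30 frames per second.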

