SM6: A 16nm System-on-Chip for Accurate and Noise-Robust Attention-Based NLP Applications: The 33rd Hot Chips Symposium (August 22-24, 2021)

2021 IEEE Hot Chips 33 Symposium (HCS) Pub Date: 2021-08-22 DOI: 10.1109/HCS52781.2021.9567180

Thierry Tambe, En-Yu Yang, Glenn G. Ko, Yuji Chai, Coleman Hooper, M. Donato, P. Whatmough, Alexander M. Rush, D. Brooks, Gu-Yeon Wei


Citations: 1

Abstract

In this work, we present SM6, an SoC architecture for real-time denoised speech and NLP pipelines, featuring (1) MSSE: an unsupervised probabilistic sound source separation accelerator, (2) FlexNLP: a programmable inference accelerator for attention-based seq2seq DNNs using adaptive floating-point datatypes for wide dynamic range computations, and (3) a dual-core Arm Cortex A53 CPU cluster, which provides on-demand SIMD FFT processing and operating system support. In adverse acoustic conditions, MSSE allows FlexNLP to store up to 6x smaller ASR models, obviating the very inefficient strategy of scaling up the DNN model to achieve noise robustness. MSSE and FlexNLP produce efficiency ranges of 4.33-17.6 Gsamples/s/W and 2.6-7.8 TFLOPs/W, respectively, with per-frame end-to-end latencies of 15-45 ms.

