{"title":"On-device Customization of Tiny Deep Learning Models for Keyword Spotting with Few Examples","authors":"Manuele Rusci, T. Tuytelaars","doi":"10.1109/mm.2023.3311826","DOIUrl":null,"url":null,"abstract":"Designing a customized keyword spotting (KWS) deep neural network (DNN) for tiny sensors is a time-consuming process, demanding training a new model on a remote server with a dataset of collected keywords. This article investigates the effectiveness of a DNN-based KWS classifier that can be initialized on-device simply by recording a few examples of the target commands. At runtime, the classifier computes the distance between the DNN output and the prototypes of the recorded keywords. By experimenting with multiple tiny machine learning models on the Google Speech Command dataset, we report an accuracy of up to 80% using only 10 examples of utterances not seen during training. When deployed on a multicore microcontroller with a power envelope of 25 mW, the most accurate ResNet15 model takes 9.7 ms to process a 1-s speech frame, demonstrating the feasibility of on-device KWS customization for tiny devices without requiring any backpropagation-based transfer learning.","PeriodicalId":13100,"journal":{"name":"IEEE Micro","volume":"1 1","pages":"50-57"},"PeriodicalIF":2.8000,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Micro","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/mm.2023.3311826","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 1
Abstract
Designing a customized keyword spotting (KWS) deep neural network (DNN) for tiny sensors is a time-consuming process that demands training a new model on a remote server with a dataset of collected keywords. This article investigates the effectiveness of a DNN-based KWS classifier that can be initialized on-device simply by recording a few examples of the target commands. At runtime, the classifier computes the distance between the DNN output and the prototypes of the recorded keywords. By experimenting with multiple tiny machine learning models on the Google Speech Command dataset, we report an accuracy of up to 80% using only 10 examples of utterances not seen during training. When deployed on a multicore microcontroller with a power envelope of 25 mW, the most accurate ResNet15 model takes 9.7 ms to process a 1-s speech frame, demonstrating the feasibility of on-device KWS customization for tiny devices without requiring any backpropagation-based transfer learning.
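The abstract describes a prototype-based (nearest-prototype) classification scheme: a few recorded utterances per keyword are passed through the DNN, their output embeddings are averaged into per-keyword prototypes, and a new utterance is assigned to the keyword whose prototype lies closest to its embedding. The following is a minimal sketch of that idea, assuming a frozen DNN feature extractor and a Euclidean distance metric; the function names, embedding dimension, and distance choice are illustrative assumptions, not details taken from the paper.

```python
# Sketch of prototype-based keyword classification (assumed details: the
# embedding model, distance metric, and array shapes are illustrative only).
import numpy as np

def build_prototypes(embeddings_per_keyword):
    """Average the few recorded examples of each keyword into one prototype.

    embeddings_per_keyword: dict mapping keyword -> array of shape
    (n_examples, dim), where each row is the DNN output (embedding) of one
    recorded utterance of that keyword.
    """
    return {kw: embs.mean(axis=0) for kw, embs in embeddings_per_keyword.items()}

def classify(embedding, prototypes):
    """Return the keyword whose prototype is closest (Euclidean) to the embedding."""
    return min(prototypes, key=lambda kw: np.linalg.norm(embedding - prototypes[kw]))

# Example usage with random stand-in embeddings (dim = 64 is arbitrary):
rng = np.random.default_rng(0)
protos = build_prototypes({
    "yes": rng.normal(size=(10, 64)),  # 10 recorded examples per keyword
    "no": rng.normal(size=(10, 64)),
})
print(classify(rng.normal(size=64), protos))
```

Because classification reduces to computing a handful of distances against stored prototypes, the keyword set can be initialized or updated on-device from a few recordings, with no backpropagation-based fine-tuning required.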
Journal description:
IEEE Micro addresses users and designers of microprocessors and microprocessor systems, including managers, engineers, consultants, educators, and students involved with computers and peripherals, components and subassemblies, communications, instrumentation and control equipment, and guidance systems. Contributions should relate to the design, performance, or application of microprocessors and microcomputers. Tutorials, review papers, and discussions are also welcome. Sample topic areas include architecture, communications, data acquisition, control, hardware and software design/implementation, algorithms (including program listings), digital signal processing, microprocessor support hardware, operating systems, computer aided design, languages, application software, and development systems.