{"title":"Dark Experience for Incremental Keyword Spotting","authors":"Tianyi Peng, Yang Xiao","doi":"arxiv-2409.08153","DOIUrl":null,"url":null,"abstract":"Spoken keyword spotting (KWS) is crucial for identifying keywords within\naudio inputs and is widely used in applications like Apple Siri and Google\nHome, particularly on edge devices. Current deep learning-based KWS systems,\nwhich are typically trained on a limited set of keywords, can suffer from\nperformance degradation when encountering new domains, a challenge often\naddressed through few-shot fine-tuning. However, this adaptation frequently\nleads to catastrophic forgetting, where the model's performance on original\ndata deteriorates. Progressive continual learning (CL) strategies have been\nproposed to overcome this, but they face limitations such as the need for\ntask-ID information and increased storage, making them less practical for\nlightweight devices. To address these challenges, we introduce Dark Experience\nfor Keyword Spotting (DE-KWS), a novel CL approach that leverages dark\nknowledge to distill past experiences throughout the training process. DE-KWS\ncombines rehearsal and distillation, using both ground truth labels and logits\nstored in a memory buffer to maintain model performance across tasks.\nEvaluations on the Google Speech Command dataset show that DE-KWS outperforms\nexisting CL baselines in average accuracy without increasing model size,\noffering an effective solution for resource-constrained edge devices. 
The\nscripts are available on GitHub for the future research.","PeriodicalId":501284,"journal":{"name":"arXiv - EE - Audio and Speech Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Audio and Speech Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08153","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Spoken keyword spotting (KWS) is crucial for identifying keywords within
audio inputs and is widely used in applications like Apple Siri and Google
Home, particularly on edge devices. Current deep learning-based KWS systems,
which are typically trained on a limited set of keywords, can suffer from
performance degradation when encountering new domains, a challenge often
addressed through few-shot fine-tuning. However, this adaptation frequently
leads to catastrophic forgetting, where the model's performance on original
data deteriorates. Progressive continual learning (CL) strategies have been
proposed to overcome this, but they face limitations such as the need for
task-ID information and increased storage, making them less practical for
lightweight devices. To address these challenges, we introduce Dark Experience
for Keyword Spotting (DE-KWS), a novel CL approach that leverages dark
knowledge to distill past experiences throughout the training process. DE-KWS
combines rehearsal and distillation, using both ground truth labels and logits
stored in a memory buffer to maintain model performance across tasks.
Evaluations on the Google Speech Commands dataset show that DE-KWS outperforms
existing CL baselines in average accuracy without increasing model size,
offering an effective solution for resource-constrained edge devices. The
scripts are available on GitHub for future research.
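The combined objective described above — cross-entropy on the current task, plus a distillation term matching logits stored in the memory buffer ("dark knowledge") and a rehearsal term using buffered ground-truth labels — can be sketched as follows. This is an illustrative NumPy sketch in the spirit of dark experience replay, not the authors' released code; the function names and the trade-off weights `alpha` and `beta` are assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean cross-entropy; `labels` are integer class indices.
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def de_kws_loss(cur_logits, cur_labels,
                buf_logits, buf_stored_logits, buf_labels,
                alpha=0.5, beta=0.5):
    """Current-task CE
       + alpha * logit matching on buffered samples (distillation)
       + beta  * CE on buffered ground-truth labels (rehearsal)."""
    ce_cur = cross_entropy(cur_logits, cur_labels)
    distill = np.mean((buf_logits - buf_stored_logits) ** 2)
    ce_buf = cross_entropy(buf_logits, buf_labels)
    return ce_cur + alpha * distill + beta * ce_buf
```

Because only labels and logits are buffered, no task-ID is needed at train or test time, and the network itself never grows — consistent with the paper's claim of constant model size.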