Dark Experience for Incremental Keyword Spotting

Tianyi Peng, Yang Xiao
arXiv - EE - Audio and Speech Processing, published 2024-09-12. DOI: arxiv-2409.08153 (https://doi.org/arxiv-2409.08153)

Abstract

Spoken keyword spotting (KWS) is crucial for identifying keywords within audio inputs and is widely used in applications like Apple Siri and Google Home, particularly on edge devices. Current deep learning-based KWS systems, which are typically trained on a limited set of keywords, can suffer from performance degradation when encountering new domains, a challenge often addressed through few-shot fine-tuning. However, this adaptation frequently leads to catastrophic forgetting, where the model's performance on the original data deteriorates. Progressive continual learning (CL) strategies have been proposed to overcome this, but they face limitations such as the need for task-ID information and increased storage, making them less practical for lightweight devices. To address these challenges, we introduce Dark Experience for Keyword Spotting (DE-KWS), a novel CL approach that leverages dark knowledge to distill past experiences throughout the training process. DE-KWS combines rehearsal and distillation, using both ground truth labels and logits stored in a memory buffer to maintain model performance across tasks. Evaluations on the Google Speech Commands dataset show that DE-KWS outperforms existing CL baselines in average accuracy without increasing model size, offering an effective solution for resource-constrained edge devices. The scripts are available on GitHub for future research.
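To make the combination of rehearsal and distillation more concrete, the following is a minimal sketch and not the authors' released code. It assumes a loss in the style of Dark Experience Replay: cross-entropy on the current batch, a mean-squared-error term pulling current logits toward the logits stored in the buffer, and cross-entropy on the stored labels. The names `ReplayBuffer` and `combined_loss`, the weights `alpha` and `beta`, the use of MSE, and reservoir sampling for buffer updates are all assumptions for illustration; the abstract does not specify them.

```python
import math
import random


class ReplayBuffer:
    """Fixed-size memory of (features, label, logits) triples (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, features, label, logits):
        # Reservoir sampling (assumed policy): each example seen so far
        # has the same chance of being in the buffer.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((features, label, list(logits)))
        else:
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = (features, label, list(logits))

    def sample(self, k):
        # Draw up to k stored triples without replacement.
        return self.rng.sample(self.items, min(k, len(self.items)))


def cross_entropy(logits, label):
    # Numerically stable log-sum-exp minus the logit of the true class.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def logit_mse(current, stored):
    # Distillation term: distance between current and stored logits.
    return sum((c - s) ** 2 for c, s in zip(current, stored)) / len(current)


def combined_loss(model, batch, replay, alpha=0.5, beta=0.5):
    """batch: list of (features, label); replay: list of (features, label, logits).

    alpha and beta are assumed weighting coefficients, not values from the paper.
    """
    loss = sum(cross_entropy(model(x), y) for x, y in batch) / len(batch)
    if replay:
        distill = sum(logit_mse(model(x), z) for x, _, z in replay) / len(replay)
        rehearse = sum(cross_entropy(model(x), y) for x, y, _ in replay) / len(replay)
        loss += alpha * distill + beta * rehearse
    return loss
```

In a training loop, one would call `buffer.add(x, y, model(x))` for examples of the current task and pass `buffer.sample(k)` as `replay`. The buffer stores only a bounded number of examples and their logits, so the model itself does not grow, and no task ID is needed at inference time.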