Wakeword Detection under Distribution Shifts

International Conference on Text, Speech and Dialogue · Pub Date: 2022-07-13 · DOI: 10.48550/arXiv.2207.06423

S. Parthasarathi, Lu Zeng, Christin Jose, Joe Wang

Citations: 1

Abstract

We propose a novel approach to semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data in the keyword spotting (KWS) task. Shifts away from the training data distribution are a key challenge for real-world KWS: when a new model is deployed on device, the data that passes its gate (the accepted data) undergoes a shift in distribution, which makes timely updates through subsequent deployments hard. Despite the shift, we assume that the marginal distribution of the labels does not change. We use a modified teacher/student training framework in which labeled training data is augmented with unlabeled data. Note that the teacher does not have access to the new distribution either. To train effectively on a mix of human-labeled and teacher-labeled data, we develop a teacher labeling strategy based on confidence heuristics that reduces the entropy of the label distribution produced by the teacher model; the data is then sampled to match the marginal distribution of the labels. Large-scale experiments show that a convolutional neural network (CNN) trained on far-field audio and evaluated on far-field audio drawn from a different distribution obtains a 14.3% relative improvement in false discovery rate (FDR) at equal false reject rate (FRR), while yielding a 5% improvement in FDR when there is no distribution shift. Under a more severe distribution shift, from far-field to near-field audio, with a smaller fully connected network (FCN), our approach achieves a 52% relative improvement in FDR at equal FRR, while yielding a 20% relative improvement in FDR on the original distribution.
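The two data-preparation steps in the abstract (confidence-based teacher labeling, then sampling to match the label marginal) and the reported metrics can be sketched in Python. This is a minimal illustration under assumptions, not the authors' implementation. The abstract does not specify the heuristic, so `teacher_label` assumes a simple two-threshold rule on the teacher's wakeword posterior (the `lo`/`hi` values are hypothetical), and `match_marginal` assumes the target marginal is a single positive-label rate estimated from the human-labeled data. FDR is taken as FP/(TP+FP) and FRR as FN/(TP+FN); all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Labeled = List[Tuple[int, int]]  # (example index, hard label: 1 = wakeword, 0 = not)


def teacher_label(posteriors: Sequence[float], lo: float = 0.1, hi: float = 0.9) -> Labeled:
    """Hypothetical confidence heuristic: keep only examples the teacher is sure about.

    Posteriors >= hi become hard label 1, posteriors <= lo become hard label 0,
    and ambiguous examples are dropped. Hard (one-hot) labels have zero entropy,
    unlike the teacher's soft posteriors.
    """
    labeled: Labeled = []
    for i, p in enumerate(posteriors):
        if p >= hi:
            labeled.append((i, 1))
        elif p <= lo:
            labeled.append((i, 0))
    return labeled


def match_marginal(labeled: Labeled, target_pos_rate: float, rng: random.Random) -> Labeled:
    """Subsample so the share of label-1 examples matches target_pos_rate.

    Assumption: the target marginal is estimated from the human-labeled data.
    """
    if not 0.0 < target_pos_rate < 1.0:
        raise ValueError("target_pos_rate must lie strictly between 0 and 1")
    pos = [x for x in labeled if x[1] == 1]
    neg = [x for x in labeled if x[1] == 0]
    # Largest total sample size reachable without reusing any example.
    n = min(len(pos) / target_pos_rate, len(neg) / (1.0 - target_pos_rate))
    sample = rng.sample(pos, round(n * target_pos_rate)) + rng.sample(
        neg, round(n * (1.0 - target_pos_rate))
    )
    rng.shuffle(sample)
    return sample


def fdr_frr(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """False discovery rate FP/(TP+FP) and false reject rate FN/(TP+FN) at one threshold."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        accepted = s >= threshold
        if accepted and y == 1:
            tp += 1
        elif accepted:
            fp += 1  # accepted a non-wakeword example
        elif y == 1:
            fn += 1  # rejected a wakeword example
    fdr = fp / (tp + fp) if tp + fp else 0.0
    frr = fn / (tp + fn) if tp + fn else 0.0
    return fdr, frr


def relative_improvement(baseline: float, new: float) -> float:
    """Relative reduction of an error rate, e.g. 0.143 for a 14.3% improvement."""
    return (baseline - new) / baseline
```

Comparing two models "at equal FRR" would mean choosing each model's threshold so that `fdr_frr` returns the same FRR, then applying `relative_improvement` to the two FDR values. Converting soft posteriors to hard labels is one simple way to lower label entropy; the paper's actual heuristic may differ.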
