Multi-Task ConvMixer Networks with Triplet Attention for Low-Resource Keyword Spotting

IF 6.6 | Zone 1, Computer Science | JCR Q1, Multidisciplinary

Tsinghua Science and Technology | Pub Date: 2024-09-24 | DOI: 10.26599/TST.2024.9010088

Alexander Rogath Kivaisi;Qingjie Zhao;Yuanbing Zou

Tsinghua Science and Technology, vol. 30, no. 2, pp. 875-893 | Journal Article | Full text: https://ieeexplore.ieee.org/document/10691379/

Citations: 0

Abstract

Customized keyword spotting needs to adapt quickly to small user samples. Current methods primarily solve the problem under moderate noise conditions. Recent work increases the level of difficulty in detecting keywords by introducing keyword interference. However, the current solution has been explored on large models with many parameters, making it unsuitable for deployment on small devices. When applying the current solution to lightweight models with minimal training data, the performance degrades compared to the baseline model. Therefore, we propose a light-weight multi-task architecture (<9.0×10⁴ parameters) created from integrating the triplet attention module in the ConvMixer networks and a new auxiliary mixed labeling encoding to address the challenge. The results of our experiment show that the proposed model outperforms similar light-weight models for keyword spotting, with accuracy gains ranging from 0.73% to 2.95% for a clean set and from 2.01% to 3.37% for a mixed set under different scales of training set. Furthermore, our model shows its robustness in different low-resource language datasets while converging faster.
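
To give a feel for the parameter budget quoted in the abstract, the following minimal sketch counts the trainable parameters of a generic ConvMixer-style stack and compares the total with 9.0×10⁴. It is not the authors' architecture: the block layout (depthwise convolution, batch normalization, pointwise convolution, batch normalization) follows the generic ConvMixer design, the triplet attention module and the multi-task heads are omitted, and every size used here (`channels=64`, `kernel_size=5`, `depth=8`, `patch_size=4`, `in_channels=1`, `num_classes=12`) is a hypothetical value chosen only for illustration.

```python
# Hypothetical sketch: parameter count of a generic ConvMixer-style stack.
# None of the sizes below come from the paper; they are illustrative assumptions.

PARAM_BUDGET = 9.0e4  # upper bound quoted in the abstract


def conv_mixer_block_params(channels: int, kernel_size: int) -> int:
    """Trainable parameters in one block: depthwise conv -> BN -> pointwise conv -> BN."""
    depthwise = channels * kernel_size * kernel_size + channels  # one k x k filter per channel + bias
    pointwise = channels * channels + channels                   # 1x1 conv weights + bias
    batch_norms = 2 * (2 * channels)                             # scale and shift for two BN layers
    return depthwise + pointwise + batch_norms


def conv_mixer_params(channels: int, kernel_size: int, depth: int,
                      patch_size: int, in_channels: int, num_classes: int) -> int:
    """Total parameters: patch embedding + `depth` blocks + linear classifier."""
    patch_embed = (in_channels * channels * patch_size * patch_size  # strided conv weights
                   + channels                                        # conv bias
                   + 2 * channels)                                   # BN scale and shift
    blocks = depth * conv_mixer_block_params(channels, kernel_size)
    classifier = channels * num_classes + num_classes                # weights + bias
    return patch_embed + blocks + classifier


total = conv_mixer_params(channels=64, kernel_size=5, depth=8,
                          patch_size=4, in_channels=1, num_classes=12)
print(f"total parameters: {total}, within budget: {total < PARAM_BUDGET}")
```

Because the pointwise convolution grows with the square of the channel width, `channels` is the main lever for staying under a budget of this size; depth and kernel size contribute only linearly per block.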


Source journal

Tsinghua Science and Technology | Categories: Computer Science, Information Systems; Computer Science, Software Engineering

CiteScore: 10.20

Self-citation rate: 10.60%

Articles published: 2340

About the journal: Tsinghua Science and Technology (Tsinghua Sci Technol) started publication in 1996. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. The journal aims to present up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions from all over the world are welcome.