Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization

IF 4.1 2区 计算机科学 Q1 ACOUSTICS
Bei Liu;Haoyu Wang;Yanmin Qian
{"title":"Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization","authors":"Bei Liu;Haoyu Wang;Yanmin Qian","doi":"10.1109/TASLP.2024.3437237","DOIUrl":null,"url":null,"abstract":"Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verification. Firstly, we propose a novel adaptive uniform precision quantization method which enables the dynamic generation of quantization centroids customized for each network layer based on k-means clustering. By applying it to the pre-trained SV systems, we obtain a series of quantized variants with different bit widths. To enhance low-bit quantized models, a mixed precision quantization algorithm along with a multi-stage fine-tuning (MSFT) strategy is further introduced. This approach assigns varying bit widths to different network layers. When bit combinations are determined, MSFT progressively quantizes and fine-tunes the network in a specific order. Finally, we design two distinct binary quantization schemes to mitigate performance degradation of 1-bit quantized models: the static and adaptive quantizers. Experiments on VoxCeleb demonstrate that lossless 4-bit uniform precision quantization is achieved on both ResNets and DF-ResNets, yielding a promising compression ratio of \n<inline-formula><tex-math>$\\sim$</tex-math></inline-formula>\n8. Moreover, compared to uniform precision approach, mixed precision quantization not only obtains additional performance improvements with a similar model size but also offers the flexibility to generate bit combination for any desirable model size. In addition, our suggested 1-bit quantization schemes remarkably boost the performance of binarized models. Finally, a thorough comparison with existing lightweight SV systems reveals that our proposed models outperform all previous methods by a large margin across various model size ranges.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3771-3784"},"PeriodicalIF":4.1000,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10623284/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

Abstract

Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verification. Firstly, we propose a novel adaptive uniform precision quantization method which enables the dynamic generation of quantization centroids customized for each network layer based on k-means clustering. By applying it to the pre-trained SV systems, we obtain a series of quantized variants with different bit widths. To enhance low-bit quantized models, a mixed precision quantization algorithm along with a multi-stage fine-tuning (MSFT) strategy is further introduced. This approach assigns varying bit widths to different network layers. When bit combinations are determined, MSFT progressively quantizes and fine-tunes the network in a specific order. Finally, we design two distinct binary quantization schemes to mitigate performance degradation of 1-bit quantized models: the static and adaptive quantizers. Experiments on VoxCeleb demonstrate that lossless 4-bit uniform precision quantization is achieved on both ResNets and DF-ResNets, yielding a promising compression ratio of $\sim$ 8. Moreover, compared to uniform precision approach, mixed precision quantization not only obtains additional performance improvements with a similar model size but also offers the flexibility to generate bit combination for any desirable model size. In addition, our suggested 1-bit quantization schemes remarkably boost the performance of binarized models. Finally, a thorough comparison with existing lightweight SV systems reveals that our proposed models outperform all previous methods by a large margin across various model size ranges.
通过自适应神经网络量化实现轻量级扬声器验证
现代说话人验证(SV)系统通常需要昂贵的存储和计算资源,因此阻碍了它们在移动设备上的部署。在本文中,我们探讨了用于轻量级说话人验证的自适应神经网络量化方法。首先,我们提出了一种新颖的自适应统一精度量化方法,该方法能够在 K 均值聚类的基础上动态生成为每个网络层定制的量化中心点。通过将其应用于预训练 SV 系统,我们获得了一系列不同位宽的量化变体。为了增强低位量化模型,我们进一步引入了混合精度量化算法和多级微调(MSFT)策略。这种方法为不同的网络层分配不同的位宽。当位组合确定后,MSFT 按特定顺序逐步量化和微调网络。最后,我们设计了两种不同的二进制量化方案,以减轻 1 位量化模型的性能下降:静态量化器和自适应量化器。在 VoxCeleb 上进行的实验表明,在 ResNets 和 DF-ResNets 上实现了无损的 4 位统一精度量化,压缩率达到了 8 美元。此外,与统一精度方法相比,混合精度量化不仅能在模型大小相似的情况下获得额外的性能改进,还能灵活地生成任何理想模型大小的位组合。此外,我们建议的 1 位量化方案显著提高了二值化模型的性能。最后,与现有的轻量级 SV 系统进行全面比较后发现,在各种模型大小范围内,我们提出的模型都远远优于之前的所有方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE/ACM Transactions on Audio, Speech, and Language Processing ACOUSTICS-ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore
11.30
自引率
11.10%
发文量
217
期刊介绍: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信