Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning

Impact factor 2.4 · JCR Q2 · Engineering, Electrical & Electronic
Salinna Abdullah; Majid Zamani; Andreas Demosthenous
Journal: IEEE Open Journal of Circuits and Systems
Published: 2024-04-16 (journal article)
DOI: 10.1109/OJCAS.2024.3389100
URL: https://ieeexplore.ieee.org/document/10500889/
Citations: 0

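Abstract

This paper describes a supervised speech enhancement (SE) method that utilises a noise-aware four-layer deep neural network and training-target switching. The SE system is trained with multiple-target joint learning. To optimise speech denoising, it switches between mapping-based, masking-based, or complementary processing, depending on the detected level of noise contamination. Several optimisation techniques were implemented to reduce area, memory, and power requirements: ternary quantisation, structural pruning, efficient sparse matrix representation, and low-cost approximations of complex computations. Up to 19.1× compression was obtained, and all weights fit in on-chip memory. On NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively. These scores outperform SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field-programmable gate array (FPGA) as a proof of concept. Mapping the design onto a 65-nm CMOS process yielded a chip core area of 3.88 mm² and a power consumption of 1.91 mW at a 10 MHz clock frequency.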
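The abstract does not give the switching rule. The following is therefore only a minimal Python sketch of noise-aware target switching, under several assumptions:

- A crude frame-level SNR estimate stands in for "the level of noise contamination detected".
- The mapping output is assumed to be preferred in heavy noise and the masking output in light noise.
- "Complementary" processing is modelled as a plain average of the two estimates.

The thresholds `LOW_SNR_DB` and `HIGH_SNR_DB` and all function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

# Hypothetical switching thresholds in dB (not taken from the paper).
LOW_SNR_DB = 0.0
HIGH_SNR_DB = 10.0


def estimate_snr_db(noisy_power: List[float], noise_power: List[float],
                    eps: float = 1e-12) -> float:
    """Crude frame SNR: (noisy energy - noise energy) over noise energy, in dB."""
    noise = max(sum(noise_power), eps)
    speech = max(sum(noisy_power) - sum(noise_power), eps)
    return 10.0 * math.log10(speech / noise)


def select_output(noisy_mag: List[float], mapped_mag: List[float],
                  mask: List[float], snr_db: float) -> Tuple[List[float], str]:
    """Choose the mapping estimate, the masking estimate, or a blend of both.

    mapped_mag: clean magnitude predicted directly by the mapping target.
    mask:       ratio mask predicted by the masking target, applied to noisy_mag.
    """
    masked_mag = [m * y for m, y in zip(mask, noisy_mag)]
    if snr_db < LOW_SNR_DB:        # heavy noise: assumed to favour mapping
        return mapped_mag, "mapping"
    if snr_db > HIGH_SNR_DB:       # light noise: assumed to favour masking
        return masked_mag, "masking"
    # Intermediate noise: "complementary" processing, modelled here as a plain average.
    blended = [(a + b) / 2.0 for a, b in zip(mapped_mag, masked_mag)]
    return blended, "complementary"


if __name__ == "__main__":
    snr = estimate_snr_db([4.0, 4.0], [1.0, 1.0])   # 10*log10(6/2), about 4.77 dB
    out, mode = select_output([1.0, 2.0], [0.5, 1.0], [0.8, 0.4], snr)
    print(f"{snr:.2f} dB -> {mode}: {out}")
```

In this sketch both network outputs are always computed, and only the selection step depends on the noise estimate. A hardware design might instead skip the unused computation to save power.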
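The sketch below is a rough illustration of why ternary quantisation and sparse storage reduce memory and arithmetic. It does three things:

- Ternarises a weight matrix to codes in {-1, 0, +1} with one shared scale factor.
- Stores only the non-zero codes in compressed sparse row (CSR) form.
- Evaluates a matrix-vector product using only additions and subtractions, followed by one scaling per output.

The threshold of 0.7 times the mean absolute weight is a common heuristic assumed here. The abstract does not describe the paper's actual quantiser, pruning scheme, or sparse format, and the function names are hypothetical.

```python
from typing import List, Tuple


def ternarise(weights: List[List[float]],
              delta_scale: float = 0.7) -> Tuple[List[List[int]], float]:
    """Quantise weights to codes in {-1, 0, +1} plus one shared scale alpha."""
    magnitudes = [abs(w) for row in weights for w in row]
    delta = delta_scale * sum(magnitudes) / len(magnitudes)   # assumed heuristic threshold
    codes = [[1 if w > delta else (-1 if w < -delta else 0) for w in row]
             for row in weights]
    kept = [m for m in magnitudes if m > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0             # mean surviving magnitude
    return codes, alpha


def to_csr(codes: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Compressed sparse row storage: keep only non-zero codes and their column indices."""
    values, col_idx, row_ptr = [], [], [0]
    for row in codes:
        for j, c in enumerate(row):
            if c != 0:
                values.append(c)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_ternary_matvec(values: List[int], col_idx: List[int], row_ptr: List[int],
                       alpha: float, x: List[float]) -> List[float]:
    """y = alpha * (T @ x); the inner loop needs only additions and subtractions."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += x[col_idx[k]] if values[k] > 0 else -x[col_idx[k]]
        y.append(alpha * acc)   # one multiplication per output neuron
    return y


if __name__ == "__main__":
    W = [[0.9, -0.05, 0.0, -0.8],
         [0.02, 0.6, -0.7, 0.01]]
    codes, alpha = ternarise(W)
    vals, cols, ptr = to_csr(codes)
    print(codes, round(alpha, 3), vals, cols, ptr)
    print(csr_ternary_matvec(vals, cols, ptr, alpha, [1.0, 2.0, 3.0, 4.0]))
```

With the example matrix, four of the eight weights become zero and are not stored. Each output then needs a single multiplication by `alpha` instead of one per weight. Structural pruning, which the paper also applies, would typically remove whole rows or columns before this step and is not shown here.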
Source journal statistics: self-citation rate 0.00%; articles published: 0; review time: 19 weeks