Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time

Yue Li, Koen V. Hindriks, Florian A. Kunneman
arXiv - EE - Audio and Speech Processing, 2024-09-10. DOI: arxiv-2409.06274 (https://doi.org/arxiv-2409.06274)

Abstract

Spectral subtraction, widely used for its simplicity, has been applied to the Robot Ego Speech Filtering (RESF) problem: detecting the speech content of human interruptions in a robot's single-channel microphone recordings while the robot itself is speaking. However, this approach suffers from oversubtraction in the fundamental frequency range (FFR), which degrades recognition of the speech content. To address this, we propose a Two-Mask Conformer-based Metric Generative Adversarial Network (CMGAN) that enhances the detected speech and improves recognition results. Our model compensates for oversubtracted FFR values using high-frequency information and long-term features, and then denoises the resulting spectrogram. In addition, we introduce an incremental processing method that enables semi-real-time processing of streaming audio with a network trained on long, fixed-length input. Evaluations on two datasets, including one with unseen noise, show significant improvements in recognition accuracy and demonstrate the effectiveness of the proposed two-mask approach and incremental processing, making the proposed RESF pipeline more robust in real-world human-robot interaction (HRI) scenarios.
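
The two ideas in the abstract that lend themselves to a small illustration are oversubtraction in the low band and incremental processing of streaming input. The sketch below is a toy example in plain NumPy, not the authors' pipeline: the sample rate, FFT size, test tones, and the helper names `stft_mag`, `spectral_subtract`, and `stream_enhance` are all assumptions made for illustration, and the rolling-buffer scheme is only one plausible way to feed streaming chunks to a model that expects a fixed-length input. The two-mask CMGAN itself is not reproduced here.

```python
import numpy as np

SR = 16000               # sample rate in Hz (assumed for this toy example)
N_FFT, HOP = 512, 128    # STFT frame size and hop (assumed)


def stft_mag(x):
    """Magnitude STFT with a Hann window; shape (frames, N_FFT // 2 + 1)."""
    window = np.hanning(N_FFT)
    starts = range(0, len(x) - N_FFT + 1, HOP)
    frames = np.stack([x[s:s + N_FFT] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))


def spectral_subtract(mixture, ego):
    """Per-bin magnitude subtraction of the known ego speech, floored at zero."""
    return np.maximum(stft_mag(mixture) - stft_mag(ego), 0.0)


def stream_enhance(chunks, model, context_len):
    """Feed streaming chunks to a fixed-length model via a rolling buffer.

    `model` is a hypothetical callable that maps an array of length
    `context_len` to an array of the same length.
    """
    buffer = np.zeros(context_len)
    for chunk in chunks:
        n = len(chunk)  # assumed to be <= context_len
        buffer = np.concatenate([buffer[n:], chunk])
        yield model(buffer)[-n:]  # keep only the output for the newest chunk


t = np.arange(SR) / SR
# Toy "human" voice: a low tone near the fundamental plus a high-frequency tone.
human = np.sin(2 * np.pi * 156.25 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
# Toy "robot" voice: the same low tone shifted by 120 degrees, plus a 1 kHz tone.
robot = (np.sin(2 * np.pi * 156.25 * t + 2 * np.pi / 3)
         + 0.5 * np.sin(2 * np.pi * 1000 * t))

estimate = spectral_subtract(human + robot, robot)
clean = stft_mag(human)

low = slice(0, 10)    # bins below roughly 300 Hz
high = slice(60, 69)  # bins around 2 kHz
low_ratio = estimate[:, low].sum() / clean[:, low].sum()
high_ratio = estimate[:, high].sum() / clean[:, high].sum()
print(f"low-band magnitude kept: {low_ratio:.2f}, "
      f"high-band magnitude kept: {high_ratio:.2f}")

# Streaming check with an identity "model": the chunks come back unchanged.
chunks = np.split(human, 10)
streamed = np.concatenate(
    list(stream_enhance(chunks, lambda buf: buf, context_len=4000))
)
```

In this toy setup the human and robot tones overlap in the low band with different phases, so subtracting magnitudes removes most of the human's low-frequency content, while the high-frequency tone, where the robot has no energy, survives. That is the kind of loss a later enhancement stage would have to compensate for. The rolling buffer trades latency for context: each call sees the most recent `context_len` samples, but only the newest chunk of the output is emitted.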