GBNF-VAE: A Pathological Voice Enhancement Model Based on Gold Section for Bottleneck Feature With Variational Autoencoder

IF 2.4 4区 医学 Q1 AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY
Ganjun Liu , Tao Zhang , Biyun Ding , Ying Lv , Xiaohui Hou , Haoyang Guo , Yaqin Wu , Dehui Fu
{"title":"GBNF-VAE: A Pathological Voice Enhancement Model Based on Gold Section for Bottleneck Feature With Variational Autoencoder","authors":"Ganjun Liu ,&nbsp;Tao Zhang ,&nbsp;Biyun Ding ,&nbsp;Ying Lv ,&nbsp;Xiaohui Hou ,&nbsp;Haoyang Guo ,&nbsp;Yaqin Wu ,&nbsp;Dehui Fu","doi":"10.1016/j.jvoice.2023.03.012","DOIUrl":null,"url":null,"abstract":"<div><h3>Objective</h3><div>Speech enhancement has become a promising technique to accommodate demands of the improvement in quality of a degraded speech signal. The main works now focus on separating normal speech from noise, but have neglected the low quality of impaired speech influenced by anomalous glottis flow. In order to effectively enhance the pathological speech, it is essential to design a separation mechanism for extracting high-dimensional timbre features and speech features separately to suppress low-dimensional noises.</div></div><div><h3>Methods</h3><div>In this paper, we propose an enhancement model GBNF-VAE to extract timbre efficiently by reducing anomalous airflow noise interference, and by combining the semantic features with timbre features to synthesize the enhanced speech. In particular, the bottleneck feature can characterize the timbre by the controlled number of nodes through the Golden Section method, which effectively improves computational efficiency. In addition, variational autoencoder is adopted to extract semantic features which are combined with the previous timbre features to synthesize the enhanced speech.</div></div><div><h3>Results</h3><div>Finally, spectrum observation, objective indicators and subjective evaluation all show the outstanding performance of GBNF-VAE in pathological speech quality enhancement.</div></div>","PeriodicalId":49954,"journal":{"name":"Journal of Voice","volume":"39 5","pages":"Pages 1171-1182"},"PeriodicalIF":2.4000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Voice","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0892199723001054","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Objective

Speech enhancement has become a promising technique to accommodate demands of the improvement in quality of a degraded speech signal. The main works now focus on separating normal speech from noise, but have neglected the low quality of impaired speech influenced by anomalous glottis flow. In order to effectively enhance the pathological speech, it is essential to design a separation mechanism for extracting high-dimensional timbre features and speech features separately to suppress low-dimensional noises.

Methods

In this paper, we propose an enhancement model GBNF-VAE to extract timbre efficiently by reducing anomalous airflow noise interference, and by combining the semantic features with timbre features to synthesize the enhanced speech. In particular, the bottleneck feature can characterize the timbre by the controlled number of nodes through the Golden Section method, which effectively improves computational efficiency. In addition, variational autoencoder is adopted to extract semantic features which are combined with the previous timbre features to synthesize the enhanced speech.

Results

Finally, spectrum observation, objective indicators and subjective evaluation all show the outstanding performance of GBNF-VAE in pathological speech quality enhancement.
GBNF-VAE:基于瓶颈特征黄金分割的变分自编码器病理语音增强模型
目的语音增强已成为一种很有前途的技术,以适应对退化语音信号质量提高的需求。目前的主要工作集中在正常语音与噪声的分离上,而忽略了异常声门流对低质量受损语音的影响。为了有效增强病理语音,必须设计一种分离机制,分别提取高维音色特征和语音特征,以抑制低维噪声。方法本文提出了一种增强模型GBNF-VAE,通过降低气流异常噪声干扰,有效提取语音音色,并将语义特征与音色特征相结合,合成增强语音。特别是瓶颈特征可以通过黄金分割法控制节点数量来表征音色,有效地提高了计算效率。此外,采用变分自编码器提取语义特征,并结合之前的音色特征合成增强语音。结果最后,频谱观察、客观指标和主观评价均显示GBNF-VAE在病理语音质量增强方面表现突出。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Voice
Journal of Voice 医学-耳鼻喉科学
CiteScore
4.00
自引率
13.60%
发文量
395
审稿时长
59 days
期刊介绍: The Journal of Voice is widely regarded as the world''s premiere journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists'' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信