GBNF-VAE: A Pathological Voice Enhancement Model Based on Gold Section for Bottleneck Feature With Variational Autoencoder

Impact factor 2.4 · Zone 4 (Medicine) · JCR Q1, Audiology & Speech-Language Pathology

Journal of Voice Pub Date : 2025-09-01 DOI:10.1016/j.jvoice.2023.03.012

Ganjun Liu , Tao Zhang , Biyun Ding , Ying Lv , Xiaohui Hou , Haoyang Guo , Yaqin Wu , Dehui Fu

Journal of Voice, Volume 39, Issue 5, Pages 1171-1182. Journal article. https://www.sciencedirect.com/science/article/pii/S0892199723001054

Abstract

Objective

Speech enhancement is a promising technique for improving the quality of degraded speech signals. Most existing work focuses on separating normal speech from noise and neglects the low quality of impaired speech caused by anomalous glottal flow. To enhance pathological speech effectively, it is essential to design a separation mechanism that extracts high-dimensional timbre features and speech features separately, thereby suppressing low-dimensional noise.

Methods

We propose an enhancement model, GBNF-VAE, that extracts timbre efficiently by reducing interference from anomalous airflow noise and combines semantic features with timbre features to synthesize the enhanced speech. In particular, the bottleneck feature characterizes the timbre with a number of nodes that is controlled by the Golden Section method, which improves computational efficiency. In addition, a variational autoencoder is adopted to extract the semantic features, which are combined with the timbre features to synthesize the enhanced speech.
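
The abstract does not say how the Golden Section method is applied to the bottleneck layer. As a rough illustration only, the sketch below runs a standard golden-section search over an integer range of candidate node counts. The function name `golden_section_nodes`, the `cost` callable (for example, a validation error measured after training with `n` bottleneck nodes), and the search bounds are all assumptions made for this example, not details taken from the paper. The search also assumes the cost is unimodal in the node count.

```python
import math

# Inverse golden ratio, about 0.618; each step keeps roughly this share of the interval.
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_nodes(cost, low, high):
    """Return the integer in [low, high] that minimizes `cost`.

    `cost` is a hypothetical callable mapping a node count to a score to
    minimize. It is assumed to be unimodal over the search range.
    """
    cache = {}

    def evaluate(n):
        # Each evaluation may mean training a network, so never repeat one.
        if n not in cache:
            cache[n] = cost(n)
        return cache[n]

    a, b = low, high
    while b - a > 2:
        # Two interior probe points placed by the golden ratio.
        c = round(b - INV_PHI * (b - a))
        d = round(a + INV_PHI * (b - a))
        if c == d:
            # Rounding can merge the probes on short intervals.
            d = c + 1
        if evaluate(c) <= evaluate(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    # At most three candidates remain; check them directly.
    return min(range(a, b + 1), key=evaluate)


if __name__ == "__main__":
    # Toy stand-in for a real validation error.
    def toy_cost(n):
        return (n - 37) ** 2

    print(golden_section_nodes(toy_cost, 8, 256))
```

The appeal of this kind of search is that it needs only a logarithmic number of cost evaluations relative to the width of the range, which matters when each evaluation means training a network. Whether GBNF-VAE uses this exact procedure is not stated in the abstract.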

Results

Spectrum observation, objective indicators, and subjective evaluation all show the outstanding performance of GBNF-VAE in enhancing the quality of pathological speech.

Source journal

Journal of Voice (Medicine: Otorhinolaryngology)

CiteScore

4.00

Self-citation rate

13.60%

Articles published

395

Review time

59 days

Journal description: The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.