Mandarin speech reconstruction from surface electromyography based on generative adversarial networks

JCR quartile: Q3 (Medicine)
Fengji Li, Fei Shen, Ding Ma, Jie Zhou, Li Wang, Fan Fan, Tao Liu, Xiaohong Chen, Tomoki Toda, Haijun Niu
Medicine in Novel Technology and Devices, vol. 26, Article 100359. Published 2025-03-08.
DOI: 10.1016/j.medntd.2025.100359
https://www.sciencedirect.com/science/article/pii/S2590093525000104
Citations: 0

Abstract

The loss of speech function caused by conditions such as laryngectomy and vocal cord paralysis significantly impairs patients' quality of life, and restoring effective communication for these patients is a long-standing research goal. This study explores a method for reconstructing Mandarin speech from voice-related neck and facial surface electromyography (sEMG). Neck and facial sEMG signals and the speech waveform were collected synchronously during normal speech production. The authors developed a Mandarin speech reconstruction model based on multi-scale sEMG feature extraction and a generative adversarial network (GAN), and they assessed its reconstruction performance with both subjective and objective evaluations. The results indicate that the model effectively reconstructs speech from neck and facial sEMG signals. The reconstructed speech closely matches the original in spectrogram and fundamental frequency (F0). It achieves a mel-cepstral distortion of 8.45 dB, a log F0 RMSE of 0.40, an F0 correlation coefficient of 0.71, and an F0 voiced/unvoiced estimation accuracy of 0.80. The character error rate of the reconstructed speech is 0.32, and the tone error rate is 0.26. In subjective listening tests, listeners rated the naturalness of the reconstructed speech as acceptable, with a mean opinion score above 3. This study demonstrates the potential of deep learning techniques for reconstructing Mandarin speech from sEMG.
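The abstract reports several objective metrics without defining them. The sketch below shows one common, textbook way to compute them. It is not the authors' evaluation code, and all function names are hypothetical. It rests on these assumptions:

* reference and reconstructed frames are already time-aligned (for example, by DTW);
* mel-cepstral coefficient 0 (energy) is excluded from MCD;
* F0 tracks are in Hz, with 0 marking unvoiced frames;
* log F0 uses the natural log, and F0 correlation is computed on voiced-in-both frames.

The paper's actual conventions may differ. These include the log base, the alignment method, which frames are scored, and how transcripts and tone labels were obtained.

```python
import math

# Converts a Euclidean mel-cepstral distance into decibels:
# MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c_hat_d)^2)
_MCD_SCALE = 10.0 / math.log(10.0) * math.sqrt(2.0)


def mel_cepstral_distortion(ref_frames, est_frames):
    """Mean MCD in dB over frames that are already time-aligned.

    Each frame is a sequence of mel-cepstral coefficients. Coefficient 0
    (frame energy) is skipped, a common but not universal convention.
    """
    if not ref_frames or len(ref_frames) != len(est_frames):
        raise ValueError("frames must be non-empty and time-aligned")
    total = 0.0
    for ref, est in zip(ref_frames, est_frames):
        sq = sum((r - e) ** 2 for r, e in zip(ref[1:], est[1:]))
        total += _MCD_SCALE * math.sqrt(sq)
    return total / len(ref_frames)


def f0_metrics(ref_f0, est_f0):
    """Return (log-F0 RMSE, F0 Pearson correlation, V/UV accuracy).

    F0 tracks are frame-aligned lists in Hz, with 0 marking unvoiced frames.
    RMSE (natural log) and correlation use only frames voiced in both tracks;
    they are NaN when no such frames exist or the correlation is undefined.
    """
    if not ref_f0 or len(ref_f0) != len(est_f0):
        raise ValueError("F0 tracks must be non-empty and aligned")
    agree = sum((r > 0) == (e > 0) for r, e in zip(ref_f0, est_f0))
    vuv_acc = agree / len(ref_f0)
    pairs = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not pairs:
        return float("nan"), float("nan"), vuv_acc
    n = len(pairs)
    log_rmse = math.sqrt(sum((math.log(r) - math.log(e)) ** 2 for r, e in pairs) / n)
    mean_r = sum(r for r, _ in pairs) / n
    mean_e = sum(e for _, e in pairs) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_e = sum((e - mean_e) ** 2 for _, e in pairs)
    if var_r > 0 and var_e > 0:
        corr = cov / math.sqrt(var_r * var_e)
    else:
        corr = float("nan")  # correlation undefined for a constant track
    return log_rmse, corr, vuv_acc


def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute or match
        prev = cur
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by reference length.

    Applied to character strings this gives a character error rate; applied
    to per-syllable tone labels it gives one possible tone error rate.
    """
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Consider `error_rate("今天天气很好", "今天天汽好")`. It returns 2/6 ≈ 0.333: one substitution (气 → 汽) and one deletion (很) over six reference characters. The same edit-distance routine serves for both characters and tones, which is why it is kept generic over any sequence type.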