Synthesizing intelligible utterances from EEG of imagined speech.

IF 3.2, Zone 3 (Medicine), Q2 NEUROSCIENCES
Frontiers in Neuroscience Pub Date : 2025-04-17 eCollection Date: 2025-01-01 DOI:10.3389/fnins.2025.1565848
Wenjing Xiong, Lin Ma, Haifeng Li
Frontiers in Neuroscience, vol. 19, article 1565848. Publication type: Journal Article. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12043648/pdf/

Abstract

Decoding natural language directly from neural activity is of great interest to people with limited means of communication. As a non-invasive and convenient approach, the electroencephalogram (EEG) offers better portability and wider application potential. We present an EEG-to-speech system (ETS) that synthesizes audible and highly intelligible speech from EEG of imagined speech. Our ETS uses a specially designed X-shaped deep neural network (DNN) to realize an end-to-end mapping between imagined-speech EEG and the corresponding speech. The system incorporates dynamic time warping into the network's training process, using EEG recorded during actual speech as a bridge to temporally align imagined-speech EEG signals with speech signals. ETS performance was evaluated on 13 participants who had been pre-trained on four Chinese disyllabic words. These words cover all four tones and 40% of the phonemes in Chinese. The average accuracy of the synthesized utterances across all subjects was 91.23%, with an average mean opinion score (MOS) of 3.50; the best accuracy was 99%, with an MOS of 3.99. Furthermore, a partial feedback mechanism for the DNN and spectral subtraction-based speech enhancement are introduced to improve the quality of the generated speech. Our findings show that non-invasive approaches can be a significant step toward regaining verbal language ability.
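
The abstract names dynamic time warping (DTW) as the tool for temporal alignment but does not describe how it is wired into training. As a rough illustration only, the following is a minimal sketch of textbook DTW on two toy 1-D sequences. The function name dtw_path, the absolute-difference distance, and the toy data are assumptions made for this example; it is not the authors' implementation, which operates on EEG and speech features inside a DNN training loop.

```python
from math import inf


def dtw_path(a, b):
    """Classic DTW between two 1-D sequences (hypothetical helper).

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs meaning a[i] is aligned with b[j].
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance (assumed metric)
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy example: b repeats one sample of a, as if it were spoken more slowly.
total, path = dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])
print(total, path)
```

In this toy case the path maps the second sample of the first sequence onto both repeated samples of the second, which is the kind of many-to-one stretching that lets signals of different tempo be compared frame by frame.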

Source journal: Frontiers in Neuroscience (NEUROSCIENCES)
CiteScore: 6.20
Self-citation rate: 4.70%
Articles published: 2070
Review time: 14 weeks
Journal description: Neural Technology is devoted to the convergence between neurobiology and quantum-, nano- and micro-sciences. In our vision, this interdisciplinary approach should go beyond the technological development of sophisticated methods and should contribute to generating a genuine change in our discipline.