Synthesizing intelligible utterances from EEG of imagined speech.

IF 3.2, Zone 3 (Medicine), Q2 NEUROSCIENCES

Frontiers in Neuroscience Pub Date : 2025-04-17 eCollection Date: 2025-01-01 DOI:10.3389/fnins.2025.1565848

Wenjing Xiong, Lin Ma, Haifeng Li


Citations: 0

Abstract

Decoding natural language directly from neural activity is of great interest to people with limited means of communication. As a non-invasive and convenient approach, the electroencephalogram (EEG) offers better portability and wider application potential. We present an EEG-to-speech system (ETS) that synthesizes audible and highly intelligible speech from the EEG of imagined speech. Our ETS applies a specially designed X-shaped deep neural network (DNN) to realize an end-to-end correspondence between imagined-speech EEG and speech. The system incorporates dynamic time warping into the network's training process, using EEG recorded during actual speech as a bridge to temporally align imagined-speech EEG signals with speech signals. ETS performance was evaluated on 13 participants who had been pretrained on four Chinese disyllabic words. These words cover all four tones and 40% of the phonemes in Chinese. The average accuracy of the synthesized utterances across all subjects was 91.23% and the average mean opinion score (MOS) was 3.50; the best accuracy was 99%, with an MOS of 3.99. Furthermore, a partial feedback mechanism for the DNN and spectral subtraction-based speech enhancement are introduced to improve the quality of the generated speech. Our findings show that non-invasive approaches can be a significant step toward regaining verbal language ability.
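
The abstract does not describe how dynamic time warping is built into network training, nor how actual-speech EEG serves as the alignment bridge. As a rough illustration of the underlying technique only, the sketch below implements classic dynamic time warping between two one-dimensional sequences. The function name `dtw_path`, the absolute-difference local distance, and the toy inputs are assumptions made for this example and are not taken from the paper.

```python
from math import inf


def dtw_path(a, b):
    """Classic DTW between 1-D sequences a and b (illustrative sketch only).

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching a[i] to b[j].
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance: absolute difference (an assumption for this sketch)
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )

    # Backtrack from the end of both sequences to recover the warping path
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

Calling `dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])` aligns a short sequence with a time-stretched copy of itself. In an EEG-to-speech setting the inputs would be multichannel feature frames and the local distance would compare frame vectors, but those details are beyond what the abstract specifies.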

Source journal

Frontiers in Neuroscience (NEUROSCIENCES)

CiteScore: 6.20
Self-citation rate: 4.70%
Articles published: 2070
Review time: 14 weeks

Journal description: Neural Technology is devoted to the convergence between neurobiology and quantum-, nano- and micro-sciences. In our vision, this interdisciplinary approach should go beyond the technological development of sophisticated methods and should contribute to generating a genuine change in our discipline.