Text Generation of Speech Imagery Based on an Enhanced CTA-BiLSTM Model Utilizing EEG Signals

IF 10.9 | Zone 2, Computer Science | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Consumer Electronics | Pub Date: 2025-04-04 | DOI: 10.1109/TCE.2025.3557912

Hongguang Pan, Xin Chu, Rui Miao, Mei Wang, Yiran Wang, Zhuoyi Li

Volume 71, Issue 2, pp. 3442-3453 | IEEE Xplore: https://ieeexplore.ieee.org/document/10949619/

Citations: 0

Abstract

Recent studies have demonstrated that neural signals recorded during speech imagery can be applied in brain–computer interface (BCI) technology. Text generation based on speech imagery offers a natural means of communication for individuals with speech disabilities. However, the limited range of imagined content and the immaturity of text generation technology currently hinder its application. This study therefore proposes an enhanced CTA-BiLSTM model for efficient text generation from speech imagery electroencephalography (EEG) signals, significantly improving the accuracy and fluency of the generated text. First, unlike the prevailing paradigms in which subjects imagine individual characters or words, this study assembled a sentence-level EEG dataset from ten subjects to better support communication. Next, to address the temporal dynamics and sequence dependencies of sentence-level signals, we employ dynamic time warping (DTW) and hidden Markov models (HMMs) for accurate temporal alignment and signal annotation, producing fine-grained sentence labels. Finally, the proposed CTA-BiLSTM model uses a channel-time attention mechanism to dynamically adjust weights across channels and time steps, emphasizing critical features. Meanwhile, the bidirectional long short-term memory (BiLSTM) network captures long-term dependencies in the EEG signals, improving the model's accuracy in decoding complex temporal patterns. Experimental results show that the average sentence decoding accuracy reaches 67.50% on the self-built dataset, demonstrating improved evaluation accuracy and validating the model's potential for practical application.
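The abstract names dynamic time warping (DTW) as the temporal-alignment step but gives no formulation. As a rough illustration only, the classic DTW recurrence can be sketched as below. The function name `dtw_distance`, the absolute-difference local cost, and the 1-D inputs are assumptions made for this sketch, not the authors' implementation (real EEG trials are multichannel and would need a vector distance).

```python
def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D sequences.

    Uses the absolute difference as the local cost and the standard
    (match, insertion, deletion) step pattern. Returns the total
    alignment cost and the warping path as (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # both sequences advance (match)
                cost[i - 1][j],      # only a advances
                cost[i][j - 1],      # only b advances
            )
    # Backtrack from the end, always stepping to the cheapest predecessor
    # (the diagonal wins ties because it is listed first).
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda ij: cost[ij[0]][ij[1]])
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Hypothetical example: the same "template" replayed with a slower start.
    template = [0, 1, 2, 3]
    trial = [0, 0, 1, 2, 3]
    total, path = dtw_distance(template, trial)
    print(total, path)
```

Here the trial's repeated leading sample is absorbed by mapping both of its first two points onto the template's first point, so the alignment cost is zero even though the sequences differ in length. This O(nm) version favors readability over speed.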
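The abstract pairs DTW with hidden Markov models (HMMs) for signal annotation but does not describe how the HMM is configured. One plausible reading, assumed here, is a left-to-right HMM whose hidden states stand for successive segments of an imagined sentence, with Viterbi decoding assigning a segment label to every time step. The toy model below (three states `A`, `B`, `C`, three discrete observation symbols, and all probabilities) is hypothetical and chosen only to show the decoding step.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for a discrete HMM.

    start_p[s], trans_p[r][s] and emit_p[s][o] are probability tables
    (dicts); the computation runs in log space to avoid underflow.
    """
    states = list(start_p)
    # score[t][s]: best log-probability of any path that ends in s at time t.
    score = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda r: score[t - 1][r] + _log(trans_p[r][s]))
            score[t][s] = (score[t - 1][best_prev] + _log(trans_p[best_prev][s])
                           + _log(emit_p[s][obs[t]]))
            back[t][s] = best_prev
    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical left-to-right model: states A -> B -> C stand for successive
# sentence segments; "x", "y", "z" stand for quantized EEG feature symbols.
START = {"A": 1.0, "B": 0.0, "C": 0.0}
TRANS = {
    "A": {"A": 0.6, "B": 0.4, "C": 0.0},
    "B": {"A": 0.0, "B": 0.6, "C": 0.4},
    "C": {"A": 0.0, "B": 0.0, "C": 1.0},
}
EMIT = {
    "A": {"x": 0.8, "y": 0.1, "z": 0.1},
    "B": {"x": 0.1, "y": 0.8, "z": 0.1},
    "C": {"x": 0.1, "y": 0.1, "z": 0.8},
}

if __name__ == "__main__":
    print(viterbi(["x", "x", "y", "y", "z"], START, TRANS, EMIT))  # ['A', 'A', 'B', 'B', 'C']
```

The zero entries in `TRANS` forbid moving backward or skipping a segment, which is what makes the decoded labels a monotone segmentation of the trial.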
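The abstract says the channel-time attention (CTA) mechanism reweights features across channels and time, but it gives no equations. A common pattern, assumed here and not taken from the paper, scores each channel and each time step, normalizes both score vectors with a softmax, and scales the input by the resulting weights. In this toy sketch, mean absolute amplitude stands in for the learned scoring layers a real model would use. The BiLSTM that would consume the reweighted sequence is omitted, and the function name `channel_time_attention` is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_time_attention(x):
    """Toy channel-time reweighting.

    x: list of C channels, each a list of T samples.
    Returns (reweighted x, channel weights, time weights).
    """
    C, T = len(x), len(x[0])
    # Channel score: mean absolute amplitude of each channel
    # (a stand-in for a learned channel-attention layer).
    ch_scores = [sum(abs(v) for v in row) / T for row in x]
    # Time score: mean absolute amplitude across channels at each step
    # (a stand-in for a learned temporal-attention layer).
    t_scores = [sum(abs(x[c][t]) for c in range(C)) / C for t in range(T)]
    w_ch = softmax(ch_scores)
    w_t = softmax(t_scores)
    # Scale every sample by the weight of its channel and of its time step.
    out = [[x[c][t] * w_ch[c] * w_t[t] for t in range(T)] for c in range(C)]
    return out, w_ch, w_t


if __name__ == "__main__":
    # Hypothetical 2-channel, 3-sample input: channel 1 is more active.
    reweighted, w_ch, w_t = channel_time_attention([[1, 1, 1], [3, 3, 3]])
    print(w_ch, w_t)
```

Because both weight vectors sum to one, the more active channel receives a larger share of the emphasis, while time steps of equal activity receive equal weight. In a trained model those scores would come from learned parameters rather than fixed amplitude statistics.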




Source journal

IEEE Transactions on Consumer Electronics (Engineering & Technology: Telecommunications)

CiteScore: 7.70

Self-citation rate: 9.30%

Review time: 3.3 months

Journal description: The main focus for the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture or end use of mass market electronics, systems, software and services for consumers.