Transformer-based neural speech decoding from surface and depth electrode signals.

Junbo Chen, Xupeng Chen, Ran Wang, Chenqian Le, Amirhossein Khalilian-Gourtani, Erika Jensen, Patricia Dugan, Werner Doyle, Orrin Devinsky, Daniel Friedman, Adeen Flinker, Yao Wang
Journal of Neural Engineering, published 28 January 2025. DOI: 10.1088/1741-2552/adab21. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11773629/pdf/

Abstract

Objective. This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior works can only operate on electrodes arranged in a 2D grid (i.e. an electrocorticographic (ECoG) array) and on data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface ECoG and depth (stereotactic EEG, or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements. The model should not have subject-specific layers, and the trained model should perform well on participants unseen during training.

Approach. We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train subject-specific models using data from a single participant and multi-subject models exploiting data from multiple participants.

Main results. The subject-specific models using only low-density 8 × 8 ECoG data achieved a high Pearson correlation coefficient between the decoded and ground-truth spectrograms (PCC = 0.817) over N = 43 participants, significantly outperforming our prior convolutional ResNet model and the 3D Swin Transformer model. Incorporating the additional strip, depth, and grid electrodes available in each participant (N = 39) led to further improvement (PCC = 0.838). For participants with only sEEG electrodes (N = 9), subject-specific models still achieved comparable performance, with an average PCC = 0.798. A single multi-subject model trained on ECoG data from 15 participants yielded results (PCC = 0.837) comparable to 15 models trained individually for these participants (PCC = 0.831).
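The evaluation metric reported above, the Pearson correlation coefficient (PCC) between a decoded spectrogram and the ground-truth spectrogram, can be computed as in the following minimal sketch. This is an illustration of the standard metric, not the authors' evaluation code; the spectrogram shape (frequency bins × time frames) and the noise level are assumptions for the example.

```python
import numpy as np

def pearson_cc(pred: np.ndarray, target: np.ndarray) -> float:
    """Pearson correlation between flattened predicted and ground-truth spectrograms."""
    p, t = pred.ravel(), target.ravel()
    p = p - p.mean()
    t = t - t.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))

rng = np.random.default_rng(0)
target = rng.standard_normal((128, 100))                  # freq bins x time frames
pred = target + 0.5 * rng.standard_normal(target.shape)   # noisy "decoded" spectrogram
score = pearson_cc(pred, target)                          # close to, but below, 1.0
```

A per-participant average of such scores is what figures like PCC = 0.817 summarize.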
Furthermore, the multi-subject models achieved high performance on unseen participants, with an average PCC = 0.765 in leave-one-out cross-validation.

Significance. The proposed SwinTW decoder enables future speech decoding approaches to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. The success of the single multi-subject model when tested on participants within the training cohort demonstrates that the architecture can exploit data from multiple participants with diverse electrode placements. Its flexibility in training with both single-subject and multi-subject data, as well as with grid and non-grid electrodes, ensures broad applicability. Importantly, the generalizability of the multi-subject models in our study population suggests that a model trained on paired acoustic and neural data from multiple patients can potentially be applied to new patients with speech disability, for whom collecting such paired training data is not feasible.
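The leave-one-out cross-validation used to test generalization to unseen participants can be sketched as below: each participant is held out once while a model is trained on the rest. This is a schematic of the standard protocol, not the authors' pipeline; the participant IDs are hypothetical placeholders.

```python
def leave_one_out_splits(participants):
    """Yield (train, held_out) splits, holding out each participant exactly once."""
    for i, held_out in enumerate(participants):
        train = participants[:i] + participants[i + 1:]
        yield train, held_out

# Hypothetical IDs standing in for the 15-participant ECoG cohort.
subjects = [f"P{i:02d}" for i in range(1, 16)]
splits = list(leave_one_out_splits(subjects))
# 15 splits; each trains on 14 participants and tests on the remaining one.
```

The reported PCC = 0.765 would then be the average test-participant score across such splits.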
