Iterative alignment discovery of speech-associated neural activity.

Journal of Neural Engineering, vol. 21, no. 4. Pub Date: 2024-08-28. DOI: 10.1088/1741-2552/ad663c

Qinwan Rabbani, Samyak Shah, Griffin Milsap, Matthew Fifer, Hynek Hermansky, Nathan Crone



Abstract

Objective. Brain-computer interfaces (BCIs) have the potential to preserve or restore speech in patients with neurological disorders that weaken the muscles involved in speech production. However, successful training of low-latency speech synthesis and recognition models requires alignment of neural activity with intended phonetic or acoustic output with high temporal precision. This is particularly challenging in patients who cannot produce audible speech, as ground truth with which to pinpoint neural activity synchronized with speech is not available.

Approach. In this study, we present a new iterative algorithm for neural voice activity detection (nVAD) called iterative alignment discovery dynamic time warping (IAD-DTW) that integrates DTW into the loss function of a deep neural network (DNN). The algorithm is designed to discover the alignment between a patient's electrocorticographic (ECoG) neural responses and their attempts to speak during collection of data for training BCI decoders for speech synthesis and recognition.

Main results. To demonstrate the effectiveness of the algorithm, we tested its accuracy in predicting the onset and duration of acoustic signals produced by able-bodied patients with intact speech undergoing short-term diagnostic ECoG recordings for epilepsy surgery. We simulated a lack of ground truth by randomly perturbing the temporal correspondence between neural activity and an initial single estimate for all speech onsets and durations. We examined the model's ability to overcome these perturbations to estimate ground truth. IAD-DTW showed no notable degradation (<1% absolute decrease in accuracy) in performance in these simulations, even in the case of maximal misalignments between speech and silence.

Significance. IAD-DTW is computationally inexpensive and can be easily integrated into existing DNN-based nVAD approaches, as it pertains only to the final loss computation. This approach makes it possible to train speech BCI algorithms using ECoG data from patients who are unable to produce audible speech, including those with locked-in syndrome.
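The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of realigning training labels with a DTW-style search before the loss is computed. It is not the authors' IAD-DTW code. The function name `align_labels`, the compact 0/1 label template, the cross-entropy local cost, and the constrained step pattern (each frame maps to exactly one template element, and the template index either stays or advances by one) are all assumptions made for illustration.

```python
import numpy as np

def align_labels(probs, template):
    """Monotonically align a label template to frame-wise speech probabilities.

    probs:    shape (T,), a model's predicted P(speech) for each frame.
    template: shape (M,), ordered 0/1 labels such as [0, 1, 0], with M <= T.
    Returns per-frame labels of shape (T,) that follow the template order.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-6, 1 - 1e-6)
    template = np.asarray(template, dtype=int)
    T, M = len(probs), len(template)

    # Local cost: binary cross-entropy of frame t under template label m.
    cost = np.where(template[None, :] == 1,
                    -np.log(probs)[:, None],
                    -np.log1p(-probs)[:, None])

    # Accumulated cost with a constrained step pattern: stay or advance by one.
    acc = np.full((T, M), np.inf)
    acc[0, 0] = cost[0, 0]
    for t in range(1, T):
        for m in range(M):
            stay = acc[t - 1, m]
            advance = acc[t - 1, m - 1] if m > 0 else np.inf
            acc[t, m] = cost[t, m] + min(stay, advance)

    # Backtrack from the last frame and last template element.
    labels = np.empty(T, dtype=int)
    m = M - 1
    for t in range(T - 1, -1, -1):
        labels[t] = template[m]
        if t > 0 and m > 0 and acc[t - 1, m - 1] <= acc[t - 1, m]:
            m -= 1
    return labels

# Example: the model is confident about speech in frames 2-4.
print(align_labels([0.1, 0.1, 0.9, 0.9, 0.9, 0.1], [0, 1, 0]))
# → [0 0 1 1 1 0]
```

In an iterative scheme of this kind, the realigned labels would replace the initial, possibly misaligned, estimate as targets in the next loss computation, and the alignment would be recomputed as the network's predictions improve. Because the search touches only the targets fed to the loss, the network architecture itself would not need to change.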
