Miguel Angrick, Shiyu Luo, Qinwan Rabbani, Shreya Joshi, Daniel N Candrea, Griffin W Milsap, Chad R Gordon, Kathryn Rosenblatt, Lora Clawson, Nicholas Maragakis, Francesco V Tenore, Matthew S Fifer, Nick F Ramsey, Nathan E Crone
Journal of Neural Engineering, published 2025-10-06. DOI: 10.1088/1741-2552/ae0965. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12498269/pdf/
Real-time detection of spoken speech from unlabeled ECoG signals: a pilot study with an ALS participant.
Objective. Brain-computer interfaces (BCIs) hold significant promise for restoring communication in individuals with partial or complete loss of the ability to speak due to paralysis from amyotrophic lateral sclerosis (ALS), brainstem stroke, and other neurological disorders. Many of the speech-decoding approaches reported in the BCI literature have required time-aligned target representations for successful training, a major challenge when translating such approaches to people who have already lost their voice.

Approach. In this pilot study, we took a first step toward scenarios in which no ground truth is available. We used a graph-based clustering approach to identify temporal segments of speech production from electrocorticographic (ECoG) signals alone. We then used the estimated speech segments to train a voice activity detection (VAD) model using only ECoG signals. We evaluated our approach with leave-one-day-out cross-validation on open-loop recordings of a single dysarthric clinical trial participant living with ALS, and compared the resulting performance to previous solutions trained on ground-truth acoustic voice recordings.

Main results. Our approach achieves a median timing error of around 530 ms with respect to the actual spoken speech. Embedded in a real-time BCI, it provides VAD results with a latency of only 10 ms.

Significance. To the best of our knowledge, our results show for the first time that speech activity can be predicted purely from unlabeled ECoG signals, a crucial step toward supporting individuals who can no longer provide this information because of their neurological condition, such as patients with locked-in syndrome.

Clinical Trial Information. ClinicalTrials.gov, registration number NCT03567213.
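To make the core idea concrete, the sketch below illustrates one generic way a graph-based clustering step can produce speech/non-speech pseudo-labels from neural-like features alone, without any acoustic ground truth. This is a hedged illustration, not the authors' implementation: the synthetic features, window counts, kNN graph parameters, and spectral bipartition are all assumptions standing in for details not given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for windowed ECoG features (e.g. high-gamma power):
# 200 time windows x 8 channels; windows 80-140 simulate elevated
# activity during speech production. Real data would replace this.
X = rng.normal(0.0, 1.0, size=(200, 8))
X[80:140] += 3.0

def knn_graph(X, k=10, sigma=2.0):
    """Symmetric k-nearest-neighbour similarity graph over windows."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    # Keep only each node's k strongest edges, then symmetrize.
    thresh = np.sort(W, axis=1)[:, -k][:, None]
    W = np.where(W >= thresh, W, 0.0)
    return np.maximum(W, W.T)

W = knn_graph(X)

# Spectral bipartition: threshold the Fiedler vector (second-smallest
# eigenvector of the graph Laplacian) to split windows into two clusters,
# interpreted as putative speech vs. non-speech segments.
L = np.diag(W.sum(axis=1)) - W
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]
labels = (fiedler > fiedler.mean()).astype(int)

# Orient labels so cluster 1 corresponds to higher mean feature energy,
# on the assumption that speech windows carry more high-gamma power.
if X[labels == 1].mean() < X[labels == 0].mean():
    labels = 1 - labels

print("fraction of simulated speech windows labeled speech:",
      labels[80:140].mean())
```

In the study's setting, pseudo-labels like these would then serve as training targets for a lightweight VAD classifier running on streaming ECoG features, which is what allows the reported 10 ms inference latency.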