Miguel Angrick, Shiyu Luo, Qinwan Rabbani, Shreya Joshi, Daniel N Candrea, Griffin W Milsap, Chad R Gordon, Kathryn Rosenblatt, Lora Clawson, Nicholas Maragakis, Francesco V Tenore, Matthew S Fifer, Nick F Ramsey, Nathan E Crone
Journal of Neural Engineering, published 2025-10-06. DOI: 10.1088/1741-2552/ae0965. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12498269/pdf/
Real-time detection of spoken speech from unlabeled ECoG signals: a pilot study with an ALS participant.
Objective. Brain-computer interfaces (BCIs) hold significant promise for restoring communication in individuals with partial or complete loss of the ability to speak due to paralysis from amyotrophic lateral sclerosis (ALS), brainstem stroke, and other neurological disorders. Many of the speech-decoding approaches reported in the BCI literature have required time-aligned target representations for successful training, a major challenge when translating such approaches to people who have already lost their voice.

Approach. In this pilot study, we took a first step toward scenarios in which no ground truth is available. We used a graph-based clustering approach to identify temporal segments of speech production from electrocorticographic (ECoG) signals alone. We then used the estimated speech segments to train a voice activity detection (VAD) model using only ECoG signals. We evaluated our approach with leave-one-day-out cross-validation on open-loop recordings of a single dysarthric clinical trial participant living with ALS, and compared the resulting performance to previous solutions trained on ground-truth acoustic voice recordings.

Main results. Our approach achieves a median timing error of around 530 ms with respect to the actual spoken speech. Embedded in a real-time BCI, it provides VAD results with a latency of only 10 ms.

Significance. To the best of our knowledge, our results show for the first time that speech activity can be predicted purely from unlabeled ECoG signals, a crucial step toward supporting individuals who can no longer provide this information because of their neurological condition, such as patients with locked-in syndrome.

Clinical Trial Information. ClinicalTrials.gov, registration number NCT03567213.
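To make the core idea concrete, the sketch below illustrates one generic way a graph-based clustering step can produce speech/non-speech pseudo-labels from neural-like features alone, without any acoustic ground truth. This is a hedged illustration, not the authors' implementation: the synthetic features, window counts, kNN graph parameters, and spectral bipartition are all assumptions standing in for details not given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for windowed ECoG features (e.g. high-gamma power):
# 200 time windows x 8 channels; windows 80-140 simulate elevated
# activity during speech production. Real data would replace this.
X = rng.normal(0.0, 1.0, size=(200, 8))
X[80:140] += 3.0

def knn_graph(X, k=10, sigma=2.0):
    """Symmetric k-nearest-neighbour similarity graph over windows."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    # Keep only each node's k strongest edges, then symmetrize.
    thresh = np.sort(W, axis=1)[:, -k][:, None]
    W = np.where(W >= thresh, W, 0.0)
    return np.maximum(W, W.T)

W = knn_graph(X)

# Spectral bipartition: threshold the Fiedler vector (second-smallest
# eigenvector of the graph Laplacian) to split windows into two clusters,
# interpreted as putative speech vs. non-speech segments.
L = np.diag(W.sum(axis=1)) - W
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]
labels = (fiedler > fiedler.mean()).astype(int)

# Orient labels so cluster 1 corresponds to higher mean feature energy,
# on the assumption that speech windows carry more high-gamma power.
if X[labels == 1].mean() < X[labels == 0].mean():
    labels = 1 - labels

print("fraction of simulated speech windows labeled speech:",
      labels[80:140].mean())
```

In the study's setting, pseudo-labels like these would then serve as training targets for a lightweight VAD classifier running on streaming ECoG features, which is what allows the reported 10 ms inference latency.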