Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks

Impact factor: 2.9; JCR Q2 (Engineering, Electrical & Electronic)

IEEE Open Journal of Signal Processing; Pub Date: 2024-03-18; DOI: 10.1109/OJSP.2024.3378593

Mike D. Thornton;Danilo P. Mandic;Tobias J. Reichenbach

Volume 5, pages 700-716. Full text: https://ieeexplore.ieee.org/document/10474145/

Citations: 0

Abstract

The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively-steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders placed first in the match-mismatch task: given a short temporal segment of EEG recordings and two candidate speech segments, the task is to identify which of the two speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance of the decoders as evaluated on a held-out portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised by evaluating them on an entirely different dataset, which contains EEG recorded under a variety of speech-listening conditions. The results show that the match-mismatch decoders classify accurately and robustly, and that they can even serve as auditory attention decoders without additional training.
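The match-mismatch decision rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' deep-neural-network decoder: it assumes a hypothetical decoder has already turned each EEG segment into an estimate of the speech envelope, and it simply picks the candidate speech segment whose envelope correlates more strongly with that estimate. The sampling rate, segment length, noise level, and the names `pearson` and `match_mismatch` are all assumptions chosen for illustration, and the data are synthetic.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def match_mismatch(decoded, candidate_a, candidate_b):
    """Return 0 if candidate_a is judged to be the matched segment, else 1.

    `decoded` stands in for the output of an EEG decoder (hypothetical here);
    the candidates are speech-envelope segments of the same length.
    """
    score_a = pearson(decoded, candidate_a)
    score_b = pearson(decoded, candidate_b)
    return 0 if score_a >= score_b else 1


# Synthetic stand-ins (assumptions, not real EEG or speech data).
rng = np.random.default_rng(0)
fs = 64                      # assumed sampling rate in Hz
seg = 3 * fs                 # assumed 3-second segments
envelope = np.abs(rng.standard_normal(20 * seg))                 # toy speech envelope
decoded = envelope + rng.standard_normal(envelope.size)          # noisy toy decoder output

correct = 0
n_trials = 0
for start in range(0, envelope.size - 2 * seg, seg):
    matched = envelope[start:start + seg]                # aligned with the EEG segment
    mismatched = envelope[start + seg:start + 2 * seg]   # taken from later in the stimulus
    eeg_feature = decoded[start:start + seg]
    # Randomise which candidate slot holds the matched segment.
    if rng.random() < 0.5:
        label, a, b = 0, matched, mismatched
    else:
        label, a, b = 1, mismatched, matched
    correct += int(match_mismatch(eeg_feature, a, b) == label)
    n_trials += 1

accuracy = correct / n_trials
print(f"match-mismatch accuracy on synthetic data: {accuracy:.2f}")
```

Because both candidates are scored against the same decoded signal, the same comparison could in principle be applied to two competing speakers rather than to a matched and a mismatched segment, which is the sense in which a match-mismatch decoder can double as an auditory attention decoder.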
