AADNet: An End-to-End Deep Learning Model for Auditory Attention Decoding.

IF 4.8 · CAS Tier 2 (Medicine) · JCR Q2 · ENGINEERING, BIOMEDICAL
Nhan Duc Thanh Nguyen, Huy Phan, Simon Geirnaert, Kaare Mikkelsen, Preben Kidmose
{"title":"AADNet: An End-to-End Deep Learning Model for Auditory Attention Decoding.","authors":"Nhan Duc Thanh Nguyen, Huy Phan, Simon Geirnaert, Kaare Mikkelsen, Preben Kidmose","doi":"10.1109/TNSRE.2025.3587637","DOIUrl":null,"url":null,"abstract":"<p><p>Auditory attention decoding (AAD) is the process of identifying the attended speech in a multi-talker environment using brain signals, typically recorded through electroencephalography (EEG). Over the past decade, AAD has undergone continuous development, driven by its promising application in neuro-steered hearing devices. Most AAD algorithms are relying on the increase in neural entrainment to the envelope of attended speech, as compared to unattended speech, typically using a two-step approach. First, the algorithm predicts representations of the attended speech signal envelopes; second, it identifies the attended speech by finding the highest correlation between the predictions and the representations of the actual speech signals. In this study, we proposed a novel end-to-end neural network architecture, named AADNet, which combines these two stages into a direct approach to address the AAD problem. We compare the proposed network against traditional stimulus decoding-based approaches, including linear stimulus reconstruction, canonical correlation analysis, and an alternative non-linear stimulus reconstruction using three different datasets. AADNet shows a significant performance improvement for both subject-specific and subject-independent models. Notably, the average subject-independent classification accuracies for different analysis window lengths range from 56.3% (1 s) to 78.1% (20 s), 57.5% (1 s) to 89.4% (40 s), and 56.0% (1 s) to 82.6% (40 s) for three validated datasets, respectively, showing a significantly improved ability to generalize to data from unseen subjects. These results highlight the potential of deep learning models for advancing AAD, with promising implications for future hearing aids, assistive devices, and clinical assessments.</p>","PeriodicalId":13419,"journal":{"name":"IEEE Transactions on Neural Systems and Rehabilitation Engineering","volume":"PP ","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Neural Systems and Rehabilitation Engineering","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1109/TNSRE.2025.3587637","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Citations: 0

Abstract

Auditory attention decoding (AAD) is the process of identifying the attended speech in a multi-talker environment using brain signals, typically recorded through electroencephalography (EEG). Over the past decade, AAD has undergone continuous development, driven by its promising application in neuro-steered hearing devices. Most AAD algorithms rely on the increase in neural entrainment to the envelope of attended speech, as compared to unattended speech, typically using a two-step approach. First, the algorithm predicts representations of the attended speech signal envelopes; second, it identifies the attended speech by finding the highest correlation between the predictions and the representations of the actual speech signals. In this study, we propose a novel end-to-end neural network architecture, named AADNet, which combines these two stages into a single, direct approach to the AAD problem. We compare the proposed network against traditional stimulus decoding-based approaches, including linear stimulus reconstruction, canonical correlation analysis, and an alternative non-linear stimulus reconstruction, on three different datasets. AADNet shows a significant performance improvement for both subject-specific and subject-independent models. Notably, the average subject-independent classification accuracies for different analysis window lengths range from 56.3% (1 s) to 78.1% (20 s), 57.5% (1 s) to 89.4% (40 s), and 56.0% (1 s) to 82.6% (40 s) for the three datasets, respectively, demonstrating a significantly improved ability to generalize to data from unseen subjects. These results highlight the potential of deep learning models for advancing AAD, with promising implications for future hearing aids, assistive devices, and clinical assessments.
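
For readers unfamiliar with the two-step baseline that AADNet replaces, the sketch below illustrates the idea in Python: a linear stimulus-reconstruction decoder first predicts the attended speech envelope from EEG, and the attended speaker is then identified as the candidate whose actual envelope correlates most strongly with the reconstruction. This is a minimal illustration under assumed conventions (time-lagged EEG feature matrices, a ridge regularizer), not the authors' implementation and not the AADNet architecture itself.

```python
# Minimal sketch (not the authors' code) of the classic two-step AAD baseline:
# (1) reconstruct the attended-speech envelope from EEG with a linear decoder,
# (2) pick the speaker whose actual envelope correlates most with the reconstruction.
import numpy as np

def fit_linear_decoder(eeg, attended_env, reg=1e-3):
    """Ridge-regularized least-squares decoder mapping EEG features to the envelope.

    eeg:          (n_samples, n_features) time-lagged EEG matrix (assumed layout)
    attended_env: (n_samples,) attended speech envelope
    """
    gram = eeg.T @ eeg + reg * np.eye(eeg.shape[1])
    return np.linalg.solve(gram, eeg.T @ attended_env)

def decode_attention(eeg_window, candidate_envs, decoder):
    """Return the index of the candidate speaker whose envelope best matches
    the envelope reconstructed from this EEG analysis window."""
    reconstruction = eeg_window @ decoder
    corrs = [np.corrcoef(reconstruction, env)[0, 1] for env in candidate_envs]
    return int(np.argmax(corrs))
```

AADNet, by contrast, is trained end-to-end to map the EEG window and candidate speech signals directly to the attention decision, rather than optimizing envelope reconstruction and then correlating as above.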

Source journal: IEEE Transactions on Neural Systems and Rehabilitation Engineering
CiteScore: 8.60
Self-citation rate: 8.20%
Articles per year: 479
Review time: 6-12 weeks
Journal scope: Rehabilitative and neural aspects of biomedical engineering, including functional electrical stimulation, acoustic dynamics, human performance measurement and analysis, nerve stimulation, electromyography, motor control and stimulation; and hardware and software applications for rehabilitation engineering and assistive devices.