Improving auditory attention decoding in noisy environments for listeners with hearing impairment through contrastive learning

Gautam Sridhar, Sofía Boselli, Martin A Skoglund, Bo Bernhardsson, Emina Alickovic

Journal of Neural Engineering, published 2025-06-18. DOI: 10.1088/1741-2552/ade28a
Abstract
Objective. This study investigated the potential of contrastive learning to improve auditory attention decoding (AAD) from electroencephalography (EEG) data in challenging cocktail-party scenarios with competing speech and background noise.

Approach. Three models were implemented for comparison: a baseline linear model (LM), a non-linear model without contrastive learning (NLM), and a non-linear model with contrastive learning (NLMwCL). The models were trained on EEG data and speech envelopes; the NLMwCL model used SigLIP, a variant of the CLIP loss, to embed the data. Speech envelopes were reconstructed from each model and compared with the attended and ignored speech envelopes to assess reconstruction accuracy, measured as the correlation between the reconstructed and actual envelopes. These reconstruction accuracies were then compared to classify attention. All models were evaluated on 34 listeners with hearing impairment.

Results. Reconstruction accuracy for attended and ignored speech, along with attention classification accuracy, was calculated for each model across various time windows. The NLMwCL consistently outperformed the other models in both speech reconstruction and attention classification. For a 3-second time window, the NLMwCL model achieved a mean attended speech reconstruction accuracy of 0.105 and a mean attention classification accuracy of 68.0%, while the NLM model scored 0.096 and 64.4%, and the LM achieved 0.084 and 62.6%, respectively.

Significance. These findings demonstrate the promise of contrastive learning for improving AAD and highlight the potential of EEG-based tools for clinical applications and for advances in hearing technology, particularly in the design of new neuro-steered signal processing algorithms.
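The two core ideas in the abstract — a SigLIP-style sigmoid contrastive loss pairing EEG embeddings with speech-envelope embeddings, and attention classification by comparing reconstruction correlations — can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation: the temperature and bias values, embedding shapes, and function names are assumptions.

```python
import numpy as np

def siglip_loss(eeg_emb, env_emb, temperature=10.0, bias=-10.0):
    """SigLIP-style sigmoid contrastive loss between a batch of EEG
    embeddings and speech-envelope embeddings of shape (B, D).
    Matched pairs (same batch index) are positives; all other
    pairings within the batch are negatives."""
    # L2-normalise so that pairwise similarities lie in [-1, 1].
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    env = env_emb / np.linalg.norm(env_emb, axis=1, keepdims=True)
    logits = temperature * (eeg @ env.T) + bias   # (B, B) similarity logits
    labels = 2.0 * np.eye(len(eeg_emb)) - 1.0     # +1 on diagonal, -1 off
    z = labels * logits
    # Numerically stable -log(sigmoid(z)) = log(1 + exp(-z)).
    return float(np.mean(np.logaddexp(0.0, -z)))

def classify_attention(reconstructed, attended_env, ignored_env):
    """Decode attention as the abstract describes: the stream whose
    envelope correlates more strongly with the envelope reconstructed
    from EEG is taken to be the attended one."""
    r_att = np.corrcoef(reconstructed, attended_env)[0, 1]
    r_ign = np.corrcoef(reconstructed, ignored_env)[0, 1]
    return "attended" if r_att > r_ign else "ignored"
```

Training with `siglip_loss` pulls each EEG window's embedding toward the embedding of its own envelope and pushes it away from the other envelopes in the batch; at test time, `classify_attention` reduces AAD to a simple comparison of two Pearson correlations.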