Multi-Class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum

IF 5.2 · CAS Tier 2 (Medicine) · JCR Q2 (Engineering, Biomedical)
Yuanming Zhang;Jing Lu;Fei Chen;Haoliang Du;Xia Gao;Zhibin Lin
{"title":"Multi-Class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum","authors":"Yuanming Zhang;Jing Lu;Fei Chen;Haoliang Du;Xia Gao;Zhibin Lin","doi":"10.1109/TNSRE.2025.3591819","DOIUrl":null,"url":null,"abstract":"Prior research on directional focus decoding, a.k.a. selective Auditory Attention Decoding (sAAD), has primarily focused on binary “left-right” tasks. However, decoding of the attended speaker’s precise direction is desired. Existing approaches often underutilize spatial audio information, resulting in suboptimal performance. In this paper, we address this limitation by leveraging a recent dataset containing two concurrent speakers at two of 14 possible directions. We demonstrate that models relying solely on EEG yield limited decoding accuracy in leave-one-out settings. To enhance performance, we propose to integrate spatial spectra as an additional input. We evaluate three model architectures, namely CNN, LSM-CNN, and Deformer, under two strategies for utilizing spatial information: all-in-one (end-to-end) and pairwise (two-stage) decoding. While all-in-one decoders directly take dual-modal inputs and output the attended direction, pairwise decoders first leverage spatial spectra to decode the competing pairs, and then a specific model is used to decode the attended direction. Our proposed all-in-one Sp-EEG-Deformer model achieves 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios, respectively, using 1-second decision windows (chance level: 50%, indicating random guessing). Meanwhile, the pairwise Sp-EEG-Deformer decoder achieves a 14-class decoding accuracy of 63.62% (10 s). Our experiments reveal that spatial spectra are particularly effective at reducing the 14-class problem into a binary one. On the other hand, EEG features are more discriminative and play a crucial role in precisely identifying the final attended direction within this reduced 2-class set. These results highlight the effectiveness of our proposed dual-modal directional decoding strategies.","PeriodicalId":13419,"journal":{"name":"IEEE Transactions on Neural Systems and Rehabilitation Engineering","volume":"33 ","pages":"2892-2903"},"PeriodicalIF":5.2000,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11091336","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Neural Systems and Rehabilitation Engineering","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11091336/","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Citations: 0

Abstract

Prior research on directional focus decoding, a.k.a. selective Auditory Attention Decoding (sAAD), has primarily focused on binary “left-right” tasks. However, decoding the attended speaker’s precise direction is desirable. Existing approaches often underutilize spatial audio information, resulting in suboptimal performance. In this paper, we address this limitation by leveraging a recent dataset containing two concurrent speakers located at two of 14 possible directions. We demonstrate that models relying solely on EEG yield limited decoding accuracy in leave-one-out settings. To enhance performance, we propose integrating spatial spectra as an additional input. We evaluate three model architectures, namely CNN, LSM-CNN, and Deformer, under two strategies for utilizing spatial information: all-in-one (end-to-end) and pairwise (two-stage) decoding. While all-in-one decoders directly take dual-modal inputs and output the attended direction, pairwise decoders first leverage spatial spectra to decode the competing pair of directions, after which a dedicated model decodes the attended direction within that pair. Our proposed all-in-one Sp-EEG-Deformer model achieves 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios, respectively, using 1-second decision windows (chance level: 50%, indicating random guessing). Meanwhile, the pairwise Sp-EEG-Deformer decoder achieves a 14-class decoding accuracy of 63.62% with 10-second decision windows. Our experiments reveal that spatial spectra are particularly effective at reducing the 14-class problem to a binary one. On the other hand, EEG features are more discriminative and play a crucial role in precisely identifying the final attended direction within this reduced 2-class set. These results highlight the effectiveness of our proposed dual-modal directional decoding strategies.
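The pairwise strategy described above is easiest to see as a two-stage pipeline, sketched below in Python. This is a hypothetical illustration under stated assumptions, not the authors’ implementation: the function names, the peak-picking heuristic in stage 1, and the channel/sample counts are all stand-ins. Stage 2 is where a trained EEG-based binary model such as the paper’s Sp-EEG-Deformer would operate. Note also that because each trial contains two concurrent speakers at two of the 14 directions, guessing between the two active directions succeeds half the time, which presumably is why the abstract quotes a 50% chance level.

```python
# Minimal sketch of the pairwise (two-stage) decoding strategy from the
# abstract. Hypothetical stand-in code, not the authors' implementation.

import numpy as np

N_DIRECTIONS = 14  # candidate speaker directions in the dataset


def pair_from_spatial_spectrum(spatial_spectrum):
    """Stage 1 (assumed): take the two strongest peaks of the audio spatial
    spectrum as the competing speaker directions, reducing 14 classes to 2."""
    top_two = np.argsort(spatial_spectrum)[-2:]
    return tuple(sorted(int(i) for i in top_two))


def attended_within_pair(eeg_window, pair):
    """Stage 2 (placeholder): a trained EEG-based binary classifier, e.g. the
    paper's Sp-EEG-Deformer, would go here. This stub guesses at chance."""
    rng = np.random.default_rng(seed=0)
    return pair[int(rng.integers(2))]


def decode_attended_direction(spatial_spectrum, eeg_window):
    """Full pipeline: spatial spectrum -> competing pair -> attended direction."""
    pair = pair_from_spatial_spectrum(spatial_spectrum)  # e.g. (3, 9)
    return attended_within_pair(eeg_window, pair)


# Toy usage with synthetic inputs: a spectrum peaking at directions 3 and 9,
# and a fake 1 s EEG window (64 channels x 128 samples; both numbers assumed).
spectrum = np.zeros(N_DIRECTIONS)
spectrum[[3, 9]] = 1.0
eeg = np.random.randn(64, 128)
print(decode_attended_direction(spectrum, eeg))  # prints 3 or 9
```

The division of labor in this sketch mirrors the abstract’s finding: the audio spatial spectrum excels at isolating the two competing directions, while EEG features carry the discriminative information needed to pick the attended one within that reduced 2-class set.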
Source Journal

CiteScore: 8.60
Self-citation rate: 8.20%
Articles published: 479
Review time: 6-12 weeks
Journal description: Rehabilitative and neural aspects of biomedical engineering, including functional electrical stimulation, acoustic dynamics, human performance measurement and analysis, nerve stimulation, electromyography, motor control and stimulation; and hardware and software applications for rehabilitation engineering and assistive devices.