Multi-Class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum
Yuanming Zhang, Jing Lu, Fei Chen, Haoliang Du, Xia Gao, Zhibin Lin
IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 33, pp. 2892-2903
Published: 2025-07-23
DOI: 10.1109/TNSRE.2025.3591819
URL: https://ieeexplore.ieee.org/document/11091336/
Abstract
Prior research on directional focus decoding, a.k.a. selective Auditory Attention Decoding (sAAD), has primarily focused on binary “left-right” tasks. However, decoding of the attended speaker’s precise direction is desired. Existing approaches often underutilize spatial audio information, resulting in suboptimal performance. In this paper, we address this limitation by leveraging a recent dataset containing two concurrent speakers at two of 14 possible directions. We demonstrate that models relying solely on EEG yield limited decoding accuracy in leave-one-out settings. To enhance performance, we propose to integrate spatial spectra as an additional input. We evaluate three model architectures, namely CNN, LSM-CNN, and Deformer, under two strategies for utilizing spatial information: all-in-one (end-to-end) and pairwise (two-stage) decoding. While all-in-one decoders directly take dual-modal inputs and output the attended direction, pairwise decoders first leverage spatial spectra to decode the competing pairs, and then a specific model is used to decode the attended direction. Our proposed all-in-one Sp-EEG-Deformer model achieves 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios, respectively, using 1-second decision windows (chance level: 50%, indicating random guessing). Meanwhile, the pairwise Sp-EEG-Deformer decoder achieves a 14-class decoding accuracy of 63.62% (10 s). Our experiments reveal that spatial spectra are particularly effective at reducing the 14-class problem into a binary one. On the other hand, EEG features are more discriminative and play a crucial role in precisely identifying the final attended direction within this reduced 2-class set. These results highlight the effectiveness of our proposed dual-modal directional decoding strategies.
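The pairwise (two-stage) strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says only that spatial spectra are used to decode the competing pair and that a separate model then picks the attended direction. Here the first stage is replaced by a simple assumption (take the two direction bins with the largest spatial-spectrum values), and the names `competing_pair`, `pairwise_decode`, and the `BinaryDecoder` signature are hypothetical.

```python
from typing import Callable, Sequence, Tuple

NUM_DIRECTIONS = 14  # number of candidate directions stated in the abstract

# Hypothetical interface: given one EEG window and two candidate directions,
# return the estimated probability that the FIRST candidate is the attended one.
BinaryDecoder = Callable[[Sequence[Sequence[float]], int, int], float]


def competing_pair(spatial_spectrum: Sequence[float]) -> Tuple[int, int]:
    """Stage 1 (assumed stand-in): pick the two strongest direction bins."""
    if len(spatial_spectrum) != NUM_DIRECTIONS:
        raise ValueError(f"expected {NUM_DIRECTIONS} direction bins")
    ranked = sorted(
        range(NUM_DIRECTIONS),
        key=lambda d: spatial_spectrum[d],
        reverse=True,
    )
    first, second = sorted(ranked[:2])  # return the pair in ascending index order
    return first, second


def pairwise_decode(
    spatial_spectrum: Sequence[float],
    eeg_window: Sequence[Sequence[float]],
    binary_decoder: BinaryDecoder,
) -> int:
    """Stage 2: a binary EEG decoder chooses between the two candidates."""
    first, second = competing_pair(spatial_spectrum)
    p_first = binary_decoder(eeg_window, first, second)
    return first if p_first >= 0.5 else second


# Toy usage with made-up numbers: speakers at direction bins 3 and 10.
spectrum = [0.1] * NUM_DIRECTIONS
spectrum[3], spectrum[10] = 0.9, 0.8
eeg = [[0.0] * 64]  # placeholder EEG window, shape is illustrative only
decoded = pairwise_decode(spectrum, eeg, lambda window, a, b: 0.2)  # -> 10
```

In this sketch the spatial spectrum alone narrows the 14 candidates to two, so the EEG model only has to make a binary decision, which mirrors the division of labor the abstract reports.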
About the journal:
Rehabilitative and neural aspects of biomedical engineering, including functional electrical stimulation, acoustic dynamics, human performance measurement and analysis, nerve stimulation, electromyography, motor control and stimulation; and hardware and software applications for rehabilitation engineering and assistive devices.