A novel hierarchical attention-guided refinement method with EEG assistance for enhancing target speech in a multi-speaker competing environment

IF 8 · Zone 1 (Engineering and Technology) · Q1 (Computer Science, Artificial Intelligence)
Zehui Feng, Yangge Yang, Chenqi Zhang, Junxuan Li, Ting Han
Journal: Advanced Engineering Informatics, volume 65, Article 103363
Published: 2025-04-14
Publication type: Journal Article
DOI: 10.1016/j.aei.2025.103363
URL: https://www.sciencedirect.com/science/article/pii/S1474034625002563
Citations: 0

Abstract

Enhancing target speech in noisy, multi-speaker environments is a critical challenge, particularly in engineering settings such as construction sites, factories, and transportation systems, where competing speech from multiple sources is common and efficient speech enhancement is essential to safety and operational effectiveness. Recent research tends to recover auditory attention with the assistance of brain activity. However, existing methods face two bottlenecks: multimodal feature extraction and multimodal fusion. To address these challenges, this paper proposes HierEEG, a hierarchical attention-guided refinement network for EEG-assisted speech enhancement. HierEEG is an end-to-end, explainable time-domain model comprising three core modules: a Multi-Scale Feature Modulation Refinement (MFMR) module, a Hierarchical Attention Fusion (HAF) network, and a Lightweight Speech Decoder. The first module learns feature representations at different granularities and uses a feature modulator to facilitate interaction between short-term and long-term features, yielding multi-scale refined speech embeddings and EEG features. The second module hierarchically guides the model’s attention toward high-level semantic features and outputs clean-speech mask embeddings. Finally, the lightweight speech decoder reconstructs the clean speech signal. Comprehensive comparison, ablation, subject-dependent, subject-independent, transfer-learning, engineering, and computational-cost experiments show that HierEEG outperforms state-of-the-art methods on mainstream cocktail-party datasets, with relative improvements of 0.21 dB in SI-SDR and 0.15 in PESQ. In simulated engineering experiments, HierEEG proves robust, maintaining performance above 10 dB even under various noises, artifacts, and poor contact. Furthermore, HierEEG transfers well to personalized, user-specific adjustment, requiring only 12 min of fine-tuning samples. Its efficient processing and low computational cost, with under 70% inference utilization on the Jetson Nano embedded device, broaden its potential applications in multi-speaker competing environments. Finally, the brain-region experiment demonstrates the explainability of HierEEG, so that its decisions can be understood in the context of the brain’s functional organization.
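For readers unfamiliar with the SI-SDR figure quoted in the abstract, the following is a minimal sketch of the standard scale-invariant signal-to-distortion ratio, written in plain Python. It is not taken from the paper: the function name `si_sdr`, the `eps` stabilizer, and the mean-removal step (a common convention) are assumptions made for illustration, and the authors' evaluation code may differ in detail.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length signals (illustrative sketch)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean so a DC offset does not affect the score (assumed convention).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to obtain the optimally scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    alpha = dot / ref_energy
    target = [alpha * r for r in ref]
    # Whatever remains of the estimate after removing the scaled target is distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


if __name__ == "__main__":
    # Hypothetical toy signals: a target tone plus a weaker competing tone.
    ref = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
    interferer = [math.sin(2 * math.pi * 13 * t / 200) for t in range(200)]
    mixture = [r + 0.5 * i for r, i in zip(ref, interferer)]
    print(round(si_sdr(mixture, ref), 2))
    # Rescaling the estimate leaves the score unchanged, which is the point of SI-SDR.
    print(round(si_sdr([3.0 * m for m in mixture], ref), 2))
```

Because the estimate is projected onto the reference before the ratio is taken, a model cannot raise its score simply by changing output gain. An improvement in SI-SDR therefore reflects less residual interference relative to the target speech.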
Source journal
Advanced Engineering Informatics (Engineering and Technology; Engineering: Multidisciplinary)
CiteScore: 12.40
Self-citation rate: 18.20%
Articles published: 292
Review time: 45 days
About the journal: Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus, and INSPEC.