Zehui Feng, Yangge Yang, Chenqi Zhang, Junxuan Li, Ting Han
Advanced Engineering Informatics, Vol. 65, Article 103363 (published 2025-04-14). DOI: 10.1016/j.aei.2025.103363. https://www.sciencedirect.com/science/article/pii/S1474034625002563
A novel hierarchical attention-guided refinement method with EEG assistance for enhancing target speech in a multi-speaker competing environment
Enhancing target speech in noisy, multi-speaker environments is a critical challenge. This is especially true in engineering contexts such as construction sites, factories, and transportation systems, where competing speech from multiple sources is common and efficient speech enhancement is essential for safety and operational effectiveness. Recent research increasingly uses brain activity to help decode auditory attention. However, existing methods face two bottlenecks: multimodal feature extraction and multimodal fusion. To address them, this paper proposes HierEEG, a hierarchical attention-guided refinement network for EEG-assisted speech enhancement. HierEEG is an end-to-end, explainable time-domain model with three core modules: a Multi-Scale Feature Modulation Refinement (MFMR) module, a Hierarchical Attention Fusion (HAF) network, and a Lightweight Speech Decoder. The MFMR module learns feature representations at different granularities and uses a feature modulator to let short-term and long-term features interact, producing multi-scale refined speech embeddings and EEG features. The HAF network then hierarchically guides the model's attention toward high-level semantic features and outputs clean-speech mask embeddings. Finally, the lightweight speech decoder reconstructs the clean speech signal. Comprehensive experiments (comparison, ablation, subject-dependent, subject-independent, transfer-learning, engineering, and computational-cost) show that HierEEG outperforms state-of-the-art methods on mainstream Cocktail Party datasets, with improvements of 0.21 dB in SI-SDR and 0.15 in PESQ. In simulated engineering experiments, HierEEG remains robust, maintaining over 10 dB even under various noises, artifacts, and poor electrode contact. HierEEG also transfers well to individual users, adapting with only 12 min of fine-tuning data. Its efficient processing and low computational cost, with under 70% utilization during inference on an NVIDIA Jetson Nano embedded device, make it promising for multi-speaker competing environments. Finally, brain-region experiments demonstrate HierEEG's explainability, showing that its decisions can be interpreted in terms of the brain's functional organization.
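The headline gain above is reported in SI-SDR (scale-invariant signal-to-distortion ratio). As a reference point for reading that number, here is a minimal Python sketch of the commonly used SI-SDR definition: both signals are made zero-mean, the reference is rescaled by the optimal projection factor, and the ratio of target energy to residual energy is given in decibels. This is not taken from the HierEEG code; the function name `si_sdr` and the `eps` guard against division by zero are our own assumptions.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    Hypothetical helper, not the authors' implementation. Higher is better.
    """
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove DC offsets so a constant bias does not affect the score.
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]

    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a part along the reference and a residual.
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


if __name__ == "__main__":
    # Synthetic example: a sine "clean" signal plus a weaker interfering tone.
    clean = [math.sin(0.05 * n) for n in range(2000)]
    interference = [0.1 * math.sin(0.73 * n + 1.0) for n in range(2000)]
    noisy = [c + i for c, i in zip(clean, interference)]
    print(f"SI-SDR: {si_sdr(noisy, clean):.2f} dB")
    # Rescaling the estimate leaves the score unchanged (scale invariance).
    print(f"SI-SDR (x3): {si_sdr([3.0 * x for x in noisy], clean):.2f} dB")
```

Because the score is logarithmic, a 0.21 dB gain corresponds to a target-to-residual energy ratio roughly 1.05 times higher (10^(0.21/10) ≈ 1.05). The scale invariance is why SI-SDR is often preferred for time-domain enhancement models, whose output loudness may differ from the reference.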
About the journal:
Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The Journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus and INSPEC.