Geometrically Informed Graph Neural Networks for Distributed Speech Enhancement in Challenging Acoustic Environments

IF: 2.2 · JCR: Q3 (Engineering, Electrical & Electronic)

IEEE Sensors Letters · Pub Date: 2025-04-28 · DOI: 10.1109/LSENS.2025.3564309

Shih-Hau Fang;Po-Han Li;Hau-Hsiang Jung;Syu-Siang Wang

Volume 9, issue 6, pages 1-4. IEEE Xplore: https://ieeexplore.ieee.org/document/10978873/

Citations: 0

Abstract



Speech signals are degraded by ambient sounds, including reverberation and diffuse noise, which compromise clarity and intelligibility. Taking inspiration from the effectiveness of human binaural hearing, researchers have explored distributed speech enhancement (DSE) processors on distributed microphone systems. Despite the achievements of previous approaches, challenges remain, especially in reducing noise in speech signals and improving model interpretability. This letter introduces a geometrically informed graph neural network (GIGNN) designed for DSE tasks. The distinct advantage of GIGNN lies in the capability of graph neural networks (GNNs) to visualize structured data, which provides an effective means of parameterizing a wide array of spatiotemporal interactions and accordingly enhances model interpretability. In addition, we assess the effectiveness of geometrically informed spatial matrices within GNNs in our evaluation. Experimental validation at varying signal-to-noise ratios in real-life scenarios underscores the potential of GIGNN.
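The abstract does not say how the geometrically informed spatial matrices are built, so the following is only a minimal sketch of one common way to inject microphone geometry into a graph: a Gaussian kernel over pairwise microphone distances, row-normalized, followed by a single message-passing step. The function names `spatial_adjacency` and `propagate`, the kernel width `sigma`, and the microphone layout are all hypothetical and are not taken from the letter.

```python
import math


def spatial_adjacency(positions, sigma=1.0):
    """Row-normalized Gaussian-kernel adjacency from microphone coordinates.

    Assumption: closer microphones get larger edge weights. The letter's
    actual spatial matrix may be defined differently.
    """
    n = len(positions)
    weights = [
        [
            math.exp(-math.dist(positions[i], positions[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Normalize each row so that the weights of one node sum to 1.
    return [[w / sum(row) for w in row] for row in weights]


def propagate(adjacency, features):
    """One message-passing step: each node takes a weighted average of all node features."""
    dim = len(features[0])
    return [
        [sum(a * f[d] for a, f in zip(row, features)) for d in range(dim)]
        for row in adjacency
    ]


# Hypothetical layout: three microphones on a line, coordinates in metres.
mics = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
A = spatial_adjacency(mics, sigma=1.0)

# Hypothetical two-dimensional feature vector per microphone.
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mixed = propagate(A, feats)
```

In this sketch the first microphone draws far more on its neighbour 0.5 m away than on the one 3 m away, which is the kind of geometry-dependent weighting that a learned GNN layer could then refine.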


Source journal: IEEE Sensors Letters (Engineering: Electrical and Electronic Engineering)
CiteScore: 3.50
Self-citation rate: 7.10%
Articles published: 194