Bone and Air-Conducted Speech Fusion Based on LSTM Network and Kalman Filtering

Impact factor 4.3; Zone 2, multidisciplinary journals; JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Sensors Journal. Pub Date: 2025-03-31. DOI: 10.1109/JSEN.2025.3554186

Zhenglong Liu;Zhe Chen;Dapeng Yu;Fuliang Yin

Volume 25, issue 10, pages 17631-17639. https://ieeexplore.ieee.org/document/10945986/

Citations: 0

Abstract


Integrating bone-conducted (BC) and air-conducted (AC) microphones can synergistically enhance noise suppression and improve speech quality. For this purpose, a novel time-domain BC and AC speech fusion enhancement method based on the long short-term memory (LSTM) network and Kalman filtering is proposed. Specifically, the line spectral frequencies (LSFs) and the corresponding residual power of the clean AC speech are predicted using an LSTM neural network (NN) from noisy BC and AC speech parameters. Then, the joint state and observation models about BC and AC speeches are established with the estimated parameters through the linear-predicted-based speech model. Finally, the AC speech is enhanced through Kalman filtering by fusing noisy AC and BC observations. Simulation results on the elevoc simultaneously recorded microphone/bone (ESMB) dataset and self-recorded dataset illustrate that the proposed method obtains good speech enhancement (SE) performance and generalization ability with lower computational complexity compared with other existing methods, improving the speech quality by 1 point in the perceptual evaluation of speech quality (PESQ) and 0.1 point in short-time objective intelligibility (STOI) under the 5-dB white noise condition. Real-world experiments further confirm the effectiveness of the proposed method.
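The final fusion step can be illustrated with a small sketch. This is not the authors' implementation: it assumes, as a simplification, that the AC and BC sensors both observe the same clean speech sample corrupted by independent white noise, and that the linear-prediction coefficients `a` and the residual (excitation) power `q` are already available. In the paper these parameters come from the LSTM output (LSFs and residual power); here they are simply passed in. The function names `companion` and `kalman_fuse`, and the noise variances `r_ac` and `r_bc`, are hypothetical.

```python
import numpy as np


def companion(a):
    """Transition matrix of the AR(p) model s[n] = sum_k a[k] * s[n-1-k] + e[n].

    The state is x[n] = [s[n], s[n-1], ..., s[n-p+1]].
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    F = np.zeros((p, p))
    F[0, :] = a                    # first row predicts the new sample
    F[1:, :-1] = np.eye(p - 1)     # remaining rows shift the old samples down
    return F


def kalman_fuse(y_ac, y_bc, a, q, r_ac, r_bc):
    """Estimate clean speech from two noisy observations of one AR(p) signal.

    Assumption (not from the paper): both channels observe the current
    sample s[n] directly, with independent white noise of variance
    r_ac and r_bc respectively.
    """
    p = len(a)
    F = companion(a)
    Q = np.zeros((p, p))
    Q[0, 0] = q                    # excitation drives only the newest sample
    H = np.zeros((2, p))
    H[:, 0] = 1.0                  # each sensor sees s[n]
    R = np.diag([r_ac, r_bc])

    x = np.zeros(p)                # state estimate
    P = np.eye(p)                  # state error covariance
    out = np.empty(len(y_ac))
    for n in range(len(y_ac)):
        # Prediction with the linear-prediction speech model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the stacked AC/BC observation.
        y = np.array([y_ac[n], y_bc[n]])
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (y - H @ x)
        P = (np.eye(p) - K @ H) @ P
        out[n] = x[0]
    return out
```

In a frame-based system, `a` and `q` would be refreshed once per frame from the predicted parameters, and a real BC channel would need its own observation model, since bone-conducted speech is band-limited rather than a noisy copy of the AC signal.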


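Source journal

IEEE Sensors Journal (Engineering Technology; Engineering: Electrical & Electronic)

CiteScore: 7.70

Self-citation rate: 14.00%

Articles published: 2058

Review time: 5.2 months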

About the journal: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspect of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:

- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative sensor data; detection, estimation and classification based on sensor data)
- Sensors in Industrial Practice