Bone and Air-Conducted Speech Fusion Based on LSTM Network and Kalman Filtering
Authors: Zhenglong Liu; Zhe Chen; Dapeng Yu; Fuliang Yin
IEEE Sensors Journal, vol. 25, no. 10, pp. 17631-17639. Published 2025-03-31.
DOI: 10.1109/JSEN.2025.3554186
URL: https://ieeexplore.ieee.org/document/10945986/
Integrating bone-conducted (BC) and air-conducted (AC) microphones can synergistically enhance noise suppression and improve speech quality. To this end, a novel time-domain BC and AC speech fusion enhancement method based on a long short-term memory (LSTM) network and Kalman filtering is proposed. Specifically, the line spectral frequencies (LSFs) and the corresponding residual power of the clean AC speech are predicted by an LSTM neural network (NN) from noisy BC and AC speech parameters. Then, joint state and observation models for BC and AC speech are established from the estimated parameters through a linear-prediction-based speech model. Finally, the AC speech is enhanced through Kalman filtering by fusing the noisy AC and BC observations. Simulation results on the Elevoc simultaneously recorded microphone/bone (ESMB) dataset and a self-recorded dataset show that the proposed method achieves good speech enhancement (SE) performance and generalization ability with lower computational complexity than other existing methods, improving speech quality by 1 point in perceptual evaluation of speech quality (PESQ) and 0.1 point in short-time objective intelligibility (STOI) under the 5-dB white noise condition. Real-world experiments further confirm the effectiveness of the proposed method.
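The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: the linear-prediction coefficients and residual power are fixed and known (in the paper they are predicted frame by frame by an LSTM via LSFs), both channels are treated as the same clean sample plus independent white noise (the paper builds a joint BC/AC model), and the noise variances are given. All names (`companion`, `kalman_fuse`, `simulate`, `LPC`, `Q_VAR`, `R_AC`, `R_BC`) are hypothetical.

```python
import numpy as np

# Assumed toy parameters (not from the paper): a stable AR(2) speech model.
LPC = np.array([1.5, -0.7])   # s[t] = 1.5*s[t-1] - 0.7*s[t-2] + e[t]
Q_VAR = 1.0                   # residual (excitation) power
R_AC = 4.0                    # assumed noise variance on the AC channel
R_BC = 1.0                    # assumed noise variance on the BC channel


def companion(lpc):
    """Transition matrix for the state x[t] = [s[t], s[t-1], ..., s[t-p+1]]."""
    p = len(lpc)
    F = np.zeros((p, p))
    F[0, :] = lpc                  # first row applies the LP recursion
    F[1:, :-1] = np.eye(p - 1)     # remaining rows shift past samples down
    return F


def kalman_fuse(y_ac, y_bc, lpc, q, r_ac, r_bc):
    """Kalman filter that fuses two noisy observations of one AR(p) signal."""
    y_ac = np.asarray(y_ac, dtype=float)
    y_bc = np.asarray(y_bc, dtype=float)
    p = len(lpc)
    F = companion(lpc)
    Q = np.zeros((p, p))
    Q[0, 0] = q                    # excitation drives only the newest sample
    H = np.zeros((2, p))
    H[:, 0] = 1.0                  # assumption: both channels observe s[t]
    R = np.diag([r_ac, r_bc])
    x = np.zeros(p)
    P = np.eye(p)
    out = np.empty(len(y_ac))
    for t in range(len(y_ac)):
        # Prediction from the linear-prediction speech model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the stacked AC and BC observations.
        z = np.array([y_ac[t], y_bc[t]])
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(p) - K @ H) @ P
        out[t] = x[0]
    return out


def simulate(n=4000, seed=0):
    """Synthetic AR signal plus two independently corrupted observations."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, np.sqrt(Q_VAR), n)
    s = np.zeros(n)
    for t in range(n):
        s[t] = e[t]
        for k in range(min(t, len(LPC))):
            s[t] += LPC[k] * s[t - 1 - k]
    y_ac = s + rng.normal(0.0, np.sqrt(R_AC), n)
    y_bc = s + rng.normal(0.0, np.sqrt(R_BC), n)
    return s, y_ac, y_bc
```

Calling `kalman_fuse(y_ac, y_bc, LPC, Q_VAR, R_AC, R_BC)` on the output of `simulate()` returns a per-sample estimate of the clean signal. Stacking the two channels into one observation vector lets the Kalman gain weight each channel by its assumed noise variance, which is the basic mechanism by which a cleaner BC channel can help recover a noisier AC channel.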
About the journal:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave (e.g., electromagnetic and acoustic) and non-wave (e.g., chemical, gravity, particle, thermal, radiative and non-radiative) sensor data; detection, estimation and classification based on sensor data)
-Sensors in Industrial Practice