Advanced noise-aware speech enhancement algorithm via adaptive dictionary selection based on compressed sensing in the time-frequency domain
Authors: Naser Sharafi, Salman Karimi, Samira Mavaddati
Journal: Computer Speech and Language, vol. 96, Article 101887 (published 2025-09-11; impact factor 3.4; JCR Q2, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.csl.2025.101887
URL: https://www.sciencedirect.com/science/article/pii/S0885230825001123
Speech signal enhancement and noise reduction play a vital role in applications such as telecommunications, audio broadcasting, and military systems. This paper proposes a novel speech enhancement method based on compressive sensing principles in the time-frequency domain, incorporating sparse representation and dictionary learning techniques. The proposed method constructs an optimal dictionary of atoms that can sparsely represent clean speech signals. A key component of the framework is a noise-aware block, which leverages multiple pre-trained noise dictionaries along with the spectral features of noisy speech to build a composite noise model. It isolates noise-only segments, computes their sparse coefficients, and evaluates energy contributions across all candidate dictionaries. The dictionary with the highest energy is then selected as the dominant noise type. The algorithm dynamically adapts to handle unseen noise types by selecting the most similar noise structure present in the dictionary pool, offering a degree of generalization. The proposed system is evaluated under three clearly defined scenarios: (i) using a baseline sparse representation model, (ii) incorporating dictionary learning with a fixed noise model, and (iii) employing the full adaptive noise-aware framework. The method demonstrates strong performance against nine types of noise (non-stationary, periodic, and static) across a wide SNR range (-5 dB to +20 dB). On average, it yields a 16.71% improvement in PESQ and a 3.39% improvement in STOI compared to existing techniques. Simulation results confirm the superiority of the proposed approach in both noise suppression and speech intelligibility, highlighting its potential as a robust tool for speech enhancement in real-world noisy environments.
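The dictionary-selection step of the noise-aware block can be illustrated with a minimal sketch. The abstract does not state which sparse solver, sparsity level, or feature representation the authors use, so everything below is an assumption: a plain orthogonal matching pursuit (OMP) stands in for the sparse coder, and the names `omp`, `select_noise_dictionary`, and `sparsity` are hypothetical. Only the selection rule (pick the candidate dictionary whose sparse coefficients carry the most energy on noise-only frames) is taken from the text.

```python
import numpy as np


def omp(D, y, sparsity):
    """Greedy OMP: approximate y using at most `sparsity` columns (atoms) of D.

    Assumption: OMP is a stand-in; the paper's actual sparse coder is not
    specified in the abstract.
    """
    residual = y.astype(float).copy()
    support = []                      # indices of the atoms chosen so far
    solution = np.zeros(0)
    for _ in range(sparsity):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:            # no new atom helps; stop early
            break
        support.append(idx)
        # Re-fit all chosen atoms jointly by least squares.
        solution, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ solution
    coefficients = np.zeros(D.shape[1])
    coefficients[support] = solution
    return coefficients


def select_noise_dictionary(noise_frames, dictionaries, sparsity=2):
    """Return (best_name, energies) for noise-only frames.

    noise_frames: array of shape (n_features, n_frames), one frame per column.
    dictionaries: mapping from a noise-type name to an array of shape
                  (n_features, n_atoms) with unit-norm columns.
    """
    energies = {}
    for name, D in dictionaries.items():
        total = 0.0
        for frame in noise_frames.T:
            coefficients = omp(D, frame, sparsity)
            total += float(np.sum(coefficients ** 2))  # coefficient energy
        energies[name] = total
    best = max(energies, key=energies.get)
    return best, energies
```

As a toy usage example, two orthonormal dictionaries that cover disjoint feature bands can be built from columns of an identity matrix; frames whose energy sits mostly in the upper band should then select the upper-band dictionary. Note that comparing coefficient energies is only meaningful when atoms are normalized and not strongly correlated, which is why the sketch assumes unit-norm columns.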
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.