Formant filters-based multi-band speech enhancement algorithm for intelligibility improvement
M. P. Actlin Jeeva, T. Nagarajan, P. Vijayalakshmi
2016 Twenty Second National Conference on Communication (NCC), published 2016-03-04
DOI: 10.1109/NCC.2016.7561149 (https://doi.org/10.1109/NCC.2016.7561149)
Citations: 4
Abstract
Past speech enhancement algorithms concentrated on improving speech quality; however, they do not necessarily improve the intelligibility of the enhanced speech. The current work focuses on improving both the quality and the intelligibility of the well-known multi-band spectral subtraction algorithm. To improve speech quality, a temporal-domain filtering-based approach is proposed to obtain ERB-based (equivalent rectangular bandwidth) sub-bands. To improve intelligibility, it is necessary to identify the type of distortion (attenuation or amplification distortion) that affects the intelligibility of the enhanced speech. Therefore, the enhanced speech is analyzed at the phoneme level using segmental SNR, and it is observed that in high-SNR regions of the noisy speech (specifically in vowels, liquids, and nasals), intelligibility is reduced by amplification distortion. This may be due to the high spectral resolution of the temporal-domain ERB-based filters. Hence, to improve intelligibility, a set of formant-specific filters is proposed, based on a formant analysis carried out over vowels, liquids, and nasals. The proposed multi-band spectral subtraction algorithm is evaluated for quality and intelligibility using subjective (MOS) and objective (PESQ and CSII) measures, for speech affected by white, car, and babble noise at SNR levels from -5 to 15 dB. The proposed method improves PESQ by around 0.1-0.5 and CSII by around 2-10% over the conventional multi-band spectral subtraction method.
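
Two ingredients named in the abstract, ERB-based sub-band spacing and segmental SNR, can be sketched roughly as follows. The abstract gives neither the ERB formula nor the frame settings, so this sketch assumes the commonly used Glasberg-Moore ERB approximation, a 160-sample frame (20 ms at 8 kHz), and per-frame clamping to [-10, 35] dB. All function names are hypothetical, and the code is not the authors' temporal-domain filter bank or their phoneme-level analysis.

```python
import math


def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency fc_hz
    (Glasberg-Moore approximation; an assumption, not taken from the paper)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz onto the ERB-rate scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_band_edges(f_lo_hz, f_hi_hz, n_bands):
    """Return n_bands + 1 band edges (Hz), equally spaced on the ERB-rate scale."""
    lo = hz_to_erb_rate(f_lo_hz)
    hi = hz_to_erb_rate(f_hi_hz)
    step = (hi - lo) / n_bands
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands + 1)]


def segmental_snr(clean, enhanced, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR in dB between a clean and an enhanced signal.

    frame_len, floor_db and ceil_db are illustrative assumptions.
    Frames with zero clean-signal energy are skipped.
    """
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        if signal_energy == 0.0:
            continue  # silent frame: SNR undefined
        if error_energy == 0.0:
            snr_db = ceil_db  # perfect reconstruction: clamp to the ceiling
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        frame_snrs.append(min(max(snr_db, floor_db), ceil_db))
    if not frame_snrs:
        return float("nan")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    print([round(f) for f in erb_band_edges(100.0, 4000.0, 8)])
    print(segmental_snr([1.0] * 320, [0.9] * 320))
```

To mirror the phoneme-level analysis described in the abstract, `segmental_snr` would be applied to the sample ranges of individual phonemes (for example vowels, liquids, and nasals), which requires a phoneme-level segmentation that is not shown here.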