Looking for the best spectral resolution in automatic speaker recognition
H. Sayoud, S. Ouamour
2006 IEEE GCC Conference (GCC), 2006-03-20. DOI: 10.1109/IEEEGCC.2006.5686247
In this work, we look for the optimal spectral resolution for speaker authentication in quiet and noisy environments, using speech signals with microphonic and telephonic bandwidths. We investigate the problem under several conditions and study how spectral resolution affects speaker identification performance. We implemented a statistical approach based on second-order statistical measures, using normalised Mel-spectral energies (MFSC). To find the optimal spectral resolution for the microphonic and telephonic bandwidths, we tested MFSC vector dimensions from 12 to 60. We also tested several types of additive noise (white noise, car noise and racket noise) at several signal-to-noise ratios (SNRs). The results show that the optimal spectral dimension depends on the experimental conditions. In particular, we observed that a high spectral resolution matters: 60 coefficients / 8 kHz for the [0-8 kHz] bandwidth and 48 coefficients / 8 kHz for the [0.3-3.4 kHz] bandwidth, especially in noisy environments. Existing work, by contrast, has always favoured resolutions of fewer than 24 coefficients for such tasks. For example, increasing the resolution from 24 to 48 MFSC for the telephonic bandwidth improves the recognition score by about 11%.
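The experimental pipeline described above can be sketched as follows. The sketch makes these choices:

- It computes log Mel filterbank energies with a configurable number of filters (the spectral resolution being swept).
- It optionally restricts the filters to a band such as 300-3400 Hz.
- It mixes in additive noise at a chosen SNR.
- It identifies a speaker by comparing feature covariance matrices.

This is a minimal illustration under stated assumptions, not the authors' implementation:

- The abstract does not say which second-order measure was used. The arithmetic-harmonic sphericity measure (AHSM) is swapped in here as one well-known example.
- The per-utterance mean subtraction stands in for the unspecified "normalisation".
- The 25 ms frames, 10 ms hop and 512-point FFT are assumed, as is the use of fmin/fmax to model the telephone band. The last of these is not necessarily how the paper defines "N coefficients / 8 kHz".
- All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters equally spaced on the Mel scale over [fmin, fmax]."""
    fmax = sr / 2.0 if fmax is None else fmax
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfsc(signal, sr, n_filters, fmin=0.0, fmax=None,
         frame_s=0.025, hop_s=0.010, n_fft=512):
    """Log Mel filterbank energies, shape (n_frames, n_filters).

    Frame length, hop and FFT size are assumed values, not taken from the paper.
    """
    flen, hop = int(round(frame_s * sr)), int(round(hop_s * sr))
    n_frames = 1 + (len(signal) - flen) // hop
    idx = np.arange(flen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    fb = mel_filterbank(n_filters, n_fft, sr, fmin, fmax)
    log_e = np.log(power @ fb.T + 1e-10)  # small floor avoids log(0)
    # Assumed normalisation: subtract each channel's per-utterance mean.
    return log_e - log_e.mean(axis=0)


def add_noise(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    noise = np.resize(noise, clean.shape)  # repeat or truncate to match length
    p_clean, p_noise = np.mean(clean ** 2), np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def ahsm(feats_x, feats_y):
    """Arithmetic-harmonic sphericity measure between two feature covariances.

    Zero when the covariances are proportional, positive otherwise.
    """
    cx = np.atleast_2d(np.cov(feats_x, rowvar=False))
    cy = np.atleast_2d(np.cov(feats_y, rowvar=False))
    m = cx.shape[0]
    t1 = np.trace(cx @ np.linalg.inv(cy))
    t2 = np.trace(cy @ np.linalg.inv(cx))
    return float(np.log(t1 * t2 / m ** 2))


def identify(test_feats, references):
    """Return the reference speaker whose covariance is closest to the test's."""
    return min(references, key=lambda name: ahsm(test_feats, references[name]))
```

To mirror the dimension sweep, one would call `mfsc` with `n_filters` set to each value from 12 to 60. For noisy conditions, first pass the signal through `add_noise` at each target SNR. For the telephonic case, one might use `sr=8000, fmin=300.0, fmax=3400.0`. Then build one reference feature matrix per enrolled speaker and score each test utterance with `identify`. Because the measure compares covariance matrices, higher dimensions require enough frames per utterance to estimate an invertible covariance.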