Experimental comparison on several distance measures for speech processing applications
Authors: L. Everson, W. Penzhorn
Published in: COMSIG 88, Southern African Conference on Communications and Signal Processing, Proceedings
Publication date: 1988-06-24
DOI: https://doi.org/10.1109/COMSIG.1988.49293
A description is given of a number of different distance measures between speech segments commonly used in the analysis and recognition of speech. The measures considered include spectral slope, correlation coefficients, log likelihood ratio, cepstral, weighted cepstral, and modified distance measures. These metrics were tested on either the linear predictive coding (LPC) representation or the frequency spectrum, depending on the type of measurement. Work reported elsewhere was also considered and experimentally verified. The tests were performed on speech with a noisy background, in both Gaussian and high-frequency noise. All the measures were compared using the same speech database. These evaluations show that by eliminating unwanted information in speech segments, the distance metric can be made more robust in noisy environments. It was found that the threshold 1/σ-weighted distance metric, where σ is the standard deviation of a given cepstral coefficient, is generally the best speech distance metric to use in most types of noisy environments. Some of the other metrics work better in isolated cases, but do not show the same high general recognition result.
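The 1/σ weighting mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes a root-sum-of-squares form in which each cepstral coefficient difference is divided by that coefficient's standard deviation. The thresholding step is not specified in the abstract and is left out. The function names `cepstral_sigmas` and `weighted_cepstral_distance` and the sample values are hypothetical, chosen only for illustration.

```python
import math
from statistics import pstdev


def cepstral_sigmas(frames):
    """Standard deviation of each cepstral coefficient over a set of frames.

    `frames` is a list of equal-length cepstral vectors (assumed input format).
    """
    n_coeffs = len(frames[0])
    return [pstdev([frame[k] for frame in frames]) for k in range(n_coeffs)]


def weighted_cepstral_distance(c_a, c_b, sigmas):
    """Distance between two cepstral vectors with 1/sigma weighting.

    Assumed form: sqrt(sum_k ((c_a[k] - c_b[k]) / sigma_k) ** 2).
    """
    if not (len(c_a) == len(c_b) == len(sigmas)):
        raise ValueError("vectors and sigmas must have the same length")
    if any(s <= 0 for s in sigmas):
        raise ValueError("every sigma must be positive")
    return math.sqrt(
        sum(((a - b) / s) ** 2 for a, b, s in zip(c_a, c_b, sigmas))
    )


if __name__ == "__main__":
    # Made-up cepstral vectors; real ones would come from LPC or FFT analysis.
    frames = [[1.0, 0.2], [3.0, 0.4], [5.0, 0.6]]
    sigmas = cepstral_sigmas(frames)
    print(weighted_cepstral_distance(frames[0], frames[2], sigmas))
```

Dividing by σ puts coefficients with very different ranges on a common scale, so a high-variance low-order coefficient does not dominate the distance. With all σ set to 1 the function reduces to the plain Euclidean cepstral distance.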