An Acoustic-Phonetic Diagnostic Tool for the Evaluation of Auditory Models
O. Ghitza
DOI: 10.1109/ASPAA.1991.634096
Final Program and Paper Summaries, 1991 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics
A long-standing question that arises when studying a particular auditory model is how to evaluate its performance. More precisely, it is of interest to evaluate to what extent the model representation can describe the actual human internal representation. In this study we address this question in the context of speech perception. That is, given a speech representation based on the auditory system, to what extent can it preserve phonetic information that is perceptually relevant? To answer this question, a diagnostic system has been developed that simulates the psychophysical procedure used in the standard Diagnostic-Rhyme Test (DRT, Voiers, 1983). In the psychophysical procedure, the subject has all the cognitive information needed for the discrimination task a priori. Hence, errors in discrimination are due mainly to inaccuracies in the auditory representation of the stimulus. In the simulation, the human observer is replaced by an array of recognizers, one for each pair of words in the DRT database. An effort has been made to keep the errors due to the "observer" to a minimum, so that the overall detected errors are due mainly to inaccuracies in the auditory model representation. This effort includes a careful design of the recognizer (i.e., using an HMM with time-varying states; Ghitza and Sondhi, 1990) and the use of a speaker-dependent DRT simulation. To demonstrate the power of the suggested evaluation method, we considered the behavior of two speech analysis methods, the Fourier power spectrum and a representation based on the auditory system (the EIH model, Ghitza, 1988), in quiet and in noisy environments. The results were compared with psychophysical results for the same database. The results show that the overall number of errors made by the machine (the Fourier power spectrum or the EIH) is far greater than the overall number of errors made by a human, at all noise levels that were tested. 
Further, the proposed evaluation method offers a detailed picture of the error distribution among the selected phonetic features. It shows that the errors made by the human listener are distributed in a different way compared to the errors made by the machines, and that the distributions of errors made by the two analyzers are also quite different from each other.
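The evaluation procedure described above can be sketched in code. The following is a minimal illustration under assumptions of my own, not the paper's implementation: each DRT trial presents a token from a minimal word pair differing in one phonetic feature, the "observer" is a two-way classifier that picks whichever of the two word templates scores closer to the test token, and errors are tallied per phonetic feature. The distance values here are random stand-ins for a real front end (e.g., a Fourier power spectrum or an auditory model such as EIH) followed by an HMM-based recognizer; the feature names are the six DRT dimensions.

```python
import random
from collections import defaultdict

def discriminate(dist_to_a, dist_to_b):
    """Two-way forced choice: pick whichever template is closer to the token."""
    return "A" if dist_to_a <= dist_to_b else "B"

def run_drt_simulation(trials, rng):
    """Tally discrimination error rates per phonetic feature.

    `trials` is a list of (feature, true_word) pairs. The distances are
    simulated here; in a real system they would come from scoring the
    test token against each word's recognizer.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for feature, true_word in trials:
        # Placeholder scores: the wrong template gets a distance penalty,
        # standing in for the separability the auditory front end provides.
        d_a = rng.random() + (0.0 if true_word == "A" else 0.4)
        d_b = rng.random() + (0.0 if true_word == "B" else 0.4)
        decision = discriminate(d_a, d_b)
        totals[feature] += 1
        if decision != true_word:
            errors[feature] += 1
    return {f: errors[f] / totals[f] for f in totals}

rng = random.Random(0)
features = ["voicing", "nasality", "sustention",
            "sibilation", "graveness", "compactness"]
trials = [(f, rng.choice("AB")) for f in features for _ in range(100)]
rates = run_drt_simulation(trials, rng)
```

Comparing the `rates` dictionary produced with one front end against the one produced with another (and against human listeners on the same database) is the kind of feature-by-feature error-distribution comparison the paper reports.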