An Acoustic-Phonetic Diagnostic Tool for the Evaluation of Auditory Models

Final Program and Paper Summaries 1991 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics Pub Date : 1900-01-01 DOI:10.1109/ASPAA.1991.634096

O. Ghitza



Abstract

A long-standing question that arises when studying a particular auditory model is how to evaluate its performance. More precisely, it is of interest to evaluate to what extent the model representation can describe the actual human internal representation. In this study we address this question in the context of speech perception. That is, given a speech representation based on the auditory system, to what extent can it preserve phonetic information that is perceptually relevant? To answer this question, a diagnostic system has been developed that simulates the psychophysical procedure used in the standard Diagnostic Rhyme Test (DRT; Voiers, 1983). In the psychophysical procedure the subject has all the cognitive information needed for the discrimination task a priori. Hence, errors in discrimination are due mainly to inaccuracies in the auditory representation of the stimulus. In the simulation, the human observer is replaced by an array of recognizers, one for each pair of words in the DRT database. An effort has been made to keep the errors due to the "observer" to a minimum, so that the overall detected errors are due mainly to inaccuracies in the auditory model representation. This effort includes a careful design of the recognizer (i.e., using an HMM with time-varying states; Ghitza and Sondhi, 1990) and the use of a speaker-dependent DRT simulation. To demonstrate the power of the suggested evaluation method, we considered the behavior of two speech analysis methods, the Fourier power spectrum and a representation based on the auditory system (the EIH model; Ghitza, 1988), in quiet and in a noisy environment. The results were compared with psychophysical results for the same database. The results show that the overall number of errors made by the machine (using either the Fourier power spectrum or the EIH) is far greater than the overall number of errors made by a human, at all noise levels that were tested. Further, the proposed evaluation method offers a detailed picture of the error distribution among the selected phonetic features. It shows that the errors made by the human listener are distributed differently from the errors made by the machines, and that the distributions of errors made by the two analyzers are also quite different from each other.
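The simulated test described in the abstract, with one two-alternative recognizer per word pair and errors tallied per phonetic feature, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's system: the nearest-template decision rule is a stand-in for the HMM with time-varying states, the two-dimensional vectors are made-up placeholders for a real front end (Fourier power spectrum or EIH), and the word pairs, feature labels, and the names `PairRecognizer` and `error_rates_by_feature` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[str, str]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two representations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class PairRecognizer:
    """Two-alternative recognizer for one word pair.

    Assumption: a nearest-template rule stands in for the HMM-based
    recognizer mentioned in the abstract.
    """
    feature: str                    # phonetic feature the pair contrasts
    templates: Dict[str, Vector]    # word -> reference representation

    def decide(self, stimulus: Vector) -> str:
        # Pick whichever of the two words has the closer template.
        return min(self.templates,
                   key=lambda w: sq_dist(stimulus, self.templates[w]))


def error_rates_by_feature(
    recognizers: Dict[Pair, PairRecognizer],
    trials: List[Tuple[Pair, str, Vector]],
) -> Dict[str, float]:
    """Run every trial through its pair's recognizer; tally errors per feature."""
    errors: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for pair, true_word, stimulus in trials:
        rec = recognizers[pair]
        totals[rec.feature] += 1
        if rec.decide(stimulus) != true_word:
            errors[rec.feature] += 1
    return {f: errors[f] / totals[f] for f in totals}


# Hypothetical word pairs and made-up 2-D "representations".
recognizers = {
    ("veal", "feel"): PairRecognizer(
        "voicing", {"veal": [1.0, 0.0], "feel": [0.0, 1.0]}),
    ("meat", "beat"): PairRecognizer(
        "nasality", {"meat": [1.0, 1.0], "beat": [0.0, 0.0]}),
}
trials = [
    (("veal", "feel"), "veal", [0.9, 0.1]),  # close to its own template
    (("veal", "feel"), "feel", [0.6, 0.4]),  # distorted toward "veal": an error
    (("meat", "beat"), "meat", [0.8, 0.9]),
    (("meat", "beat"), "beat", [0.1, 0.2]),
]
rates = error_rates_by_feature(recognizers, trials)
print(rates)
```

Because each recognizer only ever chooses between the two words of its own pair, as a listener does in the DRT, the per-feature error rates can be compared directly across front ends or against listener scores for the same stimuli.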


