Estimation of Japanese DRT intelligibility using Articulation Index Band Correlations
K. Kondo
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific. Published 2014-12-01.
DOI: 10.1109/APSIPA.2014.7041516 (https://doi.org/10.1109/APSIPA.2014.7041516)
Citation count: 1
Abstract
We proposed and evaluated a method for estimating intelligibility in the forced-selection Japanese Diagnostic Rhyme Test (DRT). The proposed measure takes into account the DRT's forced selection from a pair of rhyming words. The objective distance measure was based on the Articulation Index Band Correlation (ABC), which showed favorable results for the English Modified Rhyme Test (MRT). The correlation of time-frequency (T-F) patterns was calculated between the test word and the template speech of each of the two words in the candidate word pair, and the word with the higher correlation was taken as the likely candidate. The T-F pattern was calculated in the Articulation Index (AI) bands, and the correlation was calculated between corresponding bands of the test and candidate word samples. The candidate word with more AI bands showing the higher correlation was finally chosen. The ratio of the number of bands with higher correlation with the candidate word to the total number of bands was calculated to quantify how well the test word matches that candidate word in the word pair. We estimated a logistic mapping function from this ratio to intelligibility scores using speech mixed with known noise. The mapping functions were then used to estimate the intelligibility of speech mixed with unknown noise. This estimate was compared to another measure that we had previously evaluated, the frequency-weighted segmental SNR, and was shown to be more accurate, with the correlation between subjective and estimated intelligibility over 0.93 and the root mean square error below 0.15. Thus, it should be possible to "screen" the intelligibility in many of the noise conditions to be tested, and cut down on the scale of the subjective test needed.
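The band-wise comparison and the logistic mapping described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the use of Pearson correlation per band, the assumption that the T-F patterns are already time-aligned arrays of shape (bands, frames), the band count of 20, and the logistic parameters `slope` and `midpoint` are all hypothetical choices made for illustration. In practice the logistic parameters would be fitted to subjective scores obtained with known noise.

```python
import numpy as np


def band_correlations(test_tf, template_tf):
    """Pearson correlation per band between two T-F patterns.

    Both inputs are assumed to be time-aligned arrays of shape
    (n_bands, n_frames); this alignment is an assumption of the sketch.
    """
    t = test_tf - test_tf.mean(axis=1, keepdims=True)
    c = template_tf - template_tf.mean(axis=1, keepdims=True)
    num = (t * c).sum(axis=1)
    den = np.sqrt((t ** 2).sum(axis=1) * (c ** 2).sum(axis=1))
    return num / np.maximum(den, 1e-12)  # guard against constant bands


def band_ratio(test_tf, candidate_tf, rival_tf):
    """Fraction of bands in which the test word correlates more strongly
    with the candidate template than with the other word of the pair."""
    r_candidate = band_correlations(test_tf, candidate_tf)
    r_rival = band_correlations(test_tf, rival_tf)
    return float(np.mean(r_candidate > r_rival))


def logistic_map(ratio, slope, midpoint):
    """Hypothetical logistic mapping from band ratio to intelligibility.

    slope and midpoint are placeholders for parameters that would be
    fitted on speech mixed with known noise.
    """
    return 1.0 / (1.0 + np.exp(-slope * (ratio - midpoint)))


# Synthetic demonstration with random T-F patterns (not real speech).
rng = np.random.default_rng(0)
n_bands, n_frames = 20, 50  # illustrative sizes only
candidate = rng.random((n_bands, n_frames))
rival = rng.random((n_bands, n_frames))
noisy_test = candidate + 0.1 * rng.standard_normal((n_bands, n_frames))

ratio = band_ratio(noisy_test, candidate, rival)
print("band ratio:", ratio)
print("estimated intelligibility:", logistic_map(ratio, slope=10.0, midpoint=0.5))
```

Counting bands, instead of averaging the correlation values themselves, mirrors the abstract's description of choosing the word with more AI bands showing the higher correlation.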