Joint feature and model training for minimum detection errors applied to speech subword detection
M. H. Johnsen, Alfonso M. Canterla
2012 IEEE International Workshop on Machine Learning for Signal Processing, 2012-11-12
https://doi.org/10.1109/MLSP.2012.6349729
This paper presents methods and results for joint optimization of the feature extraction and the model parameters of a detector. We further define a discriminative training criterion called Minimum Detection Error (MDE). The criterion can optimize the F-score or any other detection performance metric. The methods are used to design detectors of subwords in continuous speech, i.e. to spot phones and articulatory features. For each subword detector, the MFCC filterbank matrix and the Gaussian means in the HMM models are jointly optimized. In experiments on TIMIT, the optimized detectors clearly outperform the baseline detectors and also our previous MCE-based detectors. The results indicate that the same performance metric should be used for training and testing, and that accuracy outperforms F-score with respect to relative improvement. Further, the optimized filterbanks usually reflect typical acoustic properties of the corresponding detection classes.
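
As a point of reference for the F-score mentioned above, the following is a minimal sketch of how an F-score is commonly computed from detection counts. It is not the MDE criterion itself, which the abstract does not specify in detail. The function name `f_score` and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Standard F-beta score from detection counts.

    tp: correct detections (hits)
    fp: false alarms
    fn: missed detections
    beta: weight of recall relative to precision (1.0 gives the usual F1)

    Returns 0.0 when the score is undefined (no hits, false alarms, or misses).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts for a single subword detector (illustration only).
hits, false_alarms, misses = 8, 2, 2
print(f_score(hits, false_alarms, misses))  # 0.8
```

The score is a ratio of hard counts, so it is piecewise constant in the detector parameters. This is generally why gradient-based training of such a metric relies on a smoothed approximation of the counts. Whether MDE does exactly this is an assumption here, not something stated in the abstract.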