Authors: Kemouche Abdennour, N. Aouf
Venue: International Conferences on Imaging for Crime Detection and Prevention
Publication date: 2015-11-12
DOI: https://doi.org/10.1049/IC.2015.0117
Automatic speech recognition enhancement using adaptive GMM estimation
In this paper, we present an automatic speech recognition system for the audio modality, based on an adaptive Gaussian mixture technique. To perform robust density estimation after the speech feature extraction stage, we use an adaptive mixture estimation method based on optimal minimization of the integral square distance between the true density of the speech features and the approximating mixture. This estimation is relatively difficult because of the complex representation of the density and because of issues with the Expectation-Maximization (EM) algorithm classically used for such approximations. The proposed technique demonstrates its performance in the experimental results of this paper and also offers a natural and efficient way to include bimodality (audio and video) in our future work on robust automatic speech recognition.
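To make the integral square distance criterion concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that both the reference density and the approximation are one-dimensional Gaussian mixtures, in which case the distance has a closed form because the integral of a product of two Gaussians is itself a Gaussian evaluated at the difference of the means. The function names and the `(weight, mean, variance)` tuple layout are hypothetical.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cross_term(mix_a, mix_b):
    """Closed-form integral of the product of two 1-D Gaussian mixtures.

    Each mixture is a list of (weight, mean, variance) tuples (assumed layout).
    Uses the identity: integral of N(x; m1, v1) * N(x; m2, v2) dx = N(m1; m2, v1 + v2).
    """
    total = 0.0
    for w_a, m_a, v_a in mix_a:
        for w_b, m_b, v_b in mix_b:
            total += w_a * w_b * gauss_pdf(m_a, m_b, v_a + v_b)
    return total


def integral_square_distance(mix_f, mix_g):
    """Integral of (f - g)^2, expanded as <f, f> - 2 <f, g> + <g, g>."""
    return (
        cross_term(mix_f, mix_f)
        - 2.0 * cross_term(mix_f, mix_g)
        + cross_term(mix_g, mix_g)
    )


# Illustrative values only: a two-component reference and a one-component fit.
reference = [(0.5, -1.0, 0.5), (0.5, 2.0, 1.0)]
approximation = [(1.0, 0.5, 3.0)]
distance = integral_square_distance(reference, approximation)
```

An adaptive estimator in this spirit would adjust the weights, means, and variances of the approximation to drive this distance down. How the paper performs that minimization, and how it handles a true density that is not itself a mixture, is not specified in the abstract.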