{"title":"改进了用于实时语音到嘴唇转换的最小转换轨迹误差训练","authors":"Wei Han, Lijuan Wang, F. Soong, Bo Yuan","doi":"10.1109/ICASSP.2012.6288921","DOIUrl":null,"url":null,"abstract":"Gaussian mixture model (GMM) based speech-to-lips conversion often operates in two alternative ways: batch conversion and sliding window-based conversion for real-time processing. Previously, Minimum Converted Trajectory Error (MCTE) training has been proposed to improve the performance of batch conversion. In this paper, we extend previous work and propose a new training criteria, MCTE for Real-time conversion (R-MCTE), to explicitly optimize the quality of sliding window-based conversion. In R-MCTE, we use the probabilistic descent method to refine model parameters by minimizing the error on real-time converted visual trajectories over training data. Objective evaluations on the LIPS 2008 Visual Speech Synthesis Challenge data set shows that the proposed method achieves both good lip animation performance and low delay in real-time conversion.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Improved minimum converted trajectory error training for real-time speech-to-lips conversion\",\"authors\":\"Wei Han, Lijuan Wang, F. Soong, Bo Yuan\",\"doi\":\"10.1109/ICASSP.2012.6288921\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Gaussian mixture model (GMM) based speech-to-lips conversion often operates in two alternative ways: batch conversion and sliding window-based conversion for real-time processing. Previously, Minimum Converted Trajectory Error (MCTE) training has been proposed to improve the performance of batch conversion. In this paper, we extend previous work and propose a new training criteria, MCTE for Real-time conversion (R-MCTE), to explicitly optimize the quality of sliding window-based conversion. In R-MCTE, we use the probabilistic descent method to refine model parameters by minimizing the error on real-time converted visual trajectories over training data. Objective evaluations on the LIPS 2008 Visual Speech Synthesis Challenge data set shows that the proposed method achieves both good lip animation performance and low delay in real-time conversion.\",\"PeriodicalId\":6443,\"journal\":{\"name\":\"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-03-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.2012.6288921\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2012.6288921","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improved minimum converted trajectory error training for real-time speech-to-lips conversion
Gaussian mixture model (GMM) based speech-to-lips conversion often operates in two alternative ways: batch conversion and sliding window-based conversion for real-time processing. Previously, Minimum Converted Trajectory Error (MCTE) training has been proposed to improve the performance of batch conversion. In this paper, we extend previous work and propose a new training criteria, MCTE for Real-time conversion (R-MCTE), to explicitly optimize the quality of sliding window-based conversion. In R-MCTE, we use the probabilistic descent method to refine model parameters by minimizing the error on real-time converted visual trajectories over training data. Objective evaluations on the LIPS 2008 Visual Speech Synthesis Challenge data set shows that the proposed method achieves both good lip animation performance and low delay in real-time conversion.