Masato Onishi, Toru Takahashi, T. Irino, Hideki Kawahara
{"title":"语音自动变形中基于元音的频率对齐函数设计和基于识别的时间对齐","authors":"Masato Onishi, Toru Takahashi, T. Irino, Hideki Kawahara","doi":"10.1109/SLT.2008.4777831","DOIUrl":null,"url":null,"abstract":"New design procedures of time-frequency alignment for automatic speech morphing are proposed. The frequency alignment function at a specific frame is represented as a weighted average of vowel alignment functions based on similarity to each vowel. Julian, an open source speech recognition system, was used to design a time alignment function. Objective and subjective tests were conducted to evaluate the proposed method, and test results indicated that the proposed method yields comparable naturalness to the manually morphed samples in terms of time alignment. The results also illustrated that the proposed frequency alignment provides significantly better naturalness than morphed samples without frequency alignment.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"402 ","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Vowel-based frequency alignment function design and recognition-based time alignment for automatic speech morphing\",\"authors\":\"Masato Onishi, Toru Takahashi, T. Irino, Hideki Kawahara\",\"doi\":\"10.1109/SLT.2008.4777831\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"New design procedures of time-frequency alignment for automatic speech morphing are proposed. The frequency alignment function at a specific frame is represented as a weighted average of vowel alignment functions based on similarity to each vowel. Julian, an open source speech recognition system, was used to design a time alignment function. Objective and subjective tests were conducted to evaluate the proposed method, and test results indicated that the proposed method yields comparable naturalness to the manually morphed samples in terms of time alignment. The results also illustrated that the proposed frequency alignment provides significantly better naturalness than morphed samples without frequency alignment.\",\"PeriodicalId\":186876,\"journal\":{\"name\":\"2008 IEEE Spoken Language Technology Workshop\",\"volume\":\"402 \",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE Spoken Language Technology Workshop\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2008.4777831\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE Spoken Language Technology Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2008.4777831","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Vowel-based frequency alignment function design and recognition-based time alignment for automatic speech morphing
New design procedures of time-frequency alignment for automatic speech morphing are proposed. The frequency alignment function at a specific frame is represented as a weighted average of vowel alignment functions based on similarity to each vowel. Julian, an open source speech recognition system, was used to design a time alignment function. Objective and subjective tests were conducted to evaluate the proposed method, and test results indicated that the proposed method yields comparable naturalness to the manually morphed samples in terms of time alignment. The results also illustrated that the proposed frequency alignment provides significantly better naturalness than morphed samples without frequency alignment.