H. Kawai, N. Higuchi, Tohru Shimizu, Seiichi Yamamoto
Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing (1994-04-19). DOI: 10.1109/ICASSP.1994.389230
Development of a text-to-speech system for Japanese based on waveform splicing
A text-to-speech system for Japanese was developed based on waveform splicing. Each stored unit is a sequence of phonemes segmented at vowel-consonant boundaries. Four phoneme groups are distinguished for the preceding phonemic environment and eight for the succeeding one. An inventory of waveform segments covering 1,020 frequently used units was constructed on the basis of a statistical analysis of a text database containing 20 million phonemes. On average, each stored unit has 2.5 waveform segments that differ in fundamental frequency (F0) and phoneme duration. F0 and phoneme duration are modified with a pitch-synchronous overlap-add (PSOLA) method. A time window with a flat central portion (a Tukey window) was adopted in place of the usual Hanning window. A preference test indicated that the Tukey window gives better quality when F0 is lowered. The intelligibility test yielded an articulation score of 89.2%.
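To make the window comparison concrete, here is a minimal sketch of time-domain PSOLA F0 modification in which the analysis window can be switched between a Hann window and a Tukey window. It is not the authors' implementation: the function names `tukey_window` and `psola_change_f0`, the taper parameter `alpha`, and the assumption of a strictly periodic signal with pitch marks every `period` samples are all hypothetical simplifications (a real system detects pitch marks per period and also rescales duration). One plausible intuition, which is our assumption and not a reason stated in the abstract, is that when F0 is lowered the synthesis pitch marks move further apart than the two-period segments can cover, so a Hann window leaves each pitch pulse heavily attenuated away from its centre, whereas the Tukey window's flat centre preserves more of it.

```python
import math


def tukey_window(n, alpha=0.5):
    """Symmetric Tukey (tapered-cosine) window of length n.

    alpha is the fraction of the window occupied by the cosine tapers:
    alpha = 1 gives a Hann ("Hanning") window, alpha = 0 a rectangular one,
    and values in between leave a flat portion of 1.0 at the centre.
    """
    if n == 1:
        return [1.0]
    w = []
    for i in range(n):
        x = i / (n - 1)  # normalised position in [0, 1]
        if alpha <= 0 or alpha / 2 <= x <= 1 - alpha / 2:
            w.append(1.0)  # flat centre portion
        elif x < alpha / 2:
            w.append(0.5 * (1 - math.cos(2 * math.pi * x / alpha)))  # rising taper
        else:
            w.append(0.5 * (1 - math.cos(2 * math.pi * (1 - x) / alpha)))  # falling taper
    return w


def psola_change_f0(signal, period, factor, alpha=0.5):
    """Simplified TD-PSOLA F0 change that keeps the signal length.

    Hypothetical simplification: the input is assumed strictly periodic, with
    analysis pitch marks every `period` samples. factor < 1 lowers F0
    (wider synthesis spacing); factor > 1 raises it.
    """
    if len(signal) <= 2 * period:
        raise ValueError("signal must be longer than two pitch periods")
    marks = list(range(period, len(signal) - period, period))  # analysis pitch marks
    new_period = max(1, round(period / factor))  # synthesis pitch-mark spacing
    win = tukey_window(2 * period + 1, alpha)  # two-period analysis window
    out = [0.0] * len(signal)
    t = period
    while t < len(signal) - period:
        m = min(marks, key=lambda a: abs(a - t))  # nearest analysis mark
        for k in range(-period, period + 1):
            out[t + k] += win[k + period] * signal[m + k]  # windowed overlap-add
        t += new_period
    return out


if __name__ == "__main__":
    P = 50  # pitch period in samples (illustrative value)
    x = [math.sin(2 * math.pi * i / P) for i in range(1000)]
    hann_low = psola_change_f0(x, P, 0.5, alpha=1.0)    # F0 halved, Hann window
    tukey_low = psola_change_f0(x, P, 0.5, alpha=0.5)   # F0 halved, Tukey window
    print(sum(v * v for v in hann_low) < sum(v * v for v in tukey_low))  # True
```

With `factor = 1`, Hann windows spaced one period apart overlap-add to exactly 1, so the interior of the input is reconstructed unchanged; a Tukey window with `alpha < 1` does not have this property, so a practical system would presumably need some gain normalisation, which this sketch omits.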