{"title":"A linguistic and prosodic database for data-driven Japanese TTS synthesis","authors":"A. Sakurai, Takashi Natsume, K. Hirose","doi":"10.21437/ICSLP.1998-57","DOIUrl":null,"url":null,"abstract":"We propose a method to generate a database that contains a parametric representation of F0 contours associated with linguistic and acoustic information, to be used by data-driven Japanese text-to-speech (TTS) systems. The configuration of the database includes recorded speech, F0 contours and their parametric labels, phonetic transcription with durations, and other linguistic information such as orthographic transcription, part-of-speech (POS) tags, and accent types. All information that is not available by dictionary lookup is obtained automatically. In this paper, we propose a method to automatically obtain parametric labels that describe F0 contours based on a superpositional model. Preliminary tests on a small data set show that the method can find the parametric representation of F0 contours with acceptable accuracy, and that accuracy can be improved by introducing additional linguistic information.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"5th International Conference on Spoken Language Processing (ICSLP 1998)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/ICSLP.1998-57","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
We propose a method to generate a database that contains a parametric representation of F0 contours associated with linguistic and acoustic information, to be used by data-driven Japanese text-to-speech (TTS) systems. The configuration of the database includes recorded speech, F0 contours and their parametric labels, phonetic transcription with durations, and other linguistic information such as orthographic transcription, part-of-speech (POS) tags, and accent types. All information that is not available by dictionary lookup is obtained automatically. In this paper, we propose a method to automatically obtain parametric labels that describe F0 contours based on a superpositional model. Preliminary tests on a small data set show that the method can find the parametric representation of F0 contours with acceptable accuracy, and that accuracy can be improved by introducing additional linguistic information.