{"title":"The modeling of tongue tip in Standard Chinese using MRI","authors":"Gaowu Wang, J. Dang, Jiangping Kong","doi":"10.1109/ISCSLP.2014.6936625","DOIUrl":"https://doi.org/10.1109/ISCSLP.2014.6936625","url":null,"abstract":"In this paper, the tongue tip was modeled based on articulatory data from MRI images of Standard Chinese. First, an MRI articulatory database of Standard Chinese, including 9 vowels and 75 consonant variants, was established. Second, Principal Component Analysis (PCA) was performed on the tongue shape to find articulatory factors, and the results showed that the model is more precise and concise when the tongue is divided into the tongue tip and the tongue body, which are modeled separately. Finally, according to this result, the tongue tip was modeled by two articulatory parameters, Tongue Tip Protrude and Tongue Tip Raise, which represent the protruding/advancing and raising/retroflexing movements of the tongue tip.","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"85 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120863883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speaker adaptation of speaking rate-dependent hierarchical prosodic model for Mandarin TTS","authors":"Po-Chun Wang, I-Bin Liao, Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen","doi":"10.1109/ISCSLP.2014.6936616","DOIUrl":"https://doi.org/10.1109/ISCSLP.2014.6936616","url":null,"abstract":"This paper proposes a speaker adaptation method that adapts an existing speaking rate-dependent hierarchical prosodic model (SR-HPM) of an SR-controlled Mandarin TTS system to a new speaker's data in order to realize a new voice. Two main problems are addressed: data sparseness, because the few adaptation utterances cover only a small range of normal speaking rates, and the absence of adaptation data in both the fast and slow speaking-rate ranges. The proposed method follows the idea of SR-HPM training: it first normalizes the prosodic-acoustic features of the new speaker's speech data, then trains an HPM with the prosody labeling and modeling algorithm, and finally refines the HPM into an SR-dependent model. The MAP adaptation method with model parameter extrapolation is applied to cope with these two problems. Experimental results on a male speaker's adaptation data confirmed that the resulting adaptive SR-HPM has reasonable parameters covering a wide range of speaking rates and hence can be used in the TTS system to generate prosodic-acoustic features for synthesizing the new speaker's voice at any given SR.","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121115668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to describe speech emotion more completely - An investigation on Chinese broadcast news speech","authors":"Yingying Gao, Weibin Zhu","doi":"10.1109/ISCSLP.2012.6423508","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423508","url":null,"abstract":"A multi-perspective method for describing emotional speech is proposed. It involves three perspectives (cognitive, psychological, and physiological) and seven phonetic evaluation features, with which it is expected to describe speech emotion more completely and to bridge the gap between emotions and acoustic signals. A series of experiments, including cognition tests and corpus annotation, were conducted. Although preliminary, the results indicate the effectiveness of the method.","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114465894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acoustic modeling for native and non-native Mandarin speech recognition","authors":"Xin Chen, Jian Cheng","doi":"10.1109/ISCSLP.2012.6423544","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423544","url":null,"abstract":"In this paper, we first describe the automatic Spoken Chinese Test (SCT). With a large amount of native and non-native data collected for SCT, different training strategies for acoustic modeling were investigated. Evaluations were performed on native as well as non-native datasets. We discovered that directly combining native and non-native data to train acoustic models did not work well, and the acoustic model trained only on native data achieved better performance when applied to non-native speech. We investigated how to use non-native data effectively, and found that the Phonetic Decision Tree (PDT) had a great impact. Discriminative training was found to improve speech recognition accuracy effectively for both native and non-native Mandarin speech.","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121374318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of an articulatory visual-speech synthesizer to support language learning","authors":"Ka Ho WONG, Wai-Kim Leung, W. Lo, H. Meng","doi":"10.1109/ISCSLP.2010.5684832","DOIUrl":"https://doi.org/10.1109/ISCSLP.2010.5684832","url":null,"abstract":"This paper presents a two-dimensional (2D) visual-speech synthesizer to support language learning. A visual-speech synthesizer animates the human articulators in synchronization with speech signals, e.g., output from a text-to-speech synthesizer. A visual-speech animation can offer language learners a concrete illustration of how to move and where to place the articulators when pronouncing a phoneme. We adopted 2D vector-based viseme models and compiled a collection of visemes to cover the articulation of all English phonemes (42 visemes for the 44 English phonemes). Morphing between properly selected vector-based articulation images achieves articulatory animations. In this way, we have developed an articulatory visual-speech synthesizer that can accept free-text input and synthesize articulatory dynamics in real time. An evaluation involving 32 subjects based on “lip-reading” shows that they can identify the appropriate word(s) based on articulation animation alone approximately 80% of the time.","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130784605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Automatic Tone Correction in Non-native Mandarin","authors":"Mitchell Peabody, S. Seneff","doi":"10.1007/11939993_62","DOIUrl":"https://doi.org/10.1007/11939993_62","url":null,"abstract":"","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125848772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonlinear Emotional Prosody Generation and Annotation","authors":"J. Tao, Jian Yu, Yongguo Kang","doi":"10.1007/11939993_23","DOIUrl":"https://doi.org/10.1007/11939993_23","url":null,"abstract":"","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115071017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pitch Mean Based Frequency Warping","authors":"Jian Liu, T. Zheng, Wenhu Wu","doi":"10.1007/11939993_13","DOIUrl":"https://doi.org/10.1007/11939993_13","url":null,"abstract":"","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129314000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Meeting Segmentation Using Two-Layer Cascaded Subband Filters","authors":"M. Giuliani, T. Nwe, Haizhou Li","doi":"10.1007/11939993_68","DOIUrl":"https://doi.org/10.1007/11939993_68","url":null,"abstract":"","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121173966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Language Identification by Using Syllable-Based Duration Classification on Code-Switching Speech","authors":"Dau-Cheng Lyu, Ren-Yuan Lyu, Yuang-Chin Chiang, Chun-Nan Hsu","doi":"10.1007/11939993_50","DOIUrl":"https://doi.org/10.1007/11939993_50","url":null,"abstract":"","PeriodicalId":271277,"journal":{"name":"International Symposium on Chinese Spoken Language Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115331991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}