2012 8th International Symposium on Chinese Spoken Language Processing: Latest Papers (Page 9)

Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423493

Chun Xing Li, Zhiyong Wu, Fanbo Meng, H. Meng, Lianhong Cai

Citations: 4

A study of F0 modelling and generation with lyrics and shape characterization for singing voice synthesis

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423491

Siu Wa Lee, M. Dong, Haizhou Li

Citations: 4

A fast two-microphone noise reduction algorithm based on power level ratio for mobile phone

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423512

Jian Zhang, Risheng Xia, Zhonghua Fu, Junfeng Li, Yonghong Yan

Citations: 16

Alleviating the small sample-size problem in i-vector based speaker verification

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423527

Wei Rao, M. Mak

Abstract: This paper investigates the small sample-size problem in i-vector based speaker verification systems. The idea of i-vectors is to represent the characteristics of speakers in the factors of a factor analyzer. Because the factor loading matrix defines the possible speaker and channel variability of i-vectors, it is important to suppress the unwanted channel variability. Linear discriminant analysis (LDA), within-class covariance normalization (WCCN), and probabilistic LDA are commonly used for this purpose. These methods, however, require training data comprising many speakers, each providing sufficient recording sessions, for good performance. Performance suffers when the number of speakers and/or the number of sessions per speaker is too small. This paper compares four approaches to addressing this small sample-size problem: (1) preprocessing the i-vectors by PCA before applying LDA (PCA+LDA), (2) replacing the matrix inverse in LDA by a pseudo-inverse, (3) applying multi-way LDA by exploiting the microphone and speaker labels of the training data, and (4) increasing the matrix rank in LDA by generating more i-vectors using utterance partitioning. Results based on NIST 2010 SRE suggest that utterance partitioning performs best, followed by multi-way LDA and PCA+LDA.
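
The second approach named in the abstract above, replacing the matrix inverse in LDA by a pseudo-inverse, can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration only: the data are random toy vectors rather than real i-vectors, the helper names `scatter_matrices` and `pinv_lda` are hypothetical, and the code is not the authors' implementation. It shows that with few sessions per speaker the within-class scatter matrix is rank-deficient, so an ordinary inverse is unavailable while the pseudo-inverse still yields a projection.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return the within-class (Sw) and between-class (Sb) scatter matrices."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        mc = Xc.mean(axis=0)
        centered = Xc - mc
        Sw += centered.T @ centered
        diff = (mc - mean_all)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    return Sw, Sb

def pinv_lda(X, y, n_components):
    """LDA projection using the pseudo-inverse of Sw (hypothetical helper)."""
    Sw, Sb = scatter_matrices(X, y)
    # pinv(Sw) stands in for inv(Sw), which does not exist when Sw is singular.
    M = np.linalg.pinv(Sw) @ Sb
    eigvals, eigvecs = np.linalg.eig(M)
    # Keep the directions with the largest (real parts of the) eigenvalues.
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real

# Toy data (assumption): 5 "speakers", 3 "sessions" each, 40-dimensional vectors.
rng = np.random.default_rng(0)
n_speakers, sessions, dim = 5, 3, 40
speaker_means = rng.normal(size=(n_speakers, dim))
X = np.vstack([m + 0.1 * rng.normal(size=(sessions, dim)) for m in speaker_means])
y = np.repeat(np.arange(n_speakers), sessions)

Sw, Sb = scatter_matrices(X, y)
# rank(Sw) is at most (number of samples - number of classes), far below dim here.
rank_Sw = np.linalg.matrix_rank(Sw)
W = pinv_lda(X, y, n_components=n_speakers - 1)
projected = X @ W
print("rank of Sw:", rank_Sw, "of", dim)
print("projected shape:", projected.shape)
```

The number of retained components is set to the number of classes minus one because the between-class scatter matrix cannot have higher rank than that.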

Citations: 10

A comparative study of perception of tone 2 and tone 3 in Mandarin by native speakers and Japanese learners

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423540

T. Zou, Jinsong Zhang, Wen Cao

Abstract: This paper investigated the Mandarin Tone 2-Tone 3 perceptual space in isolated syllables and disyllables for native speakers and Japanese learners. In two experiments, we examined the listeners' use of pitch height and the position of the turning point as cues to tone identity. The results showed that, in isolated syllables, Chinese subjects perceived these two tones in a categorical fashion. Pitch height was more important than the turning point as a cue. Within a certain range of pitch height, there was a complementary relationship between these two variables. The perceptual results of Japanese subjects did not show an apparent categorical pattern. In disyllables, for Chinese subjects, the contextual influence on the boundary position in the Tone 2-half Tone 3 continuum was not significant, but the boundary position in the pitch height and turning point Tone 2-Tone 3 continuum shifted significantly in different tonal contexts. Compared with Chinese subjects, Japanese subjects' perceptual ranges of Tone 3 in isolated syllables and disyllables were narrower, and it was more difficult for them to identify these two tones in disyllables.

Citations: 5

Experiments on unsupervised statistical parametric speech synthesis

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423518

Jinfu Ni, Y. Shiga, H. Kawai, H. Kashioka

Citations: 0

Analyzing semantic orientation of terms using Affinity Propagation

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423494

Yan Li, Si Li, Weiran Xu, Jun Guo

Citations: 0

Preliminary study on the interlanguage speech intelligibility benefit for English-Mandarin bilingual L2 learners

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423487

Guo Li, P. Mok

Abstract: Previous studies into the interlanguage speech intelligibility benefit (ISIB) have focused on the influence of subjects' native language (L1) on phonetic production and perception in their second language (L2). However, no research so far has examined the effect of listeners' exposure to and training in an L2 on their understanding of L2-accented L1 speech. This paper aims to address this issue with subjects whose L1 is English and whose L2 is Mandarin. Characteristics of Mandarin-accented English include the devoicing of word-final consonants and the insufficient distinction of the vowel pairs /i:/ - /i/ and /ε/ - /æ/. These features could negatively affect listeners' understanding of contrastive word pairs. In this study, 9 native Mandarin listeners, 9 monolingual English listeners and 9 English-Mandarin bilinguals were asked to listen to recordings of Mandarin-accented English and identify minimal pairs involving the above consonant and vowel contrasts. Results show that among all three groups of subjects, native Mandarin listeners scored the highest accuracy, but English listeners with training in Mandarin and monolingual English speakers had similar scores. These findings support the existence of ISIB for Mandarin, and call for further study on bilingual L2 learners.

Citations: 0

Hierarchical prosodic pattern selection based on Fujisaki model for natural Mandarin speech synthesis

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423536

Yi-Chin Huang, Chung-Hsien Wu, Sz-Ting Weng

Abstract: In this paper, a novel hierarchical prosodic unit selection method is proposed based on pitch contour pattern retrieval, in order to obtain a natural pitch contour for the personalized synthetic voice. In this framework, a hierarchical prosodic unit based on the Fujisaki model is used to take both the local pitch contour variation and the global intonation of the utterance into account. Furthermore, novel ways of integrating the pitch contour patterns of prosodic units into the prosodic model are introduced in order to improve the mechanism for selecting the appropriate pitch contour. A novel prosodic unit selection method is proposed based on sentence retrieval, which uses not only the traditional linguistic cues as selection criteria, but also the shape of the pitch contour. Also, the codewords of pitch patterns in the training corpus and synthesized corpus were constructed by the proposed method and were used to map the relation between the training codewords and the synthesized corpus. Finally, a language model of pitch patterns is adopted to find the proper pitch pattern sequence for the input text. The evaluation results demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared with that of a model-based method.

Citations: 6

Acoustic analysis of disguised voices with raised and lowered pitch

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423510

Cuiling Zhang

Citations: 15