2012 8th International Symposium on Chinese Spoken Language Processing: Latest Publications

Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423493
Chun Xing Li, Zhiyong Wu, Fanbo Meng, H. Meng, Lianhong Cai
Abstract: This paper addresses the problem of automatic detection of contrastive word pairs and their acoustic realization in emphasis for expressive text-to-speech (TTS) synthesis in English. Support vector machines (SVMs) are used to automatically detect contrastive word pairs from lexical features, syntactic dependencies and semantic relations. Much better performance is achieved by adding accent ratio and word identity features. Hidden Markov model (HMM) based speech synthesis is then used to generate emphatic speech by putting emphasis on the detected contrastive word pairs. Subjective experiments show that most listeners consider emphasis on contrastive word pairs more acceptable than emphasis on non-contrastive word pairs. This indicates the importance of accurate detection of contrastive word pairs.
Citations: 4
A study of F0 modelling and generation with lyrics and shape characterization for singing voice synthesis
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423491
Siu Wa Lee, M. Dong, Haizhou Li
Abstract: Natural pitch fluctuation is essential to the singing voice. Recently, we proposed a generalized F0 modelling method which models the expected F0 fluctuation under various contexts with note HMMs. Knowing that F0 contours close to human professional singing promote perceived quality, we are confronted with two requirements: (1) accurate estimation of F0 and (2) precise voiced/unvoiced decisions. In this paper, we introduce two techniques in these directions. The influence of lyrics phonetics on singing F0 is considered, to capture the F0 and voicing behaviour arising from different note-lyrics combinations. The generalized F0 modelling method is further extended to the frequency domain to study whether shape characterization in terms of sinusoids helps F0 estimation. Our experiments showed that the use of lyrics information leads to better F0 generation and improves the naturalness of synthesized singing. While the frequency-domain representation is viable, its performance is less competitive than that of the time-domain representation, which requires further study.
Citations: 4
A fast two-microphone noise reduction algorithm based on power level ratio for mobile phone
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423512
Jian Zhang, Risheng Xia, Zhonghua Fu, Junfeng Li, Yonghong Yan
Abstract: As an indispensable instrument in today's daily life, the mobile phone is used in diverse environments and suffers from speech quality degradation due to background noise. In this paper, we propose a novel two-microphone noise reduction system based on the power level ratio (PLR) of the observed signals. In the system, a primary microphone is placed close to the talker's mouth and an auxiliary microphone is placed farther away. The proposed noise reduction algorithm first calculates the ratio of the power of the observed signals at the two microphones, and subsequently calculates the spectral gain function from the power level ratio using the sigmoid function. Experimental results demonstrate that the proposed algorithm yields much higher speech quality than state-of-the-art noise reduction algorithms and, more importantly, involves much lower computational cost, which makes it feasible for mobile phones.
Citations: 16
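The two steps named in the abstract above (a power ratio between the two microphones, then a sigmoid mapping to a spectral gain) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a decibel ratio, and the `slope` and `midpoint_db` parameters are all assumptions made for illustration, since the abstract does not give the exact formula.

```python
import math


def plr_gain(primary_power, auxiliary_power, slope=1.0, midpoint_db=3.0, floor=1e-12):
    """Map the power level ratio of one time-frequency bin to a gain in (0, 1).

    slope and midpoint_db are hypothetical tuning parameters, not values
    taken from the paper. floor avoids division by zero and log(0).
    """
    # Power level ratio between the near-mouth and the far microphone, in dB.
    plr_db = 10.0 * math.log10((primary_power + floor) / (auxiliary_power + floor))
    # Sigmoid: bins dominated by the primary microphone (likely speech) get a
    # gain near 1; bins with similar power at both microphones (likely
    # background noise) get a gain near 0.
    return 1.0 / (1.0 + math.exp(-slope * (plr_db - midpoint_db)))


def apply_gains(primary_spectrum, primary_powers, auxiliary_powers):
    """Scale each spectral bin of the primary channel by its PLR-based gain."""
    return [
        value * plr_gain(p, a)
        for value, p, a in zip(primary_spectrum, primary_powers, auxiliary_powers)
    ]
```

Because each bin needs only one division, one logarithm and one exponential, a scheme of this shape is cheap to compute, which is consistent with the low-cost claim in the abstract.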
Alleviating the small sample-size problem in i-vector based speaker verification
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423527
Wei Rao, M. Mak
Abstract: This paper investigates the small sample-size problem in i-vector based speaker verification systems. The idea of i-vectors is to represent the characteristics of speakers in the factors of a factor analyzer. Because the factor loading matrix defines the possible speaker and channel variability of i-vectors, it is important to suppress the unwanted channel variability. Linear discriminant analysis (LDA), within-class covariance normalization (WCCN), and probabilistic LDA are commonly used for this purpose. These methods, however, require training data comprising many speakers, each providing sufficient recording sessions, for good performance. Performance will suffer when the number of speakers and/or the number of sessions per speaker is too small. This paper compares four approaches to addressing this small sample-size problem: (1) preprocessing the i-vectors by PCA before applying LDA (PCA+LDA), (2) replacing the matrix inverse in LDA by the pseudo-inverse, (3) applying multi-way LDA by exploiting the microphone and speaker labels of the training data, and (4) increasing the matrix rank in LDA by generating more i-vectors using utterance partitioning. Results based on NIST 2010 SRE suggest that utterance partitioning performs the best, followed by multi-way LDA and PCA+LDA.
Citations: 10
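The idea behind approach (4) is that one utterance can yield several i-vectors if it is first cut into sub-utterances. The sketch below shows only the partitioning step, under the assumption of a simple contiguous, near-equal split; the abstract does not say how the partitions are formed, and `partition_utterance` is a hypothetical helper name. Extracting an i-vector from each part would require a separate extractor that is not shown here.

```python
def partition_utterance(frames, num_parts):
    """Split a sequence of feature frames into num_parts contiguous pieces.

    Piece lengths differ by at most one frame. The contiguous split is an
    illustrative assumption, not the partitioning rule from the paper.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    base, extra = divmod(len(frames), num_parts)
    parts = []
    start = 0
    for index in range(num_parts):
        # The first `extra` pieces each take one additional frame.
        end = start + base + (1 if index < extra else 0)
        parts.append(frames[start:end])
        start = end
    return parts


# Ten frames split three ways: one i-vector per part (plus, optionally, one
# for the full utterance) gives more training vectors per session.
parts = partition_utterance(list(range(10)), 3)
```

More i-vectors per speaker means more samples contributing to the LDA scatter matrices, which is how the abstract describes the rank increase.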
A comparative study of perception of tone 2 and tone 3 in Mandarin by native speakers and Japanese learners
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423540
T. Zou, Jinsong Zhang, Wen Cao
Abstract: This paper investigates the Mandarin Tone 2-Tone 3 perceptual space of native speakers and Japanese learners in isolated syllables and disyllables. In two experiments, we examined the listeners' use of pitch height and the position of the turning point as cues to tone identity. The results showed that, in isolated syllables, Chinese listeners perceived these two tones in a categorical fashion. Pitch height was more important than the turning point as a cue. Within a certain range of pitch height, there was a complementary relationship between these two variables. The perceptual results of the Japanese subjects did not show an apparent categorical pattern. In disyllables, for Chinese subjects, the contextual influence on the boundary position in the Tone 2-half Tone 3 continuum was not significant, but the boundary position in the pitch height and turning point Tone 2-Tone 3 continuum shifted significantly in different tonal contexts. Compared with Chinese subjects, Japanese subjects' perceptual ranges of Tone 3 in isolated syllables and disyllables were narrower, and it was more difficult for them to identify these two tones in disyllables.
Citations: 5
Experiments on unsupervised statistical parametric speech synthesis
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423518
Jinfu Ni, Y. Shiga, H. Kawai, H. Kashioka
Abstract: In order to build web-based voicefonts, an unsupervised method is needed to automate the extraction of acoustic and linguistic properties of speech. This paper addresses the impact of automatic speech transcription on statistical parametric speech synthesis based on a single speaker's 100-hour speech corpus, focusing particularly on two factors affecting speech quality: transcript accuracy and size of the training dataset. Experimental results indicate that for an unsupervised method to achieve fair (MOS 3) voice quality, 1.5 hours of speech are necessary for phone accuracy over 80% and 3.5 hours are necessary for phone accuracy down to 65%. Improvement in MOS quality turns out not to be significant when more than 4 hours of speech are used. The use of automatic transcripts certainly leads to voice degradation. One of the mechanisms behind this is that transcript errors cause mismatches between speech segments and phone labels that significantly distort the structures of decision trees in the resultant HMM-based voices.
Citations: 0
Analyzing semantic orientation of terms using Affinity Propagation
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423494
Yan Li, Si Li, Weiran Xu, Jun Guo
Abstract: The aim of term semantic orientation analysis is to mine the sentiment polarity of words and phrases from their contexts. This paper presents a novel algorithm called Affinity Propagation to analyze the semantic orientations of terms. Specifically, we build an informative graph from a text corpus using an efficient Word Activation Force model and regard each term as a node in the graph. Then we propagate opinionated information over the whole graph using only a small number of seed terms. We finally utilize affinity vectors rather than context vectors to detect term polarities and construct the polarity lexicons. Evaluations of our proposed algorithm show its advantages over state-of-the-art algorithms. Further improvements can be obtained by combining Affinity Propagation with Pointwise Mutual Information.
Citations: 0
Preliminary study on the interlanguage speech intelligibility benefit for English-Mandarin bilingual L2 learners
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423487
Guo Li, P. Mok
Abstract: Previous studies of the interlanguage speech intelligibility benefit (ISIB) have focused on the influence of subjects' native language (L1) on phonetic production and perception in their second language (L2). However, no research so far has examined the effect of listeners' exposure to and training in an L2 on their understanding of L2-accented L1 speech. This paper addresses this issue with subjects whose L1 is English and whose L2 is Mandarin. Characteristics of Mandarin-accented English include the devoicing of word-final consonants and the insufficient distinction of the vowel pairs /i:/ - /i/ and /ε/ - /æ/. These features could negatively affect listeners' understanding of contrastive word pairs. In this study, 9 native Mandarin listeners, 9 monolingual English listeners and 9 English-Mandarin bilinguals were asked to listen to recordings of Mandarin-accented English and identify minimal pairs involving the above consonant and vowel contrasts. Results show that among the three groups of subjects, native Mandarin listeners achieved the highest accuracy, while English listeners with training in Mandarin and monolingual English listeners had similar scores. These findings support the existence of ISIB for Mandarin, and call for further study on bilingual L2 learners.
Citations: 0
Hierarchical prosodic pattern selection based on Fujisaki model for natural Mandarin speech synthesis
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423536
Yi-Chin Huang, Chung-Hsien Wu, Sz-Ting Weng
Abstract: In this paper, a novel hierarchical prosodic unit selection method is proposed based on pitch contour pattern retrieval, in order to obtain a natural pitch contour for the personalized synthetic voice. In this framework, a hierarchical prosodic unit based on the Fujisaki model is used to take the local pitch contour variation and the global intonation of the utterance into account. Furthermore, novel ways of integrating the pitch contour patterns of prosodic units into the prosodic model are devised in order to improve the mechanism for selecting the appropriate pitch contour. A novel prosodic unit selection method is proposed based on sentence retrieval, which uses not only the traditional linguistic cues as selection criteria, but also the shape of the pitch contour. In addition, codewords of the pitch patterns in the training corpus and the synthesized corpus were constructed by the proposed method and were used to map the relation between training codewords and the synthesized corpus. Finally, a language model of pitch patterns is adopted to find the proper pitch pattern sequence for the input text. The evaluation results demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared to that of a model-based method.
Citations: 6
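For readers unfamiliar with the Fujisaki model referred to above, its commonly cited general form is reproduced here as background. The abstract does not state which parameterization the authors use, so the symbols below follow the usual textbook convention rather than the paper itself.

```latex
\ln F_0(t) = \ln F_b
  + \sum_{i=1}^{I} A_{p,i}\, G_p(t - T_{0,i})
  + \sum_{j=1}^{J} A_{a,j}\, \bigl[ G_a(t - T_{1,j}) - G_a(t - T_{2,j}) \bigr]

G_p(t) =
  \begin{cases}
    \alpha^{2}\, t\, e^{-\alpha t}, & t \ge 0 \\
    0, & t < 0
  \end{cases}
\qquad
G_a(t) =
  \begin{cases}
    \min\bigl[\, 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \,\bigr], & t \ge 0 \\
    0, & t < 0
  \end{cases}
```

Here F_b is the baseline frequency, the phrase components (amplitudes A_{p,i}, onsets T_{0,i}) describe the slowly varying global intonation, and the accent components (amplitudes A_{a,j}, onsets T_{1,j}, offsets T_{2,j}) describe local pitch movements. This corresponds to the split between global intonation and local pitch contour variation mentioned in the abstract.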
Acoustic analysis of disguised voices with raised and lowered pitch
2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423510
Cuiling Zhang
Abstract: Change of pitch is a common type of disguise adopted by criminals, which introduces substantial variance in acoustic properties and results in poorer speaker recognition performance in forensic voice comparison. This paper investigates the acoustic properties of disguised voices with raised and lowered pitch from 11 Chinese male speakers. Parameters including fundamental frequency, syllable duration, intensity, vowel formant frequencies, and long-term average spectrum (LTAS) were measured and statistically compared with those of the normal voice. The effect of voice disguise on speaker recognition by both humans and machines is also evaluated. The results show that speakers differ in their ability to adjust pitch. Pitch change results in corresponding changes in other parameters and in degradation of speaker recognition by parameter discrimination, auditory perception and automatic speaker recognition, but some systematic changes in parameters provide clues for forensic voice comparison.
Citations: 15