5th International Conference on Spoken Language Processing (ICSLP 1998): Latest Papers

Formant diphone parameter extraction utilising a labelled single-speaker database

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-36

R. Mannell

This paper examines a method for formant parameter extraction from a labelled single-speaker database for use in a formant-parameter diphone-concatenation speech synthesis system. The procedure commences with an initial formant analysis of the labelled database, which is then used to obtain formant (F1-F5) probability spaces for each phoneme. These probability spaces guide a more careful speaker-specific extraction of formant frequencies. An analysis-by-synthesis procedure is then used to provide best-matching formant intensity and bandwidth parameters. The great majority of the parameters so extracted produce speech which is highly intelligible and which has a voice quality close to that of the original speaker. Synthesis techniques based upon LPC-parameter or waveform concatenation are much less vulnerable to the effects of poorly extracted parameters. The formant model is, however, more straightforwardly related to the source-filter model and thus to speech production. Whilst it is true that overlap-add concatenation of waveform-based diphones can easily model a voice with quite high fidelity, new voices and voice qualities require the recording of new speakers (or the same speaker utilising a different voice quality) and the extraction of a new diphone database. Such systems can be used to examine the effects of intonation and rhythm on voice quality or vocal affect, but formant-based systems can much more readily examine the effect of frequency-domain modifications on voice quality. Such modifications might include formant frequency shifting, bandwidth modification, modification of relative formant intensities and spectral slope variation.

It is even possible, if the synthesiser design allows it, to experiment with the insertion of additional poles and zeroes into the spectrum, such as might occur when modelling the "singer's formant" for certain styles of singing voice. Such research requires a parallel formant synthesiser with a great deal of flexibility of control. Further, and most importantly, it requires a diphone database that is extremely accurate. Formant errors must be minor and few in number, and this should be achieved without excessive hand correction. Formant tracks should display, as far as possible, pole continuity across fricatives, stops and affricates. Extracted intensities and
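The two-pass idea in this abstract (a rough first analysis yields per-phoneme F1-F5 probability spaces, which then guide a more careful second extraction) can be illustrated with a minimal sketch. This is not Mannell's implementation; it assumes, hypothetically, that each phoneme's formant space is modelled as independent Gaussians per formant, that the second pass receives candidate resonance frequencies for a frame (for example LPC pole frequencies), and that it picks the ordered subset with the highest total log-likelihood. The names `build_formant_spaces`, `pick_formants` and the `min_std` floor are illustrative only.

```python
import itertools
import math
from collections import defaultdict
from statistics import mean, stdev


def build_formant_spaces(frames, min_std=50.0):
    """Build a per-phoneme formant space from a rough first-pass analysis.

    frames: iterable of (phoneme, [F1, ..., F5]) in Hz.
    Returns {phoneme: [(mean_hz, std_hz) per formant]}.
    Hypothetical model: independent Gaussians, std floored at min_std Hz.
    """
    by_phone = defaultdict(list)
    for phone, formants in frames:
        by_phone[phone].append(formants)
    spaces = {}
    for phone, rows in by_phone.items():
        columns = list(zip(*rows))  # one tuple of values per formant
        spaces[phone] = [
            (mean(col), max(stdev(col), min_std) if len(col) > 1 else min_std)
            for col in columns
        ]
    return spaces


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def pick_formants(candidates, space):
    """Second pass for one frame: choose an increasing subset of candidate
    frequencies (one per formant) maximising total log-likelihood under the
    phoneme's space. Returns None if there are too few candidates."""
    best, best_score = None, -math.inf
    for combo in itertools.combinations(sorted(candidates), len(space)):
        score = sum(gaussian_logpdf(f, mu, sd) for f, (mu, sd) in zip(combo, space))
        if score > best_score:
            best, best_score = combo, score
    return list(best) if best is not None else None
```

For example, with a space built from two /a/ frames, a spurious low candidate at 250 Hz is rejected and the five candidates nearest the phoneme's expected F1-F5 are kept. An exhaustive search over combinations is used here only for clarity; it is affordable for the handful of candidates a single frame typically yields.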

Citations: 23

Correspondence between the glottal gesture overlap pattern and vowel devoicing in Japanese

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-798

M. Fujimoto, E. Murano, S. Niimi, S. Kiritani

Correspondence between the glottal opening gesture pattern and vowel devoicing in Japanese was examined using PGG, with special reference to the pattern of glottal gesture overlap and blending into the neighboring vowel. The results showed that most tokens demonstrated either a single glottal opening with a devoiced vowel or a double glottal opening with a voiced vowel during /CiC/ sequences, as generally expected. Some tokens, however, showed a double glottal opening with a devoiced vowel, or a single glottal opening with a partially voiced vowel. From the viewpoint of a gestural-overlap analysis of vowel devoicing, an intermediate process of gestural overlap may explain the case in which the vowel was devoiced yet showed a double-phase opening.

Nevertheless, the presence of a partially voiced vowel with a single opening phase clearly shows the complexity of vowel devoicing in Japanese, since there are possibly two different patterns of glottal opening (single phase and double phase), observable in PGG analysis, in utterances with partially voiced vowels.

Citations: 2

Performance and optimization of the SEEVOC algorithm

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-379

Weihua Zhang, W. Holmes

Citations: 2

A context-dependent approach for speaker verification using sequential decision

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-229

H. Noda, Katsuya Harada, E. Kawaguchi, H. Sawai

Citations: 1

Can we hear smile?

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-106

M. Schröder, V. Aubergé, Marie-Agnès Cathiard

Citations: 24

Articulability of two consecutive morae in Japanese speech production: evidence from sound exchange errors in spontaneous speech

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-809

Y. Terao, Tadao Murata

Citations: 0

Recognition from GSM digital speech

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-324

A. Gallardo-Antolín, F. Díaz-de-María, F. J. Valverde-Albacete

Citations: 21

Perceived prominence and acoustic parameters in American English

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-133

T. Portele

Citations: 14

Data-driven extensions to HMM statistical dependencies

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-166

J. Bilmes

Citations: 23

Acoustic-articulatory evaluation of the upper vowel-formant region and its presumed speaker-specific potency

5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-359

F. Clermont, P. Mokhtari

We present some evidence indicating that phonetic distinctiveness and speaker individuality are indeed manifested in vowels' vocal-tract shapes estimated from the lower and the upper formant frequencies, respectively. The methodology developed to demonstrate this dichotomy first implicates Schroeder's [8] acoustic-articulatory model, which can be coerced to yield, on a per-vowel and a per-speaker basis, area-function approximations to vocal-tract shapes of differing formant components. Using ten steady-state vowels recorded in /hVd/ context, five times at random, by four adult-male speakers of Australian English, the variability of the resulting shapes aligned at mid-length was then measured on an intra- and an inter-speaker basis. Gross shapes estimated from the lower formants were indeed found to cause the largest spread amongst the vowels of individual speakers. By contrast, the more detailed shapes obtained by recruiting certain higher formants of the front and the back vowels accounted for the largest spread amongst the speakers.

Collectively, these results contribute a quasi-articulatory substantiation of a long-standing view on the speaker-specific potency of the upper formant region of spoken vowels, together with some useful implications for automatic speech and speaker recognition.
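The intra- versus inter-speaker comparison described in this abstract can be illustrated with a small sketch. The abstract does not state the exact spread measure, so this is an assumption: each area function is taken to be already resampled to the same number of sections and aligned at mid-length, and "spread" is taken to be the per-section population variance averaged along the tract. The function names `spread` and `intra_inter_spread` are hypothetical.

```python
from statistics import mean, pvariance


def spread(vectors):
    """Mean per-section population variance across equal-length area functions.

    Hypothetical measure: variance is taken section by section along the
    (already aligned and resampled) tract, then averaged over sections.
    """
    return mean(pvariance(section) for section in zip(*vectors))


def intra_inter_spread(shapes):
    """shapes: {speaker: {vowel: [area per section]}}.

    intra: spread across vowels within each speaker, averaged over speakers.
    inter: spread across speakers for each vowel, averaged over vowels.
    """
    speakers = list(shapes)
    vowels = list(shapes[speakers[0]])
    intra = mean(spread([shapes[s][v] for v in vowels]) for s in speakers)
    inter = mean(spread([shapes[s][v] for s in speakers]) for v in vowels)
    return intra, inter
```

Under this measure, the abstract's finding would correspond to `intra` dominating for shapes estimated from the lower formants and `inter` growing once certain higher formants are recruited. For instance, two speakers with identical shapes for each vowel give an `inter` spread of zero, whatever their vowels' `intra` spread.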

Citations: 7