5th International Conference on Spoken Language Processing (ICSLP 1998): Latest Papers

Formant diphone parameter extraction utilising a labelled single-speaker database
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-36
R. Mannell
{"title":"Formant diphone parameter extraction utilising a labelled single-speaker database","authors":"R. Mannell","doi":"10.21437/ICSLP.1998-36","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-36","url":null,"abstract":"This paper examines a method for formant parameter extraction from a labeled single speaker database for use in a formant-parameter diphone-concatenation speech synthesis system. This procedure commences with an initial formant analysis of the labelled database, which is then used to obtain formant (F1-F5) probability spaces for each phoneme. These probability spaces guide a more careful speaker- specific extraction of formant frequencies. An analysis-by-synthesis procedure is then used to provide best-matching formant intensity and bandwidth parameters. The great majority of the parameters so extracted produce speech which is highly intelligible and which has a voice quality close to the original speaker. Synthesis techniques based upon LPC-parameter or waveform concatenation are much less vulnerable to the effects of poorly extracted parameters. The formant model is, however, more straightforwardly related to the source-filter model and thus to speech production. Whilst it is true that overlap-add concatenation of waveform-based diphones can easily model a voice with quite high fidelity, new voices and voice qualities require the recording of new speakers (or the same speaker utilising a different voice quality) and the extraction of a new diphone database. Such systems can be used to examine the effects of intonation and rhythm on voice quality or vocal affect but formant-based systems can much more readily examine the effect of frequency-domain modifications on voice quality. Such modifications might include formant frequency shifting, bandwidth modification, modification of relative formant intensities and spectral slope variation. 
It is even possible, if the synthesiser design allows it, to experiment with the insertion of additional poles and zeroes into the spectrum such as might occur when modelling the \"singer's formant\" for certain styles of singing voice. Such research requires a parallel formant synthesiser with a great deal of flexibility of control. Further, and most importantly, it requires a diphone database that is extremely accurate. Formant errors must be minor and few in number and this should be achieved without excessive hand correction. Formant tracks should display, as far as possible, pole continuity across fricatives, stops and affricates. Extracted intensities and","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115730884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
Correspondence between the glottal gesture overlap pattern and vowel devoicing in Japanese
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-798
M. Fujimoto, E. Murano, S. Niimi, S. Kiritani
{"title":"Correspondence between the glottal gesture overlap pattern and vowel devoicing in Japanese","authors":"M. Fujimoto, E. Murano, S. Niimi, S. Kiritani","doi":"10.21437/ICSLP.1998-798","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-798","url":null,"abstract":"Correspondence between the glottal opening gesture pattern and vowel devoicing in Japanese was examined using PGG with special reference to the pattern of glottal gesture overlap and blending into the neighboring vowel. The results showed that most of the tokens demonstrated either a single glottal opening pattern with a devoiced vowel, or a double glottal opening with a voiced vowel during /CiC/ sequences as generally expected. Some tokens, however, showed a double glottal opening with a devoiced vowel, or a single glottal opening with a partially voiced vowel. From the viewpoint of gestural overlap analysis of vowel devoicing, an intermediate process of gestural overlap may explain the occurrence of the case in which the vowel was devoiced and showed a double phase opening. 
Nevertheless, the presence of a partially voiced vowel with a single opening phase clearly shows the complexity of vowel devoicing in Japanese, since there are possibly two different patterns of glottal opening (single phase and double phase), which could be observed in PGG analysis, in utterances with partially voiced vowels.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115758662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Performance and optimization of the SEEVOC algorithm
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-379
Weihua Zhang, W. Holmes
{"title":"Performance and optimization of the SEEVOC algorithm","authors":"Weihua Zhang, W. Holmes","doi":"10.21437/ICSLP.1998-379","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-379","url":null,"abstract":"In most low bit rate coders, the quality of the synthetic speech depends greatly on the performance of the spectral coding stage, in which the spectral envelope is estimated and encoded. The Spectral Envelope Estimation Vocoder (SEEVOC) is a successful spectral envelope estimation method that plays an important role in low bit rate speech coding based on the sinusoidal model. This paper investigates the properties and limitations of the SEEVOC algorithm, and shows that it can be generalized and optimized by changing the search range parameters a and b . Rules for the optimum choice of a and b are derived, based on both analysis and experimental results. The effects of noise on the SEEVOC algorithm are also investigated. Experimental results show that the SEEVOC algorithm performs better for voiced speech in the presence of noise than linear prediction (LP) analysis.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124195706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A context-dependent approach for speaker verification using sequential decision
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-229
H. Noda, Katsuya Harada, E. Kawaguchi, H. Sawai
{"title":"A context-dependent approach for speaker verification using sequential decision","authors":"H. Noda, Katsuya Harada, E. Kawaguchi, H. Sawai","doi":"10.21437/ICSLP.1998-229","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-229","url":null,"abstract":"This paper is concerned with speaker verification (SV) using the sequential probability ratio test (SPRT). In the SPRT, input samples are usually assumed to be i.i.d. samples from a probability density function, because an on-line probability computation is required. Feature vectors used in speech processing obviously do not satisfy this assumption, and therefore the correlation between successive feature vectors has not been considered in conventional SV using the SPRT. The correlation can be modeled by the hidden Markov model (HMM), but unfortunately the HMM cannot be directly applied to the SPRT because of the statistical dependence of input samples. This paper proposes a method of HMM probability computation using the mean field approximation to resolve this problem, where the probability of the whole set of input samples is nominally represented as the product of the probability of each sample, as if the input samples were independent of each other.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124568383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
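The sequential probability ratio test named in the abstract above can be illustrated with a minimal sketch of Wald's classical SPRT. This is not the authors' mean-field HMM method: it assumes independent samples (the very assumption the abstract says speech features violate), and the function name `sprt`, the error-rate parameters `alpha` and `beta`, and the convention that each input is log p(x | claimed speaker) minus log p(x | impostor) are all assumptions made for illustration.

```python
import math


def sprt(log_likelihood_ratios, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test (illustrative sketch).

    log_likelihood_ratios: iterable of per-sample log-likelihood ratios,
        assumed here to be log p(x | claimed speaker) - log p(x | impostor).
    alpha: target false-acceptance rate (hypothetical parameter name).
    beta: target false-rejection rate (hypothetical parameter name).
    Returns (decision, number_of_samples_consumed).
    """
    upper = math.log((1 - beta) / alpha)  # crossing this accepts the claim
    lower = math.log(beta / (1 - alpha))  # crossing this rejects the claim
    total = 0.0
    n = 0
    for n, llr in enumerate(log_likelihood_ratios, start=1):
        # Under independence, the joint log-ratio is the sum of per-sample terms.
        total += llr
        if total >= upper:
            return "accept", n
        if total <= lower:
            return "reject", n
    # The stream ended before either threshold was crossed.
    return "undecided", n
```

Because the test stops as soon as the accumulated evidence crosses a threshold, clear-cut inputs need fewer samples than a fixed-length test; the additive update is also where the independence assumption enters.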
Can we hear smile?
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-106
M. Schröder, V. Aubergé, Marie-Agnès Cathiard
{"title":"Can we hear smile?","authors":"M. Schröder, V. Aubergé, Marie-Agnès Cathiard","doi":"10.21437/ICSLP.1998-106","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-106","url":null,"abstract":"The expression of amusement is both visible and audible in speech. After recording comparable spontaneous, acted, mechanical, reiterated and seduction stimuli, five perceptual experiments were held, mainly based on the hypothesis of prosodically controlled effects of amusement on speech. Results show that audio is partially independent of video, which performs as well as audio-video. Spontaneous speech (involuntarily controlled) can be distinguished from acted speech (voluntarily controlled). Amusement speech can be distinguished from seduction speech.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115009879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Articulability of two consecutive morae in Japanese speech production: evidence from sound exchange errors in spontaneous speech
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-809
Y. Terao, Tadao Murata
{"title":"Articulability of two consecutive morae in Japanese speech production: evidence from sound exchange errors in spontaneous speech","authors":"Y. Terao, Tadao Murata","doi":"10.21437/ICSLP.1998-809","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-809","url":null,"abstract":"In the present study, we discuss how the articulability of two consecutive morae plays an important role in speech production. Our assumption is based on the analysis of Japanese sound exchange error data collected from the spontaneous speech of adults and infants. Three experiments were also carried out to confirm the reality of a unit of two consecutive morae. Phonological/phonetic characteristics were shown through the results of the experiments and related observations.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114388458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Recognition from GSM digital speech
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-324
A. Gallardo-Antolín, F. Díaz-de-María, F. J. Valverde-Albacete
{"title":"Recognition from GSM digital speech","authors":"A. Gallardo-Antolín, F. Díaz-de-María, F. J. Valverde-Albacete","doi":"10.21437/ICSLP.1998-324","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-324","url":null,"abstract":"This paper addresses the problem of speech recognition in the GSM environment. In this context, new sources of distortion, such as transmission errors or speech coding itself, significantly degrade the performance of speech recognizers. While conventional approaches deal with these types of distortion after decoding speech, we propose to recognize from the digital speech representation of GSM. In particular, our work focuses on the 13 kbit/s RPE-LTP GSM standard speech coder. In order to test our recognizer we have compared it to a conventional recognizer in several simulated situations, which allow us to gain insight into more practical ones. Specifically, besides recognizing from clean digital speech and evaluating the influence of speech coding distortion, the proposed recognizer is faced with speech degraded by random errors, burst errors and frame substitutions. The results are very encouraging: the worse the transmission conditions are, the more recognizing from digital speech outperforms the conventional approach.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114871001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Perceived prominence and acoustic parameters in American English
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-133
T. Portele
{"title":"Perceived prominence and acoustic parameters in American English","authors":"T. Portele","doi":"10.21437/ICSLP.1998-133","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-133","url":null,"abstract":"This paper describes the relationships between perceived prominence as a gradual value and some acoustic-prosodic parameters. Prominence is used as an intermediate parameter in a speech synthesis system. A corpus of American English utterances was constructed by measuring and annotating various linguistic, acoustic and perceptual parameters and features. The investigation of the corpus revealed some strong and some rather weak relations between prominence and acoustic-prosodic parameters that serve as a starting point for the development of prominence-based rules for the synthesis of American English prosody in a content-to-speech system.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114924488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Data-driven extensions to HMM statistical dependencies
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-166
J. Bilmes
{"title":"Data-driven extensions to HMM statistical dependencies","authors":"J. Bilmes","doi":"10.21437/ICSLP.1998-166","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-166","url":null,"abstract":"In this paper, a new technique is introduced that relaxes the HMM conditional independence assumption in a principled way. Without increasing the number of states, the modeling power of an HMM is increased by including only those additional probabilistic dependencies (to the surrounding observation context) that are believed to be both relevant and discriminative. Conditional mutual information is used to determine both relevance and discriminability. Extended Gaussian-mixture HMMs and new EM update equations are introduced. In an isolated word speech database, results show an average 34% word error improvement over an HMM with the same number of states, and a 15% improvement over an HMM with a comparable number of parameters.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115052185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
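The abstract above names conditional mutual information as its selection criterion. As a rough illustration of that quantity only, the sketch below computes a plug-in estimate of I(X; Y | Z) from discrete samples. It is not the paper's procedure: the paper works with Gaussian-mixture HMMs over continuous observations, whereas the function name `conditional_mutual_information`, the discrete (x, y, z) triples, and the count-based estimator here are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def conditional_mutual_information(samples):
    """Plug-in estimate of I(X; Y | Z) in bits from a list of (x, y, z) triples.

    Uses I(X;Y|Z) = sum_{x,y,z} p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ),
    with every probability replaced by its empirical frequency.
    """
    n = len(samples)
    count_xyz = Counter(samples)
    count_xz = Counter((x, z) for x, _, z in samples)
    count_yz = Counter((y, z) for _, y, z in samples)
    count_z = Counter(z for _, _, z in samples)

    cmi = 0.0
    for (x, y, z), k in count_xyz.items():
        # The 1/n factors cancel inside the ratio, so raw counts suffice there.
        ratio = (k * count_z[z]) / (count_xz[(x, z)] * count_yz[(y, z)])
        cmi += (k / n) * math.log2(ratio)
    return cmi
```

A value near zero means Y adds little information about X once Z is known, so a dependency on Y would be a poor candidate to add; a large value suggests the opposite. Plug-in estimates of this kind are biased upward on small samples.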
Acoustic-articulatory evaluation of the upper vowel-formant region and its presumed speaker-specific potency
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-359
F. Clermont, P. Mokhtari
{"title":"Acoustic-articulatory evaluation of the upper vowel-formant region and its presumed speaker-specific potency","authors":"F. Clermont, P. Mokhtari","doi":"10.21437/ICSLP.1998-359","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-359","url":null,"abstract":"We present some evidence indicating that phonetic distinctiveness and speaker individuality are indeed manifested in vowels’ vocal-tract shapes estimated from the lower and the upper formant frequencies, respectively. The methodology developed to demonstrate this dichotomy first implicates Schroeder’s [8] acoustic-articulatory model, which can be coerced to yield, on a per-vowel and a per-speaker basis, area-function approximations to vocal-tract shapes of differing formant components. Using ten steady-state vowels recorded in /hVd/-context, five times at random, by four adult-male speakers of Australian English, the variability of resulting shapes aligned at mid-length was then measured on an intra- and an inter-speaker basis. Gross shapes estimated from the lower formants were indeed found to cause the largest spread amongst the vowels of individual speakers. By contrast, the more detailed shapes obtained by recruiting certain higher formants of the front and the back vowels accounted for the largest spread amongst the speakers. Collectively, these results contribute a quasi-articulatory substantiation of a long-standing view on the speaker-specific potency of the upper formant region of spoken vowels, together with some useful implications for automatic speech and speaker recognition.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115107646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7