5th International Conference on Spoken Language Processing (ICSLP 1998): Latest Papers

A novel method of formant analysis and glottal inverse filtering
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-543
Steve Pearson
Abstract: This paper presents a class of methods for automatically extracting formant parameters from speech. The methods rely on an iterative optimization algorithm. It was found that formant parameter data derived with these methods was less prone to discontinuity errors than data from conventional methods. Also, experiments were conducted that demonstrated that these methods are capable of better accuracy in formant estimation than LPC, especially for the first formant. In some cases, the analytic (non-iterative) solution has been derived, making real-time applications feasible. The main target that we have been pursuing is text-to-speech (TTS) conversion. These methods are being used to automatically analyze a concatenation database, without the need for a tuning phase to fix errors. In addition, they are instrumental in realizing high-quality pitch tracking and pitch epoch marking.
Citations: 1
High-speed speaker adaptation using phoneme dependent tree-structured speaker clustering
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-745
Motoyuki Suzuki, T. Abe, H. Mori, S. Makino, H. Aso
Abstract: Tree-structured speaker clustering was proposed as a high-speed speaker adaptation method. It can select the model which is most similar to a target speaker. However, this method does not consider speaker differences that depend on phoneme class. In this paper, we propose a speaker adaptation method based on speaker clustering that takes phoneme-class-dependent speaker differences into account. The experimental results showed that the new method gave better performance than the original method. Furthermore, we propose an improved method which uses the tree structure of a similar phoneme as a substitute for any phoneme which does not appear in the adaptation data. In the experiments, the improved method gave better performance than the method previously proposed.
Citations: 2
Toward on-line learning of Chinese continuous speech recognition system
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-748
Rong Zheng, Zuoying Wang
Abstract: In this paper, we presented an integrated on-line learning scheme, which combined state-of-the-art speaker normalization and adaptation techniques to improve the performance of our large vocabulary Chinese continuous speech recognition (CSR) system. We used VTLN to remove inter-speaker variation in both the training and testing stages. To facilitate dynamic transformation scale determination, we devised a tree-based transformation method as the key component of our incremental adaptation. Experiments show that the combined scheme of the on-line learning (incremental and unsupervised) system, which gives approximately 22-26% error reduction rate, proved to be better than either method when used separately at [...] and 2.7 [...].
Citations: 0
Unsupervised training of a speech recognizer using TV broadcasts
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-632
T. Kemp, A. Waibel
Abstract: Current speech recognition systems require large amounts of transcribed data for parameter estimation. The transcription, however, is tedious and expensive. In this work we describe our experiments which are aimed at training a speech recognizer without transcriptions. The experiments were carried out with TV newscasts that were recorded using a satellite receiver and simple MPEG coding hardware. The newscasts were automatically segmented into segments of similar acoustic background condition. This material is inexpensive and can be made available in large quantities, but there are no transcriptions available. We develop a training scheme where a recognizer is bootstrapped using very little transcribed data and is improved using new, untranscribed speech. We show that it is necessary to use a confidence measure to judge the initial transcriptions of the recognizer before using them. Higher improvements can be achieved if the number of parameters in the system is increased when more data becomes available. We show that the beneficial effect of unsupervised training is not compensated by MLLR adaptation on the hypothesis. In a final experiment, the effect of untranscribed data is compared with the effect of transcribed speech. Using the described methods, we found that the untranscribed data gives roughly one third of the improvement of the transcribed material.
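The selection step described in this abstract, judging automatic transcriptions with a confidence measure before reusing them for training, might be sketched as follows. This is only an illustration under stated assumptions: the `Hypothesis` type, the `select_for_training` function, and the 0.8 threshold are hypothetical names and values, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One recognizer output for one audio segment (hypothetical structure)."""
    segment_id: str
    text: str
    confidence: float  # assumed to lie in [0, 1]


def select_for_training(hyps: List[Hypothesis], threshold: float = 0.8) -> List[Hypothesis]:
    """Keep only hypotheses whose confidence meets the (assumed) threshold."""
    return [h for h in hyps if h.confidence >= threshold]


# Toy data: three automatically transcribed segments with made-up scores.
hyps = [
    Hypothesis("seg1", "good evening", 0.93),
    Hypothesis("seg2", "the whether today", 0.41),
    Hypothesis("seg3", "stock markets rose", 0.80),
]
selected = select_for_training(hyps)
print([h.segment_id for h in selected])
```

In a bootstrapping loop of this kind, only the selected segments would be added to the training pool before the recognizer is retrained.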
Citations: 37
SIVHA, visual speech synthesis system
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-777
Y. Blanco, Maria Cuellar, A. Villanueva, Fernando Lacunza, R. Cabeza, B. Marcotegui
Abstract: This paper presents SIVHA, a high-quality Spanish speech synthesis system for severely disabled persons, controlled by their eye movements. The system follows the patient's eye gaze across the screen and constructs the text from the selected words. When the user considers that the message is complete, its synthesis can be ordered. The system is divided into three modules. The first determines the point of the screen the user is looking at, the second is an interface for constructing the sentences, and the third is the synthesis itself.
Citations: 3
Convergence of fundamental frequencies in conversation: if it happens, does it matter?
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-111
Belinda Collins
Abstract: This paper explores the existence and nature of accommodation processes within conversation, particularly convergence of the fundamental frequency (F0) of conversational participants over time. The study raises a number of issues related to methodologies for analysing interactional (typically conversational) data. Most important is the issue of the applicability of statistical sampling methods which are independent of the interactional events occurring within the talk. It concludes with suggestions for a methodology that examines long-term acoustic phenomena (e.g. long-term fundamental frequency) and relates events at the micro-acoustic level to interactional events within a conversation.
Citations: 9
Multi-Span statistical language modeling for large vocabulary speech recognition
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-640
J. Bellegarda
Abstract: The goal of multi-span language modeling is to integrate the various constraints, both local and global, that are present in the language. In this paper, local constraints are captured via the usual n-gram approach, while global constraints are taken into account through the use of latent semantic analysis. An integrative formulation is derived for the combination of these two paradigms, resulting in an entirely data-driven, multi-span framework for large vocabulary speech recognition. Because of the inherent complementarity in the two types of constraints, the performance of the integrated language model compares favorably with the corresponding n-gram performance. Both perplexity and average word error rate figures are reported and discussed.
Citations: 13
A computational algorithm for F0 contour generation in Korean developed with prosodically labeled databases using k-toBI system
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-34
Yong-Ju Lee, Sook-Hyang Lee, Jong-Jin Kim, Hyun-Ju Ko, Young-Il Kim, Sanghun Kim, Jung-Cheol Lee
Abstract: This study describes an algorithm for an F0 contour generation system for Korean sentences and its evaluation results. 400 K-ToBI labeled utterances were used, read by one male and one female announcer. The F0 contour generation system uses two classification trees for prediction of K-ToBI labels for input text and 11 regression trees for prediction of F0 values for the labels. Evaluation of the system showed 77.2% prediction accuracy for IP boundaries and 72.0% prediction accuracy for AP boundaries. Voicing and duration information of the segments was not changed for F0 contour generation and its evaluation. Evaluation results showed 23.5 Hz RMS error and a 0.55 correlation coefficient in the F0 generation experiment using labelling information from the original speech data.
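For readers unfamiliar with the two evaluation figures quoted above, the following minimal sketch shows how an RMS error (in Hz) and a Pearson correlation coefficient between a reference and a generated F0 contour are commonly computed. It assumes the two contours are already frame-aligned and restricted to the frames being compared; the abstract does not say how the authors did this. The function names and the toy values are illustrative only.

```python
import math
from typing import Sequence


def rms_error(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square difference between two equal-length contours."""
    if len(ref) != len(pred) or len(ref) == 0:
        raise ValueError("contours must be non-empty and of equal length")
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))


def pearson(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length contours."""
    if len(ref) != len(pred) or len(ref) == 0:
        raise ValueError("contours must be non-empty and of equal length")
    n = len(ref)
    mean_r = sum(ref) / n
    mean_p = sum(pred) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(ref, pred))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_r * var_p)


# Toy contours in Hz (made-up values, not data from the paper).
reference = [100.0, 120.0, 140.0, 160.0]
generated = [110.0, 120.0, 130.0, 160.0]
print(round(rms_error(reference, generated), 2))
print(round(pearson(reference, generated), 3))
```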
Citations: 4
The effect of fundamental frequency on Mandarin speech recognition
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-761
Sharlene A. Liu, S. Doyle, Allen Morris, Farzad Ehsani
Abstract: We study the effects of modeling tone in Mandarin speech recognition. Including the neutral tone, there are 5 tones in Mandarin, and these tones are syllable-level phenomena. A direct acoustic manifestation of tone is the fundamental frequency (f0). We will report on the effect of f0 on the acoustic recognition accuracy of a Mandarin recognizer. In particular, we put f0, its first derivative (f0′), and its second derivative (f0″) in separate streams of the feature vector. Stream weights are adjusted to investigate the individual effects of f0, f0′, and f0″ on recognition accuracy. Our results show that incorporating the f0 feature negatively impacted accuracy, whereas f0′ increased accuracy and f0″ seemed to have no effect.
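As a rough illustration of the three feature streams mentioned in this abstract, the sketch below derives f0′ and f0″ from a per-frame f0 track. The abstract does not state how the derivatives were computed, so the central difference with edge replication used here is an assumption, and the `delta` function name and the toy track are hypothetical.

```python
from typing import List, Sequence


def delta(track: Sequence[float]) -> List[float]:
    """Central difference per frame, replicating the first and last frames at the edges."""
    n = len(track)
    out = []
    for i in range(n):
        prev_val = track[max(i - 1, 0)]
        next_val = track[min(i + 1, n - 1)]
        out.append((next_val - prev_val) / 2.0)
    return out


# Toy f0 track in Hz, one value per frame (made-up values).
f0 = [200.0, 210.0, 230.0, 220.0]
f0_d1 = delta(f0)      # first derivative stream (f0')
f0_d2 = delta(f0_d1)   # second derivative stream (f0'')

# Each frame then contributes one value to each of the three separate streams.
streams = list(zip(f0, f0_d1, f0_d2))
print(streams)
```

Keeping the three quantities in separate streams is what allows each to be weighted independently, as the abstract describes.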
Citations: 10
ToBI accent type recognition
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-126
Arman Maghbouleh
Abstract: This paper describes work in progress for recognizing a subset of ToBI intonation labels (H*, L+H*, L*, !H*, L+!H*, no accent). Initially, duration characteristics are used to classify syllables as accented or not. The accented syllables are then subclassified based on fundamental frequency (F0) values. Potential F0 intonation gestures are schematized by connected line segments within a window around a given syllable. The schematizations are found using spline-basis linear regression. The regression weights on F0 points are varied in order to discount segmental effects and F0 detection errors. Parameters based on the line segments are then used to perform the subclassification. This paper presents new results in recognizing L*, L+H*, and L+!H* accents. In addition, the models presented here perform comparably (80% overall, and 74% accent type recognition) to models which do not distinguish bitonal accents.
Citations: 9