5th International Conference on Spoken Language Processing (ICSLP 1998): Latest Papers

Acoustic indicators of topic segmentation
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-582
Julia Hirschberg, C. H. Nakatani
{"title":"Acoustic indicators of topic segmentation","authors":"Julia Hirschberg, C. H. Nakatani","doi":"10.21437/ICSLP.1998-582","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-582","url":null,"abstract":"The segmentation of text and speech into topics and subtopics is an important step in document interpretation. For text, formatting information, such as headings and paragraphing, is available to aid in this endeavor, although this information is by no means sufficient. For speech, the task is even more difficult. We present results of the application of machine learning techniques to the automatic identification of intonational phrases beginning and ending 'topics' determined independently by annotators for two corpora: the Boston Directions Corpus and the Broadcast News (HUB-4) DARPA/NIST database.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128117617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 88
Dealing with out-of-vocabulary words and speech disfluencies in an n-gram based speech understanding system
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-648
A. Kai, Y. Hirose, S. Nakagawa
{"title":"Dealing with out-of-vocabulary words and speech disfluencies in an n-gram based speech understanding system","authors":"A. Kai, Y. Hirose, S. Nakagawa","doi":"10.21437/ICSLP.1998-648","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-648","url":null,"abstract":"In this study, we investigate the effectiveness of an unknown word processing (UWP) algorithm, which is incorporated into an N-gram language model based speech recognition system for dealing with filled pauses and out-of-vocabulary (OOV) words. We have already investigated the effect of the UWP algorithm, which utilizes a simple subword sequence decoder, in a spoken dialog system using a context free grammar (CFG) as a language model. The effect of the UWP algorithm was investigated using an N-gram based continuous speech recognition system on both a small dialog task and a large-vocabulary read speech dictation task. The experimental results showed that the UWP improves the recognition accuracy and that an N-gram based system with the UWP can improve the understanding performance compared with a CFG-based system.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125718849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Context dependent tree based transforms for phonetic speech recognition
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-645
Bernard Doherty, S. Vaseghi, P. McCourt
{"title":"Context dependent tree based transforms for phonetic speech recognition","authors":"Bernard Doherty, S. Vaseghi, P. McCourt","doi":"10.21437/ICSLP.1998-645","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-645","url":null,"abstract":"This paper presents a novel method for modeling phonetic context using linear context transforms. Initial investigations have shown the feasibility of synthesising context dependent models from context independent models through weighted interpolation of the peripheral states of a given hidden Markov model with its adjacent model. This idea can be further extended to maximum likelihood estimation of not only single weights, but a matrix of weights or a transform. This paper outlines the application of Maximum Likelihood Linear Regression (MLLR) as a means of modeling context dependency in continuous density Hidden Markov Models (HMM).","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127910160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
What spreads, and how? tonal rightward spreading on Shanghai disyllabic compounds
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-145
X. Zhu
{"title":"What spreads, and how? tonal rightward spreading on Shanghai disyllabic compounds","authors":"X. Zhu","doi":"10.21437/ICSLP.1998-145","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-145","url":null,"abstract":"The present paper examines what kinds of Shanghai disyllabic lexical tone sandhi undergoes, especially in what sense and to what extent a disyllabic tone can be claimed to result from rightward spreading of the corresponding citation tone. It will be shown that F0 spreading occurs in the Long tone domains while Contour element spreading occurs mainly in the Short tone domains.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128192571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Phonetic alignment: speech synthesis based vs. hybrid HMM/ANN
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-595
F. Malfrère, O. Deroo, T. Dutoit
{"title":"Phonetic alignment: speech synthesis based vs. hybrid HMM/ANN","authors":"F. Malfrère, O. Deroo, T. Dutoit","doi":"10.21437/ICSLP.1998-595","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-595","url":null,"abstract":"In this paper we compare two different methods for phonetically labeling a speech database. The first approach is based on the alignment of the speech signal on a high quality synthetic speech pattern, and the second one uses a hybrid HMM/ANN system. Both systems have been evaluated on French read utterances from a speaker never seen in the training stage of the HMM/ANN system and manually segmented. This study outlines the advantages and drawbacks of both methods. The high quality speech synthetic system has the great advantage that no training stage is needed, while the classical HMM/ANN system easily allows multiple phonetic transcriptions. We deduce a method for the automatic constitution of phonetically labeled speech databases based on using the synthetic speech segmentation tool to bootstrap the training process of our hybrid HMM/ANN system. The importance of such segmentation tools will be a key point for the development of improved speech synthesis and recognition systems.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115831748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
The importance of the first syllable in English spoken word recognition by adult Japanese speakers
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-764
Kazuo Nakayama, Kaoru Tomita-Nakayama
{"title":"The importance of the first syllable in English spoken word recognition by adult Japanese speakers","authors":"Kazuo Nakayama, Kaoru Tomita-Nakayama","doi":"10.21437/ICSLP.1998-764","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-764","url":null,"abstract":"We investigated adult Japanese speakers’ deficiencies in English spoken word recognition. We found that the accurate recognition of the first syllable or the initial portion of each word played an important role in recognizing a word correctly. It was implied in the study that their recognition performance would be enhanced by utilizing the speech processing methods, time-scale expansion and/or dynamic range compression. Although approximately 85 percent of English words begin with strong syllables [1], many of them do not carry a sentence stress and they are not pronounced as clearly as isolated words. Moreover, the duration of a word, especially a beginning word, is so short that the listener can't recognize it correctly. Two experiments were administered in the anechoic room. In the first experiment, subjects listened to extracted words and corresponding isolated words of English, which included words without primary stress on the first syllables. We found that they had difficulty in recognizing both isolated words and the extracted words, especially when the word did not begin with a strong syllable, which sounded somewhat unclear. This is quite frequent in normal English speech. We confirmed that they had difficulty recognizing the words which began with weak syllables and it is concluded that the first syllable plays an important role in the recognition of words at least for Japanese speakers. In the second experiment, the extracted words and the corresponding time-scale expanded words (henceforth, expanded words) were given. The result indicated that the expanded words were better recognized. It is found that the time-scale modification (henceforth, TSM) of the extracted words didn’t lose intelligibility even around the ratio of 2.00, as was clear from the fact that the recognition improved.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115896967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Computer-mediated input and the acquisition of L2 vowels
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-844
M. Fox
{"title":"Computer-mediated input and the acquisition of L2 vowels","authors":"M. Fox","doi":"10.21437/ICSLP.1998-844","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-844","url":null,"abstract":"Programs for testing and training of difficult vowel distinctions in American English were created for subjects to access via the Internet using a web browser. The testing and training data include many likely vowel confusions for speakers of different L1s. The training program focuses on one distinction at a time, and adjusts to concentrate on particular contexts or exemplars that are difficult for the individual subject. In the current study, 52 subjects participated in testing and 2 subjects participated in training. In the testing portion, results indicate that the L1 and the fluency level in English, as well as individual variability, have an effect on perceptual ability. In the training portion, subjects showed significant improvement on the contrasts on which they trained. Because these programs make extensive data collection over large populations and large distances easy, this method of research will facilitate further investigation of questions regarding second language acquisition.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132051206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A very low bit rate speech coder using HMM with speaker adaptation
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-375
T. Masuko, K. Tokuda, Takao Kobayashi
{"title":"A very low bit rate speech coder using HMM with speaker adaptation","authors":"T. Masuko, K. Tokuda, Takao Kobayashi","doi":"10.21437/ICSLP.1998-375","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-375","url":null,"abstract":"This paper describes a speaker adaptation technique for a phonetic vocoder based on HMM. In the vocoder, the encoder performs phoneme recognition and transmits phoneme indexes and state durations to the decoder, and the decoder synthesizes speech using HMM-based speech synthesis technique. One of the main problems of this vocoder is that the voice characteristics of synthetic speech depend on HMMs used in the decoder, and are therefore fixed regardless of a variety of input speakers. To overcome this problem, we adapt HMMs to input speech by transmitting transfer vectors, information on mismatch between the input speech and HMMs. The results of the subjective tests show that the performance of the proposed vocoder without quantization of transfer vectors is comparable to that of a speaker dependent vocoder.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132208532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
A four layer sharing HMM system for very large vocabulary isolated word recognition
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-284
Ruxin Chen, Miyuki Tanaka, Duanpei Wu, L. Olorenshaw, Mariscela Amador
{"title":"A four layer sharing HMM system for very large vocabulary isolated word recognition","authors":"Ruxin Chen, Miyuki Tanaka, Duanpei Wu, L. Olorenshaw, Mariscela Amador","doi":"10.21437/ICSLP.1998-284","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-284","url":null,"abstract":"This paper reports on a large vocabulary speaker independent isolated word recognizer targeting 50,000 words. The system supports a unique four-layer sharing structure for either continuous HMM or discrete HMM. Evaluation is performed using a dictionary of 5000 US city names, a dictionary of the 5000 most frequent English words, a dictionary of 50,000 English words, and the 110,000 word CMU English dictionary. For these dictionaries, recognition accuracy ranges from 90% to 93% for the top 3 results.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132355017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
An efficient mel-LPC analysis method for speech recognition
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-536
H. Matsumoto, Y. Nakatoh, Y. Furuhata
{"title":"An efficient mel-LPC analysis method for speech recognition","authors":"H. Matsumoto, Y. Nakatoh, Y. Furuhata","doi":"10.21437/ICSLP.1998-536","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-536","url":null,"abstract":"This paper proposes a simple and efficient time domain technique to estimate an all-pole model on a mel-frequency axis (Mel-LPC). This method requires only two-fold computational cost as compared to conventional linear prediction analysis. The recognition performance of mel-cepstral parameters obtained by the Mel-LPC analysis is compared with those of conventional LP mel-cepstra and the mel-frequency cepstrum coefficients (MFCC) through gender-dependent phoneme and word recognition tests. The results show that the Mel-LPC cepstrum attains a significant improvement in recognition accuracy over conventional LP mel-cepstrum, and gives slightly higher accuracy for male speakers and slightly lower accuracy for female speakers than MFCC.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"239 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132403918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26