5th International Conference on Spoken Language Processing (ICSLP 1998): Latest Papers

Parametric trajectory mixtures for LVCSR
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-685
M. Siu, R. Iyer, H. Gish, Carl Quillen
{"title":"Parametric trajectory mixtures for LVCSR","authors":"M. Siu, R. Iyer, H. Gish, Carl Quillen","doi":"10.21437/ICSLP.1998-685","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-685","url":null,"abstract":"Parametric trajectory models explicitly represent the temporal evolution of the speech features as a Gaussian process with time-varying parameters. HMMs are a special case of such models, one in which the trajectory constraints in the speech segment are ignored by the assumption of conditional independence across frames within the segment. In this paper, we investigate in detail some extensions to our trajectory modeling approach aimed at improving LVCSR performance: (i) improved modeling of mixtures of trajectories via better initialization, (ii) modeling of context dependence, and (iii) improved segment boundaries by means of search. We will present results in terms of both phone classification and recognition accuracy on the Switchboard corpus.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130186963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Speech communication profiles across the adult lifespan: persons without self-identified hearing impairment
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-790
M. Cheesman, K. Smilsky, T. Major, F. Lewis, L. Boorman
{"title":"Speech communication profiles across the adult lifespan: persons without self-identified hearing impairment","authors":"M. Cheesman, K. Smilsky, T. Major, F. Lewis, L. Boorman","doi":"10.21437/ICSLP.1998-790","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-790","url":null,"abstract":"A sample of 209 adults ranging from 20 to 79 years of age was studied to measure speech communication profiles as a function of age in persons who did not identify themselves as hearing impaired. The study was conducted in order to evaluate age-related speech perception abilities and communication profiles in a population who do not present for hearing assessment and who are not included in census statistics as having hearing problems. Audiometric assessment, demographic and hearing history self-reports, speech reception thresholds, consonant discrimination perception in quiet and noise, and the Communication Profile for the Hearing Impaired (CPHI) were the instruments used to develop speech communication profiles. Hearing performance decreased with increased age. However, despite self-reports of no hearing impairment, many subjects over age 50 had audiometric thresholds that indicated hearing impairment. The responses to the CPHI were correlated to audiometric thresholds, but also to the age of the respondent, when hearing thresholds had been controlled statistically. A comparison of CPHI responses from this study and that of two other samples in clinical populations revealed only slightly different patterns of behaviour in the present sample when confronted with communication difficulties.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133893507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A minimax search algorithm for CDHMM based robust continuous speech recognition
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-304
Hui Jiang, K. Hirose, Qiang Huo
{"title":"A minimax search algorithm for CDHMM based robust continuous speech recognition","authors":"Hui Jiang, K. Hirose, Qiang Huo","doi":"10.21437/ICSLP.1998-304","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-304","url":null,"abstract":"In this paper, we propose a novel implementation of a minimax decision rule for continuous density hidden Markov model based robust speech recognition. By combining the idea of the minimax decision rule with a normal Viterbi search, we derive a recursive minimax search algorithm, where the minimax decision rule is repetitively applied to determine the partial paths during the search procedure. Because of its intrinsic nature of a recursive search, the proposed method can be easily extended to perform continuous speech recognition. Experimental results on Japanese isolated digits and TIDIGITS, where the mismatch between training and testing conditions is caused by additive white Gaussian noise, show the viability and efficiency of the proposed minimax search algorithm.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134135137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Nozomi - a fast, memory-efficient stack decoder for LVCSR
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-627
M. Schuster
{"title":"Nozomi - a fast, memory-efficient stack decoder for LVCSR","authors":"M. Schuster","doi":"10.21437/ICSLP.1998-627","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-627","url":null,"abstract":"This paper describes some of the implementation details of the \"Nozomi\" stack decoder for LVCSR. The decoder was tested on a Japanese Newspaper Dictation Task using a 5000 word vocabulary. Using continuous density acoustic models with 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus, both models provided by the IPA group [7], it was possible to reach more than 95% word accuracy on the standard test set. With computationally cheap acoustic models we could achieve around 89% accuracy in nearly real time on a 300 MHz Pentium II. Using a disk-based LM the memory usage could be optimized to 4 MB in total.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133904476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Evidence of dual-route phonetic encoding from apraxia of speech: implications for phonetic encoding models
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-789
R. Varley, S. Whiteside
{"title":"Evidence of dual-route phonetic encoding from apraxia of speech: implications for phonetic encoding models","authors":"R. Varley, S. Whiteside","doi":"10.21437/ICSLP.1998-789","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-789","url":null,"abstract":"Contemporary psycholinguistic models suggest that there may be dual routes operating in phonetic encoding: a direct route which uses stored syllabic units, and an indirect route which relies on the on-line assembly of sub-syllabic units. The more computationally efficient direct route is more likely to be used for high frequency words, while the indirect route is most likely to be used for novel or low frequency words. We suggest that the acquired neurological disorder of apraxia of speech (AOS) provides a window to speech encoding mechanisms and that the disorder represents an impairment of direct route encoding mechanisms and, therefore, a reliance on indirect mechanisms. We report an investigation of the production of high and low frequency words across three subject groups: non-brain damaged control (NBDC, N=3); brain damaged control (BDC, N=3) and speakers with AOS (N=4). The results are presented and discussed within the dual-route phonetic encoding hypothesis.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133996680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
An analysis of the timing of turn-taking in a corpus of goal-oriented dialogue
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-81
Matthew Bull, M. Aylett
{"title":"An analysis of the timing of turn-taking in a corpus of goal-oriented dialogue","authors":"Matthew Bull, M. Aylett","doi":"10.21437/ICSLP.1998-81","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-81","url":null,"abstract":"This paper presents a context-based analysis of the intervals between different speakers’ utterances in a corpus of task-oriented dialogue (the Human Communication Research Centre’s Map Task Corpus; see Anderson et al. 1991). In the analysis, we assessed the relationship between inter-speaker intervals and various contextual factors, such as the effects of eye contact, the presence of conversational game boundaries, the category of move in an utterance, and the degree of experience with the task in hand. The results of the analysis indicated that the main factors which gave rise to significant differences in inter-speaker intervals were those which related to decision-making and planning: the greater the amount of planning, the greater the inter-speaker interval. Differences between speakers were also found to be significant, although this effect did not necessarily interact with all other effects. These results provide unique and useful data for the improved effectiveness of dialogue systems.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131553624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
Robust automatic speech recognition by the application of a temporal-correlation-based recurrent multilayer neural network to the mel-based cepstral coefficients
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-328
M. Héon, H. Tolba, D. O'Shaughnessy
{"title":"Robust automatic speech recognition by the application of a temporal-correlation-based recurrent multilayer neural network to the mel-based cepstral coefficients","authors":"M. Héon, H. Tolba, D. O'Shaughnessy","doi":"10.21437/ICSLP.1998-328","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-328","url":null,"abstract":"In this paper, the problem of robust speech recognition has been considered. Our approach is based on the noise reduction of the parameters that we use for recognition, that is, the Mel-based cepstral coefficients. A Temporal-Correlation-Based Recurrent Multilayer Neural Network (TCRMNN) for noise reduction in the cepstral domain is used in order to get less-variant parameters to be useful for robust recognition in noisy environments. Experiments show that the use of the enhanced parameters using such an approach increases the recognition rate of the continuous speech recognition (CSR) process. The HTK Hidden Markov Model Toolkit was used throughout. Experiments were done on a noisy version of the TIMIT database. With such a pre-processing noise reduction technique in the front-end of the HTK-based continuous speech recognition (CSR) system, improvements in the recognition accuracy of about 17.77% and 18.58% using single mixture monophones and triphones, respectively, have been obtained at a moderate SNR of 20 dB.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131757815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Improving speaker recognisability in phonetic vocoders
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-396
C. Ribeiro, I. Trancoso
{"title":"Improving speaker recognisability in phonetic vocoders","authors":"C. Ribeiro, I. Trancoso","doi":"10.21437/ICSLP.1998-396","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-396","url":null,"abstract":"Phonetic vocoding is one of the methods for coding speech below 1000 bit/s. The transmitter stage includes a phone recogniser whose index is transmitted together with prosodic information such as duration, energy and pitch variation. This type of coder does not transmit spectral speaker characteristics and speaker recognisability thus becomes a major problem. In our previous work, we adapted a speaker modification strategy to minimise this problem, modifying a codebook to match the spectral characteristics of the input speaker. This is done at the cost of transmitting the LSP averages computed for vowel and glide phones. This paper presents new codebook generation strategies, with gender dependence and interpolation frames, that lead to better speaker recognisability and speech quality. Relative to our previous work, some effort was also devoted to deriving more efficient quantization methods for the speaker-specific information, which considerably reduced the average bit rate without quality degradation. For the CD-ROM version, a set of audio files is also included.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131802616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Prosodic vs. segmental contributions to naturalness in a diphone synthesizer
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-15
H. Bunnell, S. Hoskins, Debra Yarrington
{"title":"Prosodic vs. segmental contributions to naturalness in a diphone synthesizer","authors":"H. Bunnell, S. Hoskins, Debra Yarrington","doi":"10.21437/ICSLP.1998-15","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-15","url":null,"abstract":"The relative contributions of segmental versus prosodic factors to the perceived naturalness of synthetic speech were measured by transplanting prosody between natural speech and the output of a diphone synthesizer. A small corpus was created containing matched sentence pairs wherein one member of the pair was a natural utterance and the other was a synthetic utterance generated with diphone data from the same talker. Two additional sentences were formed from each sentence pair by transplanting the prosodic structure between the natural and synthetic members of each pair. In two listening experiments subjects were asked to (a) classify each sentence as “natural” or “synthetic,” or (b) rate the naturalness of each sentence. Results showed that the prosodic information was more important than segmental information in both classification and ratings of naturalness.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132637230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Perceptual properties of Russians with Japanese fricatives
5th International Conference on Spoken Language Processing (ICSLP 1998) Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-719
S. Funatsu, S. Kiritani
{"title":"Perceptual properties of Russians with Japanese fricatives","authors":"S. Funatsu, S. Kiritani","doi":"10.21437/ICSLP.1998-719","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-719","url":null,"abstract":"This study investigated the perceptual properties of second language learners in acquiring second language phonemes. The case where the relation between two phonemes of a second language and those of a native language changes according to following vowels was studied. The perceptual properties of Russians with regard to Japanese fricatives were examined. In the perception test, the confusion of [ɕo] with [so] was very large. This phenomenon could be caused by the difference between the transition onset time from [s’] to vowels and that from the other consonants to vowels. It is considered that, in the case of following vowel [a] and [o], Russians equated Japanese [s] and [ɕ] with Russian [s] and [s'] respectively. However, in the case of [u], they did not equate them in such a manner. This is probably because the acoustic properties of Japanese [ɯ] are very different from those of Russian [u].","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"17 3 Pt 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130780388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1