{"title":"F0 declination in read-aloud and spontaneous speech","authors":"M. Swerts, E. Strangert, M. Heldner","doi":"10.21437/ICSLP.1996-387","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-387","url":null,"abstract":"This paper deals with a prosodic comparison of spontaneous and read-aloud speech. More specifically, the study reports data on F0 declination in these two speaking modes using Swedish materials. For both speaking styles the analysis revealed negative slopes, a steepness-duration dependency with declination being less steep in longer utterances than in shorter ones and resetting at utterance boundaries. However, there was a difference in degree of declination between the two speaking styles, read-aloud speech in general having steeper slopes, a more apparent time-dependency and stronger resetting than spontaneous speech.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"68 1","pages":"1501-1504"},"PeriodicalIF":0.0,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79855866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling segmental duration in German text-to-speech synthesis","authors":"Bernd Möbius, J. V. Santen","doi":"10.21437/ICSLP.1996-601","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-601","url":null,"abstract":"The paper reports on the construction of a model for segmental duration in German. The model predicts the durations of speech sounds in various textual, prosodic, and segmental contexts. It has been implemented in the German version of the Bell Labs text-to-speech system (R. Sproat and J. Olive, 1995; B. Möbius et al., 1996). The construction of the duration system was made efficient by the use of an interactive statistical analysis package that incorporates the approach outlined by J.P.H. van Santen (1994). The results are stored in tables in a format that can be directly interpreted by the TTS duration module. Tables are constructed in two phases: inferential statistical analysis of the speech corpus, and parameter estimation. The overall correlation between observed and predicted segmental durations is .896.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"22 1","pages":"2395-2398"},"PeriodicalIF":0.0,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80020040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distinctions between [t] and [tch] using electropalatography data","authors":"S. Mair, C. Scully, C. Shadle","doi":"10.21437/ICSLP.1996-410","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-410","url":null,"abstract":"","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"1 1","pages":"1597-1600"},"PeriodicalIF":0.0,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90298743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effects of auditory feedback on F0 trajectory generation","authors":"Hideki Kawahara, H. Kato, J. C. Williams","doi":"10.21437/ICSLP.1996-97","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-97","url":null,"abstract":"In this paper, a method is proposed to evaluate contributions of auditory feedback to speech F0 trajectory generation. This method is based on data obtained in a series of new auditory feedback experiments (TAF: transformed auditory feedback) in which quantitative measurements were taken of interactions between speech perception and production under natural speech conditions. Experimental results revealed that the effects on power spectra of F0 trajectories vary among subjects and that the maximum magnitude exceeds 10 dB.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"30 1","pages":"287-290"},"PeriodicalIF":0.0,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79116830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speaker adaptation by modeling the speaker variation in a continuous speech recognition system","authors":"Nikko Ström","doi":"10.21437/ICSLP.1996-249","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-249","url":null,"abstract":"A method for unsupervised instantaneous speaker adaptation is presented and evaluated on a continuous speech recognition task in a man-machine dialogue system. The method is based on modeling of the systematic speaker variation. The variation is modeled by a low-dimensional speaker space and the classification of speech segments is conditioned by the position in the speaker space. Because the effect of the speaker space position on the classification is determined in an off-line training procedure using the speakers in a training database, complex systematic speaker variation can be modeled. Speaker adaptation is achieved only by the constraint that the position in the speaker space is constant over each utterance. Therefore, no separate adaptation session is needed and the adaptation is present from the first utterance. Consequently, for a user there is no noticeable difference between this system and a speaker-independent system. The speaker model and the phonetic classification are implemented in the ANN part of a hybrid ANN/HMM system. In experiments with a pilot system, word accuracy is improved for utterances longer than three words and utterance level results are improved for utterances of all lengths.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"15 1","pages":"989-992"},"PeriodicalIF":0.0,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87512032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic control of a production model","authors":"L. Candille, H. Meloni","doi":"10.21437/ICSLP.1996-583","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-583","url":null,"abstract":"A number of experiments have shown that it is possible to use production models for speech recognition tasks [6] and [2]. We present here the first results of an adaptation of Maeda's statistical model. We have also demonstrated the importance of taking into account the static and dynamic characteristics of the speaker. Some preliminary results for the identification of V1-V2 sequences are also provided.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"191 1","pages":"2305-2308"},"PeriodicalIF":0.0,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72751896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of the Telefónica I+D natural numbers recognizer over different dialects of Spanish from Spain and America","authors":"C. D. L. Torre, Francisco Javier Caminero Gil, J. Alvarez-Cercadillo, C. M. D. Alamo, L. A. H. Gómez","doi":"10.21437/ICSLP.1996-515","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-515","url":null,"abstract":"Presents the results obtained when evaluating the Natural Numbers Recognizer of Telefónica Investigación y Desarrollo (I+D) over some particular dialects of Spanish from Spain and America. The evaluation was made over two different data sets, corresponding to two different situations. The first set includes dialects of Spanish from Spain that were considered in the training and design of our baseline system, and the second set corresponds to Argentinian Spanish, which was not considered in the training of the original system. Because we are interested in a system that can be used by a wide range of users, we tested the possibilities of MAP (maximum a posteriori) techniques to adapt the original HMMs in order to represent all the dialects. The experimental results show the capabilities of our recognizer for use in applications spread over a great number of Spanish-speaking countries.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"26 1","pages":"2032-2035"},"PeriodicalIF":0.0,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76602331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds","authors":"J. Köhler","doi":"10.21437/ICSLP.1996-556","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-556","url":null,"abstract":"The aim of the work is to exploit the acoustic-phonetic similarities between several languages. In recent work cross-language HMM-based phoneme models have been used only for bootstrapping the language-dependent models and the multi-lingual approach has been investigated only on very small speech corpora. The author introduces a statistical distance measure to determine the similarities of sounds. Further, he presents a new technique to model multi-lingual phonemes. The experiments are conducted with the OGI Multi-Language Telephone Speech Corpus for the languages American English, German and Spanish. In the first experiment phoneme recognition rates between 39.0% and 53.9% are achieved using language-dependent models. Using cross-language models yields improvement for some phonemes, but on average a degradation of recognition performance is observed. However, cross-language models speed up the cross-language transfer and reduce the size of the phoneme inventory of multi-lingual speech recognition systems. Finally, a new method of modelling multi-lingual phonemes, which can be used for a variety of languages, is presented. This technique reduces the number of phoneme-based units in a multi-lingual speech recognition system.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"3 1","pages":"2195-2198"},"PeriodicalIF":0.0,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85122445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical applications of computer-based speech training for children with hearing impairment","authors":"Anne-Marie Öster","doi":"10.21437/ICSLP.1996-40","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-40","url":null,"abstract":"Computer-based visual speech training has become widely used both within medical and pedagogical rehabilitation in Sweden. When learning speech motor ability, clear instruction is of great importance in order for the learner to realise what is deviant and what is correct in his or her pattern of behaviour. The correct behaviour should then be established and automated through extensive training for it to be transferred to untrained situations. This paper discusses teaching strategies and reports positive training results obtained through the visually contrastive feedback that this modern technical speech training aid offers.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"9 1","pages":"157-160"},"PeriodicalIF":0.0,"publicationDate":"1996-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88339268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The natural language processing module for a voice assisted operator at Telefónica I+D","authors":"J. Alvarez-Cercadillo, Francisco Javier Caminero Gil, C. Crespo-Casas, D. Merino","doi":"10.21437/ICSLP.1996-265","DOIUrl":"https://doi.org/10.21437/ICSLP.1996-265","url":null,"abstract":"","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"27 1","pages":"1161-1164"},"PeriodicalIF":0.0,"publicationDate":"1996-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74783686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}