{"title":"Fast telephone channel adaptation based on vector field smoothing technique","authors":"J. Takahashi, S. Sagayama","doi":"10.1109/IVTTA.1994.341536","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341536","url":null,"abstract":"The paper presents a fast telephone channel adaptation method of MAP/VFS with a sequential training function. The concept is based on using maximum a posteriori (MAP) estimation as an intra-class training scheme in combination with vector field smoothing (VFS) technique as an inter-class training scheme. Experimental results of simultaneous adaptation to a telephone channel and a speaker show the proposed method is significantly superior to sequential MAP adaptation. The error reduction rate achieved in sequentially adapting a few words of sample data is about 41% using the proposed method, while that of the sequential MAP adaptation hardly improved even with ten-word adaptation data. MAP/VFS, with its fast and sequential adaptation function, is expected to be very useful in developing telephone applications such as information services proceeded by iterative tree-structured item selection.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125334783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Field trial of a speaker verification service for caller identity verification in the telephone network","authors":"J. Naik","doi":"10.1109/IVTTA.1994.341529","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341529","url":null,"abstract":"A field trial of a network-integrated speaker verification system was performed in the NYNEX public switched telephone network in 1993-94. Speaker verification was performed on all calling-card calls placed by NYNEX customers who took part in this trial. Subsequently, a comprehensive impostor field-trial was performed. A variety of phones, channel conditions and caller/calling environments were represented in this large field-trial. The results show that this system performed very well under these real-world conditions. A valid user rejection rate of 1%, which is operationally very desirable, produced an equally low dedicated impostor acceptance of 3.9%. User surveys showed high user preference of this type of service. The paper discusses the results of the field trial in detail.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115036244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A voice transaction processing application with PSOLA based text to speech conversion for Spanish","authors":"I. Hernáez, A. Cuesta","doi":"10.1109/IVTTA.1994.341534","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341534","url":null,"abstract":"Presents a new synthesis scheme for voice transaction processing application. The system makes use of concatenation of previously recorded messages with synthetic speech segments generated by a text to speech converter. Text to speech conversion is made pitch synchronously overlapping and adding diphones and triphone speech units, and is used only for unpredictable vocabulary e.g. names, addresses, account numbers, etc.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"1992 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128610535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dialog design for a speech-interactive automation system","authors":"B. L. Zeigler, B. Bazor","doi":"10.1109/IVTTA.1994.341532","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341532","url":null,"abstract":"We discuss our approach to dialog design for telephone service orders, and describe the dialog developed for our service disconnect system. Our approach is based on the characterization of applications in terms of information elements and their attributes. We build the acquisition dialog for each information element by customizing generic dialog prototypes to match its type and attributes. The design of the dialog prototypes is based on dyads of system outcomes and recourse actions. Our approach features design modularity, relative ease of scaling dialogs to new applications, and decoupling the dialog design from the specifics of system and recognition technologies.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128168930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A field study of performance improvements in HMM-based speaker verification","authors":"T. Jacobs, A. Setlur","doi":"10.1109/IVTTA.1994.341530","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341530","url":null,"abstract":"This study reports our findings on speaker verification (SV) performance improvements using random 4-digit utterances collected over a single microphone type. The databases used in this study are the result of an ongoing field trial of SV access to automatic teller machines (ATMs) for secure unattended banking services. The SV system uses continuous density HMM models trained on 18 connected 4-digit utterances and has a baseline equal-error-rate (EER) of between 5.5 and 11% for different sets of data. Because of the limited training data, estimates for the mixture variances are most often poor. By calculating average mixture variances using all of the training data for a given speaker and then setting all of the model variances for that speaker to these speaker dependent values and using cohort normalization, the EER decreases consistently to between 2.5 and 6.5%.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117327978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Operational and experimental French telecommunication services using CNET speech recognition and text-to-speech synthesis","authors":"C. Sorin, D. Jouvet, C. Gagnoulet, D. Dubois, D. Sadek, M. Toularhoat","doi":"10.1109/IVTTA.1994.341550","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341550","url":null,"abstract":"The paper presents a brief overview of current uses for CNET speech technology (speech recognition and text-to-speech systems) in interactive voice response services (IVR). Several services are described, and the latest evaluation of one ASR-based service is also outlined. Finally, the paper summarizes developments in the CNET ASR and TTS technology.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"164 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131000385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noise suppression in cellular communications","authors":"H. Hermansky, E. Wan, C. Avendaño","doi":"10.1109/IVTTA.1994.341539","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341539","url":null,"abstract":"FIR Wiener-like filters are applied to time trajectories of cubic-root compressed short-term power spectrum of noisy speech recorded over cellular communications. Informal listenings indicate that the technique brings a noticeable improvement in quality of noisy speech in the overlap-add analysis-synthesis system while not causing any significant degradation on clean speech.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131309006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multimodal consumer information server with IVR menu","authors":"M. Damhuis, M. Peeters, L. Boves","doi":"10.1109/IVTTA.1994.341542","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341542","url":null,"abstract":"The paper describes the development of a fully automatic multimodal information system for the consumer market. The system will be able to provide information on a large number of topics via a single telephone number. The eventual system will integrate interactive voice response, speech recognition, speaker verification, direct dial in, calling line identification, facsimile and electronic mail. The present version is limited to DTMF input and voice and facsimile output. The architecture of the system described in the paper allows successive addition of other technologies.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127962568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessment of the VoiceMap spoken language system","authors":"A. Bayya, M. Ďurian, L. Meiskey, R. Root, R. Sparks","doi":"10.1109/IVTTA.1994.341528","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341528","url":null,"abstract":"As spoken language systems expand to new tasks, there will be a need for empirical research on how to optimize the usability of human-computer spoken natural language dialogues, including research on methods for chunking information, for allowing users to control provision of that information, and for providing feedback on the system's processing and current context. The paper describes the results of a usability study performed to evaluate the performance, usability, and acceptability of a system that provides street map directions.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"194 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124287027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experience with the Philips automatic train timetable information system","authors":"H. Aust, M. Oerder, F. Seide, V. Steinbiss","doi":"10.1109/IVTTA.1994.341543","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341543","url":null,"abstract":"Introduces an automatic system for train timetable information over the telephone that provides accurate connections between 1200 German cities. The caller can talk to it in unrestricted, natural, and fluent speech, very much like he or she would communicate with a human operator, and is not given any instructions in advance. In an ongoing field trial, this system has been made available to the general public, both to gather speech data and to evaluate its performance. This field test was organized as a bootstrapping process: initially, the system was trained with just the developers' voices, then the telephone number was passed around within the department, the company, and finally, the outside world. After each step, the newly collected material was used for retraining and general improvements. The observations and results from this test are reported.<<ETX>>","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"55 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124312984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}