S. Yamamoto, K. Takeda, N. Inoue, S. Kuroiwa, M. Naitoh
{"title":"A voice-activated telephone exchange system and its field trial","authors":"S. Yamamoto, K. Takeda, N. Inoue, S. Kuroiwa, M. Naitoh","doi":"10.1109/IVTTA.1994.341551","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341551","url":null,"abstract":"Speaker-independent speech recognition systems that can accept telephone quality speech may open opportunities for introducing new user-friendly services over the public switched telephone network (PSTN). The authors are currently engaged in a project to introduce an automatic speech recognizer over the PSTN. They have developed a voice-activated telephone exchange system by combining a continuous speech recognizer and a private branch exchange system (PBX), and conducted field trials. The system has been installed in the R&D laboratories for daily use since June 1993, in order to investigate its performance in a real environment and collect man-machine dialogues. More than 5,000 man-machine dialogues have been collected, and incorrect recognitions have been analyzed and grouped into three categories: (1) incorrect detection of speech, (2) out-of-vocabulary responses, and (3) incorrect recognition caused by inadequate hidden Markov models of speech and noise. The authors have improved system performance mainly by addressing issues (1) and (3). They have just developed a new version of the system, using the improved scheme obtained by analyzing the collected speech data. In order to collect more man-machine dialogues, they are planning to carry out a second-phase field trial in which the new system will be installed in branch offices.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122525430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VoiceDialing-the first speech recognition based telephone service delivered to customer's home","authors":"G.J. Vysotsky","doi":"10.1109/IVTTA.1994.341523","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341523","url":null,"abstract":"The paper is an overview of the NYNEX VoiceDialing service, a first introduction of speech recognition based technology to the mass market of residential and business customers. It is a network based service which allows telephone callers to make calls by simply saying the name of the person or place they wish to reach. VoiceDialing is compatible with both Touch Tone and rotary service and is designed to work on all existing telephone sets. The paper focuses on the network architecture, user interface, and speech recognition issues.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115680696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A system for field performance assessment of a speech recognition based telephone service","authors":"L.A. Zreik","doi":"10.1109/IVTTA.1994.341522","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341522","url":null,"abstract":"A data collection and analysis system has been designed, developed, and deployed as an integral part of the VoiceDialing service, in order to measure the \"true\" recognition performance of the service in the field and other statistics, and to identify potential areas of improvement. The data collection system collects relevant training and recognition data and timing information, in addition to utterance recordings, on selected lines. Complementing the collection system, the data analysis system provides the capability to study, analyze and classify the collected data, to display and listen to collected utterances, to generate statistics, and to enter analysis results in a database. A graphical user interface enhances the analysis system, providing easy access to the database and simplifying the analysis process. Methods used in measuring field recognition performance are presented, along with field results. A system that uses field-collected and analyzed data to test recognizers is proposed.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115793852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Telephone speech data corpus and performances of speaker independent recognition system using the corpus","authors":"T. Isobe, K. Murakami","doi":"10.1109/IVTTA.1994.341535","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341535","url":null,"abstract":"The authors first describe the speech data corpus they collected from 400 male and 400 female subjects over the phone. They then compare the performances of two types of triphone model based speaker independent recognition systems, in which they used the corpus for training models and testing. One system uses a normal continuous mixture density HMM (CDHMM), and the other uses a CDHMM with a tree structure of 2,064 Gaussian distributions, which needs only one thirtieth of the Gaussian computation of a normal one. As a result, the system with the tree-structure CDHMM performed only 3% below the system using the normal CDHMM. This shows that tree-structure CDHMMs are useful for telephone speech recognition.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124143892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Database query generation from spoken sentences","authors":"H. Aust, M. Oerder","doi":"10.1109/IVTTA.1994.341525","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341525","url":null,"abstract":"In the context of our spoken language inquiry system, we present the component which extracts the values needed for a database query from the textual representation of an utterance in the form of a word graph. A stochastic attributed grammar is used as a language model, to identify the relevant parts of the sentence, and to compute their meaning. High understanding rates, low computational costs and practically no restrictions of the usable language are important features of our system.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121610269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive speech and language systems for telecommunications applications at NYNEX","authors":"H. Leung, J. Spitz","doi":"10.1109/IVTTA.1994.341546","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341546","url":null,"abstract":"As information plays an ever-increasing role in our lives, users are demanding more in terms of their capability to retrieve and manipulate information. The paper is concerned with NYNEX's development of interactive speech and language systems for the provision of automated information services. While the long-term goal is to develop total system solutions to interact with users and assist them to search and retrieve information, progress is made in such a way that each component technology can be deployed by itself in various telecommunications applications. The authors discuss some of their findings, and draw from experience in technology development, lessons that have been learnt from service trials, and benefits that have been derived from others in the research community. The authors believe that advanced speech and language technologies can be quite acceptable to users, as long as a graceful and friendly human computer interface is also provided.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131481533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced voice services in the telecommunication network using the Texas Instruments multiserve","authors":"L. Netsch, R. Rajasekaran, B. Price","doi":"10.1109/IVTTA.1994.341540","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341540","url":null,"abstract":"The paper presents efforts that Texas Instruments is pursuing to place enhanced voice services in the telecommunications network. The authors describe the capabilities of the Texas Instruments multiserve platform, which is a system designed to implement enhanced telecommunication services. The paper discusses an example of some of the technology challenges involved in design of the system. The authors provide results of performance evaluation of the platform on important voice service tasks.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130451814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An experimental comparison of different feature extraction and classification methods for telephone speech","authors":"Tilo Schürer","doi":"10.1109/IVTTA.1994.341537","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341537","url":null,"abstract":"Robust speech recognition over telephone lines depends heavily on the choice of the feature extraction and classification methods. In order to get the highest possible performance of the speech recognizer, a number of commonly used feature extraction methods (MFCC, LPC, PLP, RASTA-PLP) and classification methods (MLP, LVQ, HMM) were tested on the same telephone speech data. All combinations of feature extraction and classification methods were computed, and several parameters of both methods were changed in order to find a non-local maximum of recognition accuracy. The paper does not describe a comparison of classification but of feature extraction methods because it is clear that an HMM would outperform both LVQ and MLP. The big question is whether the same feature extraction methods always lead to the best results, no matter which classifier is used.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131640213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dialog design for automatic speech recognition of telephone numbers and account numbers","authors":"D.J. Brens, B.L. Wattenbager","doi":"10.1109/IVTTA.1994.341531","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341531","url":null,"abstract":"The ultimate success of automatic speech recognition (ASR) depends not only on the performance characteristics of the technology but also on user behaviors. User behaviors are, in turn, affected by the prompts, reprompts, and user interface strategies that we use when designing a service. In one project, we have designed and tested modular elements of a user interface for ASR. We describe a human factors study of connected digits transactions, including \"telephone number\" and \"account number\" transactions. In this study, candidate prompt/reprompt arrangements were tested using samples of the American consumer population. We consider some of the results from our study.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130996359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The role of voice processing in telecommunications","authors":"L. Rabiner","doi":"10.1109/IVTTA.1994.341554","DOIUrl":"https://doi.org/10.1109/IVTTA.1994.341554","url":null,"abstract":"During the decade of the 1990s, the fields of communications, computing, and networking are coming together in the form of personal information/communication terminals, and in the associated services (so-called personal communications services, PCS). Several technologies will play major roles in this communications revolution, but one of the key ones will be voice processing. The authors review several voice processing technologies, discuss current capabilities and the associated applications, and try to forecast where they see progress being achieved in the next decade and what applications will become commonplace as a result of the increased capabilities. They show how progress in voice processing is accompanied and stimulated by progress in microelectronics (memory and processing power of single chip architectures), and how, by the 21st century, telecommunications will have made major advances as a result of the use of voice processing.","PeriodicalId":435907,"journal":{"name":"Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132334819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}