R. Nisimura, Jumpei Miyake, Hideki Kawahara, T. Irino
Speech-to-text input method for web system using JavaScript
2008 IEEE Spoken Language Technology Workshop, published 2008-12-01
DOI: 10.1109/SLT.2008.4777877 (https://doi.org/10.1109/SLT.2008.4777877)
Citations: 14
Abstract
We have developed a speech-to-text input method for web systems. The system is provided as a JavaScript library that includes an Ajax-like mechanism based on a Java applet, CGI programs, and dynamic HTML documents. It allows users to access voice-enabled web pages without requiring special browsers. Web developers can embed it in their web pages by inserting a single line in the header of an HTML document. This study also aims to observe natural spoken interactions in personal environments. We collected 4,003 inputs over a period of seven months via our public Japanese ASR server. To cover out-of-vocabulary words such as some proper nouns, we developed a web page for registering new words in the language model. As a result, recognition accuracy improved by 0.8%. With regard to acoustic conditions, an SNR of 25.3 dB was observed.
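For readers unfamiliar with the SNR figure quoted in the abstract, the following is a minimal sketch of how a signal-to-noise ratio in decibels is commonly computed from a speech segment and a noise-only segment. The abstract does not say how the authors estimated SNR, so the function name `snr_db`, the mean-power definition, and the sample values are assumptions chosen for illustration only.

```python
import math


def snr_db(speech, noise):
    """Return the signal-to-noise ratio in decibels.

    Assumption: power is the mean of squared samples, and the caller has
    already separated a speech segment from a noise-only segment. This is
    a common textbook definition, not necessarily the paper's procedure.
    """
    if not speech or not noise:
        raise ValueError("both segments must be non-empty")
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise power is zero; SNR is undefined")
    # Decibels express the power ratio on a base-10 logarithmic scale.
    return 10.0 * math.log10(speech_power / noise_power)


if __name__ == "__main__":
    # Hypothetical samples: speech amplitude ten times the noise amplitude.
    print(snr_db([1.0, -1.0], [0.1, -0.1]))
```

Under this definition, an SNR of 25.3 dB corresponds to a speech-to-noise power ratio of roughly 340 to 1, since 10^(25.3/10) is approximately 339.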