{"title":"结合趋势数据的基于云的语音识别引擎的动态改进","authors":"Milind Bhavsar, Prudhvi Kosaraju, G. Ananthakrishnan, Gurudas Subray Shet, Saurav Anand","doi":"10.1109/MobileCloud.2016.12","DOIUrl":null,"url":null,"abstract":"With the advancement of speech recognition technologies, there is an increase in the adoption of voice interfaces on mobile-based platforms. While, developing a general purpose Automatic Speech Recognition (ASR) which can understand voice commands is important, the contexts of how people interact with their mobile device change very rapidly. Due to the high processing complexity of the ASR engine, much of the processing of trending data is being carried out on cloudplatforms. Changed content regarding news, music, movies and TV series change the focus of interaction with voice based interfaces. Hence ASR engines trained on a static vocabulary may not be able to adapt to the changing contexts. The focus of this paper is to first describe the problems faced in incorporating dynamically changing vocabulary and contexts into an ASR engine. We then propose a novel solution which shows a relative improvement of 38 percent utterance accuracy on newly added content without compromising on the overall accuracy and stability of the system.","PeriodicalId":176270,"journal":{"name":"2016 4th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Dynamic Improvements in a Cloud-Based Speech Recognition Engine by Incorporating Trending Data\",\"authors\":\"Milind Bhavsar, Prudhvi Kosaraju, G. Ananthakrishnan, Gurudas Subray Shet, Saurav Anand\",\"doi\":\"10.1109/MobileCloud.2016.12\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the advancement of speech recognition technologies, there is an increase in the adoption of voice interfaces on mobile-based platforms. While, developing a general purpose Automatic Speech Recognition (ASR) which can understand voice commands is important, the contexts of how people interact with their mobile device change very rapidly. Due to the high processing complexity of the ASR engine, much of the processing of trending data is being carried out on cloudplatforms. Changed content regarding news, music, movies and TV series change the focus of interaction with voice based interfaces. Hence ASR engines trained on a static vocabulary may not be able to adapt to the changing contexts. The focus of this paper is to first describe the problems faced in incorporating dynamically changing vocabulary and contexts into an ASR engine. 
We then propose a novel solution which shows a relative improvement of 38 percent utterance accuracy on newly added content without compromising on the overall accuracy and stability of the system.\",\"PeriodicalId\":176270,\"journal\":{\"name\":\"2016 4th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud)\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 4th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MobileCloud.2016.12\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 4th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MobileCloud.2016.12","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: With the advancement of speech recognition technologies, voice interfaces are increasingly being adopted on mobile platforms. While developing a general-purpose Automatic Speech Recognition (ASR) engine that can understand voice commands is important, the contexts in which people interact with their mobile devices change very rapidly. Due to the high processing complexity of the ASR engine, much of the processing of trending data is carried out on cloud platforms. Changing content related to news, music, movies, and TV series shifts the focus of interaction with voice-based interfaces, so an ASR engine trained on a static vocabulary may not be able to adapt to these changing contexts. This paper first describes the problems faced in incorporating dynamically changing vocabulary and contexts into an ASR engine. We then propose a novel solution that yields a relative improvement of 38 percent in utterance accuracy on newly added content without compromising the overall accuracy and stability of the system.
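The abstract does not spell out the mechanism, but the general idea of folding trending data into a cloud-hosted recognizer can be illustrated with a minimal, hypothetical Python sketch: a periodically refreshed trending vocabulary that biases n-best rescoring toward newly popular terms. Everything below (the DynamicVocabulary class, the rescore helper, the decay factor, and the example terms and scores) is an illustrative assumption, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple


class DynamicVocabulary:
    """Illustrative sketch: a base vocabulary plus a decaying set of trending terms."""

    def __init__(self, base_terms: List[str], decay: float = 0.9):
        self.base = set(base_terms)
        self.trending: Counter = Counter()
        self.decay = decay  # how quickly old trends fade on each refresh cycle (assumed value)

    def refresh(self, trending_counts: Dict[str, int]) -> None:
        """Decay stale trending terms, then fold in newly observed counts."""
        for term in list(self.trending):
            self.trending[term] *= self.decay
            if self.trending[term] < 1:
                del self.trending[term]
        for term, count in trending_counts.items():
            self.trending[term] += count

    def boost(self, term: str) -> float:
        """Score bonus for a word when rescoring ASR hypotheses (0 for base-vocabulary words)."""
        if term in self.base:
            return 0.0
        return 0.1 * self.trending.get(term, 0)  # 0.1 is an arbitrary illustrative weight


def rescore(nbest: List[Tuple[str, float]], vocab: DynamicVocabulary) -> List[Tuple[str, float]]:
    """Re-rank an (text, score) n-best list by adding trending-term boosts to each hypothesis."""
    rescored = []
    for text, score in nbest:
        bonus = sum(vocab.boost(word) for word in text.lower().split())
        rescored.append((text, score + bonus))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    vocab = DynamicVocabulary(base_terms=["play", "the", "trailer", "news"])
    # Pretend a periodic cloud-side job fetched these counts from a trends feed (made-up numbers).
    vocab.refresh({"zootopia": 40, "deadpool": 25})
    nbest = [("play utopia trailer", -11.5), ("play zootopia trailer", -12.0)]
    print(rescore(nbest, vocab))  # the trending term lifts the second hypothesis to the top
```

In this toy example the trending boost flips the ranking in favor of the hypothesis containing the newly popular title, which is the kind of behavior the paper targets for newly added content; the actual system would additionally have to keep the boosts small enough not to degrade accuracy on the static vocabulary.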