Title: Impact of Network Performance on Cloud Speech Recognition
Authors: Mehdi Assefi, Mike P. Wittie, Allan Knight
Venue: 2015 24th International Conference on Computer Communication and Networks (ICCCN)
Published: 2015-08-01
DOI: 10.1109/ICCCN.2015.7288417 (https://doi.org/10.1109/ICCCN.2015.7288417)
Citations: 25
Abstract
Interactive real-time communication between people and machines enables innovations in transportation, health care, and other domains. Using voice or gesture commands improves the usability and broad public appeal of such systems. In this paper, we experimentally evaluate Google speech recognition and Apple Siri, two of the most popular cloud-based speech recognition systems. Our goal is to evaluate the performance of these systems under different network conditions in terms of command recognition accuracy and round-trip delay, two metrics that affect the usability of interactive applications. Our results show that speech recognition systems are affected by loss and jitter, both commonly present in cellular and WiFi networks. Finally, we propose and evaluate a network coding transport solution to improve the quality of voice transmission to cloud-based speech recognition systems. Experiments show that our approach improves the accuracy and reduces the delay of cloud speech recognizers under different loss and jitter values.
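The abstract does not describe how the proposed network coding transport is built, so the following is only a minimal sketch of the general idea behind coded redundancy: a single XOR parity packet lets the receiver repair one lost voice packet without waiting for a retransmission. It is not the authors' scheme. The names `encode_group` and `decode_group`, the fixed group size, and the assumption of equal-length packets are all hypothetical choices made for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_group(packets: List[bytes]) -> List[bytes]:
    """Return the data packets followed by one XOR parity packet.

    Assumes (for this sketch) that all packets have the same length.
    """
    parity = reduce(xor_bytes, packets)
    return packets + [parity]


def decode_group(received: List[Optional[bytes]]) -> List[bytes]:
    """Recover the data packets when at most one coded packet was lost.

    `received` holds the data packets followed by the parity packet;
    a lost packet is represented by None.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet cannot repair more than one loss")
    repaired = list(received)
    if missing:
        # XOR of all surviving packets equals the missing one.
        present = [p for p in received if p is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity packet


# Hypothetical usage: three equal-length voice frames, one lost in transit.
frames = [b"turn", b"left", b"now!"]
coded: List[Optional[bytes]] = list(encode_group(frames))
coded[1] = None  # simulate a lost packet
assert decode_group(coded) == frames
```

The trade-off in this sketch is one extra packet per group in exchange for repairing a single loss locally, which avoids the retransmission delay that would otherwise add to round-trip time.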