Speech recognition and voice separation for the internet of things
M. Hasanzadeh-Mofrad, D. Mossé
Proceedings of the 8th International Conference on the Internet of Things, October 2018. DOI: 10.1145/3277593.3277610
Internet-connected home automation devices such as the Amazon Echo, Nest Thermostat, and smart TVs are riding the wave of the Internet of Things (IoT). The most intuitive, natural, and efficient way to communicate with these devices is by talking to them (via voice commands). Speech recognition provides the medium for processing spoken language and converting it to plain text for further use by the system. Current personal assistants like Amazon Echo and Google Home have been designed for home automation and impact billions of people's lives; yet, they are not customizable. Another limitation of these devices is that they cannot recognize concurrent voice commands issued by two people. We tackle this last challenge, given that there is often more than one person in a room. The contributions of this paper are twofold: (1) we build an inexpensive and customizable voice-enabled IoT prototype using a Raspberry Pi board and the Google Cloud speech-to-text API, and (2) we propose a method for voice separation in the context of IoT, for the case where two people send commands to a voice-enabled IoT device and both commands must be processed and executed simultaneously. Using our proposed voice separation approach, our results show that transcription accuracy improves by up to 3% over a baseline model that does not support voice separation.
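The abstract mentions a prototype that forwards captured audio to the Google Cloud speech-to-text API. The paper does not describe the exact client code, but as a rough illustration, the synchronous `speech:recognize` REST endpoint accepts a JSON body with a `config` section and base64-encoded audio. The sketch below builds such a request body; the sample rate, encoding, and placeholder audio bytes are illustrative assumptions, not details from the paper.

```python
import base64
import json

# Google Cloud Speech-to-Text v1 synchronous recognition endpoint.
SPEECH_API_URL = "https://speech.googleapis.com/v1/speech:recognize"

def build_recognize_request(pcm_bytes: bytes,
                            sample_rate_hz: int = 16000,
                            language: str = "en-US") -> str:
    """Build the JSON body for a speech:recognize call.

    The REST API expects the raw audio to be base64-encoded and placed
    under audio.content, with format details under config.
    """
    body = {
        "config": {
            "encoding": "LINEAR16",          # raw 16-bit PCM, e.g. from a USB mic on the Pi
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language,
        },
        "audio": {
            "content": base64.b64encode(pcm_bytes).decode("ascii"),
        },
    }
    return json.dumps(body)

if __name__ == "__main__":
    # Placeholder samples stand in for audio captured on the device.
    request_json = build_recognize_request(b"\x00\x01" * 8000)
    payload = json.loads(request_json)
    print(payload["config"]["languageCode"])
    # An authenticated HTTP POST of request_json to SPEECH_API_URL
    # would return the transcript in the response's results field.
```

In a deployment like the one described, each separated voice stream would be encoded and sent as its own request, so that both speakers' commands are transcribed independently.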