{"title":"VoiceTalk: A No-Code Approach for Creating Voice-Controlled Smart Home Applications","authors":"Yun-Wei Lin;Yi-Bing Lin;Yi-Feng Wu;Pei-Hsuan Shen","doi":"10.1109/OJCS.2025.3576725","DOIUrl":null,"url":null,"abstract":"This article introduces VoiceTalk, a no-code approach that develops voice-controlled smart home applications without requiring programming expertise. At its core, VoiceTalk utilizes IoTtalk, an IoT application development platform for managing a diverse range of IoT devices. IoTtalk employs a two-tier microservices architecture, enabling users to define and chain applications through an intuitive drag-and-drop line interface. Leveraging its microservice architecture, VoiceTalk integrates IoTtalk with Google Home, offering a no-code solution for voice-controlled applications. VoiceTalk leverages its understanding of smart appliances in the room/house to generate specific prompts. We have compared the translation accuracy of 7 Automatic Speech Recognition (ASR) systems. We make two contributions. First, the no-code VoiceTalk platform significantly simplifies the development of Google Home-like applications. Second, by integrating ASRs with a commercial LLM such as GPT, we dramatically reduce voice-to-text translation errors, for examples, from 5.13% to 0.54% for the Web Speech API and from 2.25% to zero for Whisper Medium. For small-sized open-source LLMs such as Llama 3.2 3B, the errors are reduced to 0.72% for the Web Speech API and to zero for Whisper Medium. Furthermore, Device LLM Agent of VoiceTalk can be easily extended to integrate IoTtalk with other voice platforms, such as AWS Alexa.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"874-883"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11023638","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11023638/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
This article introduces VoiceTalk, a no-code approach for developing voice-controlled smart home applications without programming expertise. At its core, VoiceTalk utilizes IoTtalk, an IoT application development platform for managing a diverse range of IoT devices. IoTtalk employs a two-tier microservices architecture, enabling users to define and chain applications through an intuitive drag-and-drop line-connection interface. Leveraging this microservices architecture, VoiceTalk integrates IoTtalk with Google Home, offering a no-code solution for voice-controlled applications. VoiceTalk leverages its knowledge of the smart appliances in the room or house to generate device-specific prompts. We compare the translation accuracy of seven Automatic Speech Recognition (ASR) systems. We make two contributions. First, the no-code VoiceTalk platform significantly simplifies the development of Google Home-like applications. Second, by integrating ASRs with a commercial LLM such as GPT, we dramatically reduce voice-to-text translation errors, for example from 5.13% to 0.54% for the Web Speech API and from 2.25% to zero for Whisper Medium. With small open-source LLMs such as Llama 3.2 3B, the errors are reduced to 0.72% for the Web Speech API and to zero for Whisper Medium. Furthermore, the Device LLM Agent of VoiceTalk can be easily extended to integrate IoTtalk with other voice platforms, such as Amazon Alexa.
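To make the ASR-plus-LLM correction idea concrete, the sketch below shows one way an LLM could repair an ASR transcript when it is given the list of appliances present in the room, which is the kind of context the abstract says VoiceTalk uses to generate specific prompts. This is a minimal illustration, not the paper's implementation: it assumes the OpenAI Python SDK, a stand-in model name, and a hypothetical appliance list and helper function; the actual VoiceTalk prompts and its Device LLM Agent are not described in the abstract.

```python
"""Minimal sketch: post-correcting an ASR transcript with an LLM,
using a room's appliance list as context (illustrative only)."""
from openai import OpenAI  # assumes the OpenAI Python SDK and an OPENAI_API_KEY

client = OpenAI()

# Hypothetical appliance inventory for one room.
APPLIANCES = ["ceiling fan", "dehumidifier", "floor lamp", "air conditioner"]

def correct_transcript(raw_text: str) -> str:
    """Ask the LLM to repair likely ASR mis-recognitions, constrained to
    the devices actually present in the room."""
    prompt = (
        "The following text is an automatic speech recognition result of a "
        "smart home voice command. The room contains these appliances: "
        f"{', '.join(APPLIANCES)}. "
        "Correct any mis-recognized words so the command refers to one of "
        "these appliances, and return only the corrected command.\n\n"
        f"ASR output: {raw_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in; the abstract only says "GPT"
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

# Example: an ASR system hears "turn on the sealing fan".
# print(correct_transcript("turn on the sealing fan"))  # -> "Turn on the ceiling fan."
```

The same pattern would apply to a locally hosted small model such as Llama 3.2 3B by swapping the client for a local inference endpoint; only the model invocation changes, not the room-aware prompt.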