VoiceTalk: A No-Code Approach for Creating Voice-Controlled Smart Home Applications

IEEE Open Journal of the Computer Society. Published: 2025-06-04. DOI: 10.1109/OJCS.2025.3576725

Yun-Wei Lin, Yi-Bing Lin, Yi-Feng Wu, Pei-Hsuan Shen

Volume 6, pages 874-883. Citations: 0

Abstract

This article introduces VoiceTalk, a no-code approach for developing voice-controlled smart home applications without programming expertise. At its core, VoiceTalk uses IoTtalk, an IoT application development platform that manages a diverse range of IoT devices. IoTtalk employs a two-tier microservice architecture that lets users define and chain applications through an intuitive drag-and-drop line interface. Building on this architecture, VoiceTalk integrates IoTtalk with Google Home, providing a no-code solution for voice-controlled applications. VoiceTalk uses its knowledge of the smart appliances in a room or house to generate device-specific prompts. We compare the voice-to-text translation accuracy of seven Automatic Speech Recognition (ASR) systems. This article makes two contributions. First, the no-code VoiceTalk platform significantly simplifies the development of Google Home-like applications. Second, by combining ASR systems with a commercial large language model (LLM) such as GPT, we dramatically reduce voice-to-text translation errors, for example, from 5.13% to 0.54% for the Web Speech API and from 2.25% to zero for Whisper Medium. With a small open-source LLM such as Llama 3.2 3B, the errors drop to 0.72% for the Web Speech API and to zero for Whisper Medium. Furthermore, VoiceTalk's Device LLM Agent can be readily extended to integrate IoTtalk with other voice platforms, such as AWS Alexa.
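To make the idea of "device-specific prompts" concrete, here is a minimal sketch of how knowledge of the appliances in a room could be folded into a prompt that asks an LLM to repair an ASR transcript. Everything in it is an assumption for illustration: the `Device` record, the `build_correction_prompt` helper, and the prompt wording are hypothetical and are not taken from VoiceTalk. The actual call to an LLM such as GPT or Llama 3.2 3B is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A smart appliance registered in a room (hypothetical schema, not IoTtalk's)."""
    name: str                                           # e.g. "ceiling light"
    features: list[str] = field(default_factory=list)   # controllable features, e.g. "on/off"


def build_correction_prompt(room: str, devices: list[Device], transcript: str) -> str:
    """Compose a device-aware prompt asking an LLM to repair an ASR transcript.

    Listing only the devices actually present in the room narrows the set of
    plausible commands, which is the intuition behind device-specific prompts.
    The wording here is an illustrative assumption, not VoiceTalk's prompt.
    """
    lines = [
        f"You are correcting speech-recognition output for a smart home ({room}).",
        "The only devices in this room are:",
    ]
    for d in devices:
        # Fall back to a placeholder when a device has no listed features.
        feats = ", ".join(d.features) if d.features else "no listed features"
        lines.append(f"- {d.name}: {feats}")
    lines += [
        "The ASR transcript below may contain misrecognized words.",
        "Rewrite it as the most likely command for one of the devices above.",
        "Reply with the corrected command only.",
        f'Transcript: "{transcript}"',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    devices = [
        Device("ceiling light", ["on/off", "brightness"]),
        Device("electric fan", ["on/off", "speed"]),
    ]
    # "sealing light" stands in for a typical ASR misrecognition of "ceiling light".
    print(build_correction_prompt("living room", devices, "turn on the sealing light"))
```

Because the room's device list constrains the answer, a misheard phrase such as "sealing light" has an obvious nearest match, so the LLM only has to choose among a few valid commands rather than reconstruct free-form speech.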


Source journal: IEEE Open Journal of the Computer Society

CiteScore: 12.60

Self-citation rate: 0.00%