VoiceTalk: A No-Code Approach for Creating Voice-Controlled Smart Home Applications

Yun-Wei Lin, Yi-Bing Lin, Yi-Feng Wu, Pei-Hsuan Shen
IEEE Open Journal of the Computer Society, vol. 6, pp. 874-883, published 2025-06-04. DOI: 10.1109/OJCS.2025.3576725. https://ieeexplore.ieee.org/document/11023638/

Abstract

This article introduces VoiceTalk, a no-code approach for developing voice-controlled smart home applications without programming expertise. At its core, VoiceTalk uses IoTtalk, an IoT application development platform that manages a diverse range of IoT devices. IoTtalk employs a two-tier microservice architecture that lets users define and chain applications through an intuitive drag-and-drop, line-based interface. Building on this architecture, VoiceTalk integrates IoTtalk with Google Home to offer a no-code solution for voice-controlled applications. VoiceTalk uses its knowledge of the smart appliances in a room or house to generate device-specific prompts. We compare the speech-to-text accuracy of seven Automatic Speech Recognition (ASR) systems. This article makes two contributions. First, the no-code VoiceTalk platform significantly simplifies the development of Google Home-like applications. Second, by combining ASR systems with a commercial large language model (LLM) such as GPT, we dramatically reduce speech-to-text errors, for example, from 5.13% to 0.54% for the Web Speech API and from 2.25% to zero for Whisper Medium. With a small open-source LLM such as Llama 3.2 3B, the error rate drops to 0.72% for the Web Speech API and to zero for Whisper Medium. Furthermore, VoiceTalk's Device LLM Agent can easily be extended to integrate IoTtalk with other voice platforms, such as AWS Alexa.
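The abstract does not specify the prompt format or the correction pipeline. The Python sketch below illustrates, under our own assumptions, how an inventory of the appliances in a house could be turned into an LLM prompt that repairs an ASR transcript. It also includes a simple offline fallback that snaps the transcript to the closest known command using the standard-library `difflib`. The `Device` class, `build_correction_prompt`, `snap_to_known_command`, and the prompt wording are hypothetical illustrations, not VoiceTalk's actual implementation or IoTtalk's API.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical description of one appliance registered in IoTtalk."""
    name: str
    room: str
    commands: list[str] = field(default_factory=list)


def build_correction_prompt(asr_text: str, devices: list[Device]) -> str:
    """Build a device-aware prompt asking an LLM to repair an ASR transcript.

    The wording is an assumption for illustration, not VoiceTalk's actual prompt.
    """
    inventory = "\n".join(
        f"- {d.name} ({d.room}): {', '.join(d.commands)}" for d in devices
    )
    return (
        "You correct speech-recognition errors in smart home voice commands.\n"
        "Devices in this house and the commands they accept:\n"
        f"{inventory}\n"
        "Rewrite the transcript as a single '<command> <device>' phrase, "
        "using only the device names and commands listed above.\n"
        f"Transcript: {asr_text}"
    )


def snap_to_known_command(
    asr_text: str, devices: list[Device], cutoff: float = 0.6
) -> str | None:
    """Hypothetical offline fallback: return the closest '<command> <device>'
    phrase by difflib similarity, or None if no candidate reaches `cutoff`."""
    candidates = [f"{c} {d.name}" for d in devices for c in d.commands]
    matches = difflib.get_close_matches(
        asr_text.lower().strip(), candidates, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    home = [
        Device("living room light", "living room", ["turn on", "turn off"]),
        Device("air conditioner", "bedroom", ["turn on", "turn off"]),
    ]
    heard = "turn on living room lite"  # simulated ASR misrecognition
    print(build_correction_prompt(heard, home))
    print(snap_to_known_command(heard, home))
```

With the sample inventory, the misrecognized "turn on living room lite" snaps to "turn on living room light", and an unrelated utterance such as "play some jazz" returns `None`. The prompt and the fallback rely on the same idea, which loosely mirrors the abstract's point: restricting the output to device/command pairs that actually exist in the room leaves fewer ways for a misrecognition to survive.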