Quality evaluation of digital voice assistants for diabetes management

Impact factor: 0.4 | JCR quartile: Q4 (Medicine, Research & Experimental)
Joy Qi En Chia, L. Wong, K. Yap
{"title":"Quality evaluation of digital voice assistants for diabetes management","authors":"Joy Qi En Chia, L. Wong, K. Yap","doi":"10.3934/medsci.2023008","DOIUrl":null,"url":null,"abstract":"Background Digital voice assistants (DVAs) are increasingly used to search for health information. However, the quality of information provided by DVAs is not consistent across health conditions. From our knowledge, there have been no studies that evaluated the quality of DVAs in response to diabetes-related queries. The objective of this study was to evaluate the quality of DVAs in relation to queries on diabetes management. Materials and methods Seventy-four questions were posed to smartphone (Apple Siri, Google Assistant, Samsung Bixby) and non-smartphone DVAs (Amazon Alexa, Sulli the Diabetes Guru, Google Nest Mini, Microsoft Cortana), and their responses were compared to that of Internet Google Search. Questions were categorized under diagnosis, screening, management, treatment and complications of diabetes, and the impacts of COVID-19 on diabetes. The DVAs were evaluated on their technical ability, user-friendliness, reliability, comprehensiveness and accuracy of their responses. Data was analyzed using the Kruskal-Wallis and Wilcoxon rank-sum tests. Intraclass correlation coefficient was used to report inter-rater reliability. Results Google Assistant (n = 69/74, 93.2%), Siri and Nest Mini (n = 64/74, 86.5% each) had the highest proportions of successful and relevant responses, in contrast to Cortana (n = 23/74, 31.1%) and Sulli (n = 10/74, 13.5%), which had the lowest successful and relevant responses. Median total scores of the smartphone DVAs (Bixby 75.3%, Google Assistant 73.3%, Siri 72.0%) were comparable to that of Google Search (70.0%, p = 0.034), while median total scores of non-smartphone DVAs (Nest Mini 56.9%, Alexa 52.9%, Cortana 52.5% and Sulli the Diabetes Guru 48.6%) were significantly lower (p < 0.001). Non-smartphone DVAs had much lower median comprehensiveness (16.7% versus 100.0%, p < 0.001) and reliability scores (30.8% versus 61.5%, p < 0.001) compared to Google Search. Conclusions Google Assistant, Siri and Bixby were the best-performing DVAs for answering diabetes-related queries. However, the lack of successful and relevant responses by Bixby may frustrate users, especially if they have COVID-19 related queries. All DVAs scored highly for user-friendliness, but can be improved in terms of accuracy, comprehensiveness and reliability. DVA designers are encouraged to consider features related to accuracy, comprehensiveness, reliability and user-friendliness when developing their products, so as to enhance the quality of DVAs for medical purposes, such as diabetes management.","PeriodicalId":43011,"journal":{"name":"AIMS Medical Science","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AIMS Medical Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3934/medsci.2023008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Digital voice assistants (DVAs) are increasingly used to search for health information. However, the quality of information provided by DVAs is not consistent across health conditions. To our knowledge, no studies have evaluated the quality of DVAs in response to diabetes-related queries. The objective of this study was to evaluate the quality of DVAs in relation to queries on diabetes management.

Materials and methods: Seventy-four questions were posed to smartphone DVAs (Apple Siri, Google Assistant, Samsung Bixby) and non-smartphone DVAs (Amazon Alexa, Sulli the Diabetes Guru, Google Nest Mini, Microsoft Cortana), and their responses were compared with those of Google Search on the Internet. Questions were categorized under diagnosis, screening, management, treatment and complications of diabetes, and the impacts of COVID-19 on diabetes. The DVAs were evaluated on their technical ability, user-friendliness, and the reliability, comprehensiveness and accuracy of their responses. Data were analyzed using the Kruskal-Wallis and Wilcoxon rank-sum tests. The intraclass correlation coefficient was used to report inter-rater reliability.

Results: Google Assistant (n = 69/74, 93.2%), Siri and Nest Mini (n = 64/74, 86.5% each) had the highest proportions of successful and relevant responses, whereas Cortana (n = 23/74, 31.1%) and Sulli (n = 10/74, 13.5%) had the lowest. Median total scores of the smartphone DVAs (Bixby 75.3%, Google Assistant 73.3%, Siri 72.0%) were comparable to that of Google Search (70.0%, p = 0.034), while median total scores of the non-smartphone DVAs (Nest Mini 56.9%, Alexa 52.9%, Cortana 52.5% and Sulli the Diabetes Guru 48.6%) were significantly lower (p < 0.001). Non-smartphone DVAs had much lower median comprehensiveness (16.7% versus 100.0%, p < 0.001) and reliability scores (30.8% versus 61.5%, p < 0.001) than Google Search.

Conclusions: Google Assistant, Siri and Bixby were the best-performing DVAs for answering diabetes-related queries. However, Bixby's lack of successful and relevant responses may frustrate users, especially those with COVID-19-related queries. All DVAs scored highly for user-friendliness but could be improved in terms of accuracy, comprehensiveness and reliability. DVA designers are encouraged to consider features related to accuracy, comprehensiveness, reliability and user-friendliness when developing their products, so as to enhance the quality of DVAs for medical purposes such as diabetes management.
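The abstract names the statistical tests used but not the analysis workflow. As a minimal, hypothetical sketch (the authors' code and raw data are not reproduced here, and all score arrays below are placeholders, not study data), the group comparisons could be run in Python with SciPy:

```python
# Illustrative only: placeholder per-question quality scores (%), not study data.
from scipy import stats

google_search = [70.0, 68.5, 72.1, 69.4, 71.0, 66.8]
bixby = [75.3, 74.0, 76.2, 73.8, 75.9, 72.5]
nest_mini = [56.9, 55.2, 58.1, 54.7, 57.3, 53.9]

# Kruskal-Wallis test: do the score distributions differ across all groups?
h_stat, p_kw = stats.kruskal(google_search, bixby, nest_mini)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.3f}")

# Wilcoxon rank-sum tests: each DVA against the Google Search benchmark.
for name, scores in [("Bixby", bixby), ("Nest Mini", nest_mini)]:
    z_stat, p_rs = stats.ranksums(scores, google_search)
    print(f"{name} vs Google Search: z = {z_stat:.2f}, p = {p_rs:.3f}")

# The inter-rater reliability (ICC) reported in the abstract could be
# computed with, e.g., pingouin.intraclass_corr on the raters' score table.
```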
Source journal: AIMS Medical Science (Medicine, Research & Experimental)
Self-citation rate: 14.30% | Articles published: 20 | Review time: 12 weeks