Marvin Kopka, Niklas von Kalckreuth, Markus A. Feufel
Journal: npj Digital Medicine
Published: 2025-03-25
DOI: 10.1038/s41746-025-01566-6 (https://doi.org/10.1038/s41746-025-01566-6)
Accuracy of online symptom assessment applications, large language models, and laypeople for self-triage decisions
Symptom-assessment applications (SAAs, e.g., NHS 111 online) that assist laypeople in deciding if and where to seek care (self-triage) are gaining popularity, and large language models (LLMs) are increasingly used for the same purpose. However, there is no evidence synthesis on the accuracy of LLMs, and no review has contextualized the accuracy of SAAs and LLMs. This systematic review evaluates the self-triage accuracy of both SAAs and LLMs and compares it to the accuracy of laypeople. A total of 1549 studies were screened and 19 were included. The self-triage accuracy of SAAs was moderate but highly variable (11.5–90.0%), while the accuracy of LLMs (57.8–76.0%) and laypeople (47.3–62.4%) was moderate with low variability. Based on the available evidence, the use of SAAs or LLMs should neither be universally recommended nor discouraged; rather, we suggest that their utility should be assessed based on the specific use case and user group under consideration.
About the journal:
npj Digital Medicine is an online open-access journal that focuses on publishing peer-reviewed research in the field of digital medicine. The journal covers various aspects of digital medicine, including the application and implementation of digital and mobile technologies in clinical settings, virtual healthcare, and the use of artificial intelligence and informatics.
The primary goal of the journal is to support innovation and the advancement of healthcare through the integration of new digital and mobile technologies. When determining if a manuscript is suitable for publication, the journal considers four important criteria: novelty, clinical relevance, scientific rigor, and digital innovation.