Think FAST: a novel framework to evaluate fidelity, accuracy, safety, and tone in conversational AI health coach dialogues.

Frontiers in Digital Health (impact factor 3.2; JCR Q1, Health Care Sciences & Services)
Published: 2025-06-18 · eCollection date: 2025-01-01 · DOI: 10.3389/fdgth.2025.1460236
Martha Neary, Emily Fulton, Victoria Rogers, Julia Wilson, Zoe Griffiths, Ram Chuttani, Paul M Sacher
Volume 7, article 1460236. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12216977/pdf/
Citations: 0

Abstract

Developments in machine-learning-based conversational and generative artificial intelligence (GenAI) have created opportunities for sophisticated conversational agents to augment elements of healthcare. While not a replacement for professional care, AI offers scalability, cost-effectiveness, and automation of many aspects of patient care. However, to realize these opportunities and deliver AI-enabled support safely, interactions between patients and AI must be continuously monitored and evaluated against an agreed-upon set of performance criteria. This paper presents one such set of criteria, developed to evaluate interactions with an AI Health Coach that supports patients receiving obesity treatment and is deployed with an active patient user base. The evaluation framework evolved through an iterative process of development, testing, refinement, training, review, and supervision. The framework evaluates interactions at both the individual-message level and the overall-conversation level, rating them as Acceptable or Unacceptable in four domains: Fidelity, Accuracy, Safety, and Tone (FAST), with a series of questions to consider for each domain. Processes to ensure consistent evaluation quality were established, and additional patient safety procedures were defined for escalation to healthcare providers based on clinical risk. The framework can be implemented by trained evaluators and offers a method by which healthcare settings deploying AI to support patients can review quality and safety, thus ensuring safe adoption.
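
The two-level rating scheme described in the abstract could be recorded along the following lines. This is only an illustrative sketch, not the authors' tooling: the abstract names the four domains, the Acceptable/Unacceptable scale, and the message and conversation levels, but every class and function name here (`Evaluation`, `ConversationReview`, `rate`, `flagged_domains`, `safety_flagged`) is hypothetical, and the rule that an Unacceptable Safety rating marks a conversation for closer review is an assumption made for the example. The abstract does not specify the per-domain questions or the escalation criteria, so neither is modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class Domain(Enum):
    """The four FAST domains named in the abstract."""
    FIDELITY = "Fidelity"
    ACCURACY = "Accuracy"
    SAFETY = "Safety"
    TONE = "Tone"


class Rating(Enum):
    """The binary scale named in the abstract."""
    ACCEPTABLE = "Acceptable"
    UNACCEPTABLE = "Unacceptable"


@dataclass
class Evaluation:
    """Ratings for one unit of review: a single message or a whole conversation."""
    ratings: Dict[Domain, Rating]

    def __post_init__(self) -> None:
        # Require a rating in every domain so no domain is silently skipped.
        missing = [d.value for d in Domain if d not in self.ratings]
        if missing:
            raise ValueError("missing ratings for: " + ", ".join(missing))

    def unacceptable_domains(self) -> List[Domain]:
        """Domains rated Unacceptable, in F-A-S-T order."""
        return [d for d in Domain if self.ratings[d] is Rating.UNACCEPTABLE]


@dataclass
class ConversationReview:
    """Message-level evaluations plus one conversation-level evaluation."""
    message_evaluations: List[Evaluation]
    conversation_evaluation: Evaluation

    def flagged_domains(self) -> Set[Domain]:
        """Domains rated Unacceptable at either level."""
        flagged = set(self.conversation_evaluation.unacceptable_domains())
        for evaluation in self.message_evaluations:
            flagged.update(evaluation.unacceptable_domains())
        return flagged

    def safety_flagged(self) -> bool:
        # ASSUMPTION: any Unacceptable Safety rating marks the conversation
        # for closer review. The source does not state its escalation rule.
        return Domain.SAFETY in self.flagged_domains()


def rate(*unacceptable: Domain) -> Evaluation:
    """Hypothetical helper: Acceptable everywhere except the listed domains."""
    return Evaluation({
        d: Rating.UNACCEPTABLE if d in unacceptable else Rating.ACCEPTABLE
        for d in Domain
    })
```

For example, `ConversationReview([rate(), rate(Domain.TONE)], rate())` records two messages, one with an Unacceptable Tone rating, and a conversation-level evaluation that is Acceptable in all four domains; `flagged_domains()` would then report Tone only.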

Source journal metrics: CiteScore 4.20; self-citation rate 0.00%; articles published 0; review time 13 weeks