快速思考：一个评估AI健康教练对话的保真度、准确性、安全性和语气的新框架。

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health Pub Date : 2025-06-18 eCollection Date: 2025-01-01 DOI:10.3389/fdgth.2025.1460236

Martha Neary, Emily Fulton, Victoria Rogers, Julia Wilson, Zoe Griffiths, Ram Chuttani, Paul M Sacher

{"title":"快速思考：一个评估AI健康教练对话的保真度、准确性、安全性和语气的新框架。","authors":"Martha Neary, Emily Fulton, Victoria Rogers, Julia Wilson, Zoe Griffiths, Ram Chuttani, Paul M Sacher","doi":"10.3389/fdgth.2025.1460236","DOIUrl":null,"url":null,"abstract":"Developments in Machine Learning based Conversational and Generative Artificial Intelligence (GenAI) have created opportunities for sophisticated Conversational Agents to augment elements of healthcare. While not a replacement for professional care, AI offers opportunities for scalability, cost effectiveness, and automation of many aspects of patient care. However, to realize these opportunities and deliver AI-enabled support safely, interactions between patients and AI must be continuously monitored and evaluated against an agreed upon set of performance criteria. This paper presents one such set of criteria which was developed to evaluate interactions with an AI Health Coach designed to support patients receiving obesity treatment and deployed with an active patient user base. The evaluation framework evolved through an iterative process of development, testing, refining, training, reviewing and supervision. The framework evaluates at both individual message and overall conversation level, rating interactions as Acceptable or Unacceptable in four domains: Fidelity, Accuracy, Safety, and Tone (FAST), with a series of questions to be considered with respect to each domain. Processes to ensure consistent evaluation quality were established and additional patient safety procedures were defined for escalations to healthcare providers based on clinical risk. The framework can be implemented by trained evaluators and offers a method by which healthcare settings deploying AI to support patients can review quality and safety, thus ensuring safe adoption.","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1460236"},"PeriodicalIF":3.2000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12216977/pdf/","citationCount":"0","resultStr":"{\"title\":\"Think FAST: a novel framework to evaluate fidelity, accuracy, safety, and tone in conversational AI health coach dialogues.\",\"authors\":\"Martha Neary, Emily Fulton, Victoria Rogers, Julia Wilson, Zoe Griffiths, Ram Chuttani, Paul M Sacher\",\"doi\":\"10.3389/fdgth.2025.1460236\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Developments in Machine Learning based Conversational and Generative Artificial Intelligence (GenAI) have created opportunities for sophisticated Conversational Agents to augment elements of healthcare. While not a replacement for professional care, AI offers opportunities for scalability, cost effectiveness, and automation of many aspects of patient care. However, to realize these opportunities and deliver AI-enabled support safely, interactions between patients and AI must be continuously monitored and evaluated against an agreed upon set of performance criteria. This paper presents one such set of criteria which was developed to evaluate interactions with an AI Health Coach designed to support patients receiving obesity treatment and deployed with an active patient user base. The evaluation framework evolved through an iterative process of development, testing, refining, training, reviewing and supervision. The framework evaluates at both individual message and overall conversation level, rating interactions as Acceptable or Unacceptable in four domains: Fidelity, Accuracy, Safety, and Tone (FAST), with a series of questions to be considered with respect to each domain. Processes to ensure consistent evaluation quality were established and additional patient safety procedures were defined for escalations to healthcare providers based on clinical risk. The framework can be implemented by trained evaluators and offers a method by which healthcare settings deploying AI to support patients can review quality and safety, thus ensuring safe adoption.\",\"PeriodicalId\":73078,\"journal\":{\"name\":\"Frontiers in digital health\",\"volume\":\"7 \",\"pages\":\"1460236\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12216977/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in digital health\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fdgth.2025.1460236\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in digital health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fdgth.2025.1460236","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}

引用次数: 0

摘要

基于机器学习的会话和生成人工智能（GenAI）的发展为复杂的会话代理创造了增加医疗保健元素的机会。虽然人工智能不能替代专业护理，但它为患者护理的许多方面提供了可扩展性、成本效益和自动化的机会。然而，为了实现这些机会并安全地提供人工智能支持，必须持续监测患者和人工智能之间的互动，并根据商定的一套绩效标准进行评估。本文提出了一套这样的标准，用于评估与人工智能健康教练的互动，该教练旨在支持接受肥胖治疗的患者，并部署了活跃的患者用户群。评价框架经过发展、测试、改进、培训、审查和监督的反复过程而发展。该框架在单个消息和整体会话级别进行评估，在四个领域（保真度、准确性、安全性和语气（FAST））中将交互评为可接受或不可接受，并在每个领域考虑一系列问题。建立了确保一致的评估质量的流程，并根据临床风险为向医疗保健提供者报告制定了额外的患者安全程序。该框架可由训练有素的评估人员实施，并提供一种方法，使部署人工智能以支持患者的医疗机构可以审查质量和安全性，从而确保安全采用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Think FAST: a novel framework to evaluate fidelity, accuracy, safety, and tone in conversational AI health coach dialogues.

Developments in Machine Learning based Conversational and Generative Artificial Intelligence (GenAI) have created opportunities for sophisticated Conversational Agents to augment elements of healthcare. While not a replacement for professional care, AI offers opportunities for scalability, cost effectiveness, and automation of many aspects of patient care. However, to realize these opportunities and deliver AI-enabled support safely, interactions between patients and AI must be continuously monitored and evaluated against an agreed upon set of performance criteria. This paper presents one such set of criteria which was developed to evaluate interactions with an AI Health Coach designed to support patients receiving obesity treatment and deployed with an active patient user base. The evaluation framework evolved through an iterative process of development, testing, refining, training, reviewing and supervision. The framework evaluates at both individual message and overall conversation level, rating interactions as Acceptable or Unacceptable in four domains: Fidelity, Accuracy, Safety, and Tone (FAST), with a series of questions to be considered with respect to each domain. Processes to ensure consistent evaluation quality were established and additional patient safety procedures were defined for escalations to healthcare providers based on clinical risk. The framework can be implemented by trained evaluators and offers a method by which healthcare settings deploying AI to support patients can review quality and safety, thus ensuring safe adoption.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Frontiers in digital health

CiteScore

4.20

自引率

0.00%

发文量

审稿时长

13 weeks