Martha Neary, Emily Fulton, Victoria Rogers, Julia Wilson, Zoe Griffiths, Ram Chuttani, Paul M Sacher
Frontiers in Digital Health, volume 7, article 1460236. Published 2025-06-18. DOI: 10.3389/fdgth.2025.1460236 (https://doi.org/10.3389/fdgth.2025.1460236). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12216977/pdf/
Think FAST: a novel framework to evaluate fidelity, accuracy, safety, and tone in conversational AI health coach dialogues.
Developments in machine-learning-based conversational and generative artificial intelligence (GenAI) have created opportunities for sophisticated conversational agents to augment elements of healthcare. While not a replacement for professional care, AI offers scalability, cost-effectiveness, and automation of many aspects of patient care. However, to realize these opportunities and deliver AI-enabled support safely, interactions between patients and AI must be continuously monitored and evaluated against an agreed-upon set of performance criteria. This paper presents one such set of criteria, developed to evaluate interactions with an AI Health Coach that supports patients receiving obesity treatment and is deployed with an active patient user base. The evaluation framework evolved through an iterative process of development, testing, refining, training, reviewing, and supervision. The framework evaluates interactions at both the individual-message and overall-conversation level, rating them as Acceptable or Unacceptable in four domains: Fidelity, Accuracy, Safety, and Tone (FAST), with a series of questions to consider for each domain. Processes to ensure consistent evaluation quality were established, and additional patient safety procedures were defined for escalation to healthcare providers based on clinical risk. The framework can be implemented by trained evaluators and offers healthcare settings that deploy AI to support patients a method for reviewing quality and safety, thereby supporting safe adoption.
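As an illustration only, the two-level rating scheme described in the abstract could be recorded with a structure like the following Python sketch. The four domains and the Acceptable/Unacceptable scale come from the abstract; the class names, the rule that a conversation is flagged in a domain if any message or the overall conversation is rated Unacceptable there, and the `needs_safety_review` check are assumptions made for this example, not the authors' published procedure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Domain(Enum):
    FIDELITY = "Fidelity"
    ACCURACY = "Accuracy"
    SAFETY = "Safety"
    TONE = "Tone"


class Rating(Enum):
    ACCEPTABLE = "Acceptable"
    UNACCEPTABLE = "Unacceptable"


@dataclass
class Evaluation:
    """Ratings for one unit of review: a single message or a whole conversation."""
    ratings: dict[Domain, Rating]

    def __post_init__(self) -> None:
        # Require a rating in every FAST domain so nothing is silently skipped.
        missing = set(Domain) - set(self.ratings)
        if missing:
            names = sorted(d.value for d in missing)
            raise ValueError(f"missing ratings for: {names}")

    def unacceptable_domains(self) -> list[Domain]:
        return [d for d in Domain if self.ratings[d] is Rating.UNACCEPTABLE]


@dataclass
class ConversationReview:
    """Hypothetical container pairing conversation-level and message-level ratings."""
    conversation: Evaluation
    messages: list[Evaluation] = field(default_factory=list)

    def flagged_domains(self) -> set[Domain]:
        # Assumed aggregation rule: a domain is flagged if it is Unacceptable
        # at the conversation level or in any individual message.
        flagged = set(self.conversation.unacceptable_domains())
        for message in self.messages:
            flagged.update(message.unacceptable_domains())
        return flagged

    def needs_safety_review(self) -> bool:
        # Assumed trigger; the paper's escalation procedure is based on
        # clinical risk and is not reproduced here.
        return Domain.SAFETY in self.flagged_domains()


# Usage: one conversation with two messages, the second flagged for Safety.
ok = {d: Rating.ACCEPTABLE for d in Domain}
review = ConversationReview(
    conversation=Evaluation(dict(ok)),
    messages=[
        Evaluation(dict(ok)),
        Evaluation({**ok, Domain.SAFETY: Rating.UNACCEPTABLE}),
    ],
)
print(sorted(d.value for d in review.flagged_domains()))  # ['Safety']
print(review.needs_safety_review())  # True
```

Keeping message-level and conversation-level ratings as separate `Evaluation` objects mirrors the abstract's statement that both levels are assessed; any real deployment would replace the assumed aggregation and escalation rules with the ones defined in its own evaluator guidance.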