Easing the burden: A pilot study evaluating AI-generated In-Basket message drafts to streamline perioperative endocrine surgical care

Katherine R Whitehouse, Ryan T Heslin, Alexis Desir, Rajam Raghunathan, Ana K Islam, Sarah Lallky, Priscilla Philip, Nicole Reedy, Ankeeta Mehta, Megan Parmer, Marlen V Piersall, Sarah C Oltmann, Alan P B Dackiw, Naim M Maalouf, Vivek R Sant

Surgery, 2025;109700. DOI: 10.1016/j.surg.2025.109700
Abstract
Background: Artificial intelligence can generate accurate and empathetic responses to patient questions about endocrine diseases, but its ability to reduce health care providers' clinical burden remains unmeasured. We evaluated how artificial intelligence-generated draft messages affect health care provider workload when answering common perioperative endocrine surgery questions.
Methods: Health care providers completed a timed survey responding to 20 randomized perioperative endocrine patient In-Basket messages, 10 presented with blank drafts and 10 with drafts generated by GPT-4. Participants could use some, all, or none of each provided draft. Response times were recorded, and a text similarity ratio (1.0 = identical) quantified the extent of editing. Cognitive load was assessed with the raw NASA Task Load Index (NASA-TLX), and provider satisfaction was elicited.
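The abstract does not specify how the similarity ratio or the raw NASA-TLX score were computed. The sketch below is a minimal Python illustration of common implementations, assuming difflib's SequenceMatcher for the similarity ratio and an unweighted mean of the six subscale ratings for raw NASA-TLX; it is not the authors' code.

```python
import difflib

def edit_similarity(ai_draft: str, sent_reply: str) -> float:
    """Similarity between the AI draft and the reply actually sent.

    Returns 1.0 when the draft was sent unedited; lower values mean
    heavier editing. difflib.SequenceMatcher is one common choice; the
    abstract does not name the implementation used in the study.
    """
    return difflib.SequenceMatcher(None, ai_draft, sent_reply).ratio()

def raw_nasa_tlx(mental: float, physical: float, temporal: float,
                 performance: float, effort: float, frustration: float) -> float:
    """Raw (unweighted) NASA-TLX: the mean of the six subscale ratings."""
    return (mental + physical + temporal + performance + effort + frustration) / 6

# Hypothetical example: a lightly edited draft scores near 1.0.
draft = "Please stop taking aspirin seven days before surgery."
sent = "Please stop taking aspirin seven days before your surgery."
print(f"similarity = {edit_similarity(draft, sent):.2f}")  # ~0.95
```

Under these assumptions, the reported mean similarity ratio of 0.88 corresponds to drafts that were sent largely intact, with only small insertions or substitutions.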
Results: Eleven health care providers participated. Response time averaged 49 ± 67 seconds per question with artificial intelligence drafts vs 137 ± 115 seconds with a blank draft (P < .001). Artificial intelligence-generated drafts required minimal editing (similarity ratio 0.88 ± 0.24). Mean raw Task Load Index was lower with artificial intelligence drafts (3.2 ± 1.5 vs 4.4 ± 1.3, P < .001), particularly for mental (P = .02) and temporal (P < .01) demand. Frustration levels were similar (P = .49). Sixty-four percent of respondents were satisfied or extremely satisfied, and 82% wished to integrate the tool into clinical practice. Feedback highlighted the importance of personalization and the tool's usefulness for questions with expected routine responses.
Conclusion: Artificial intelligence-generated draft responses to common perioperative patient questions reduced health care provider response time and cognitive load, required minimal edits, and were associated with enhanced health care provider satisfaction and minimal frustration.
About the journal
For 66 years, Surgery has published practical, authoritative information about procedures, clinical advances, and major trends shaping general surgery. Each issue features original scientific contributions and clinical reports. Peer-reviewed articles cover topics in oncology, trauma, gastrointestinal, vascular, and transplantation surgery. The journal also publishes papers from the meetings of its sponsoring societies, the Society of University Surgeons, the Central Surgical Association, and the American Association of Endocrine Surgeons.