Comparing artificial intelligence- vs clinician-authored summaries of simulated primary care electronic health records.

Lara Shemtob, Abdullah Nouri, Adam Harvey-Sullivan, Connor S Qiu, Jonathan Martin, Martha Martin, Sara Noden, Tanveer Rob, Ana L Neves, Azeem Majeed, Jonathan Clarke, Thomas Beaney

JAMIA Open, 8(4): ooaf082. Published 2025-07-30. DOI: 10.1093/jamiaopen/ooaf082. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309840/pdf/
Abstract
Objective: To compare clinical summaries generated by GPT-4 from simulated primary care electronic health records (EHRs) with summaries written by clinicians, across multiple domains of quality including utility, concision, accuracy, and bias.
Materials and methods: Seven primary care physicians generated 70 simulated patient EHR notes, each representing 10 patient contacts with the practice over at least 2 years. Each record was summarized by a different clinician and by GPT-4. Artificial intelligence (AI)- and clinician-authored summaries were rated blindly by clinicians according to 8 domains of quality and an overall rating.
Results: The median time taken for a clinician to read through and assimilate the information in the EHRs before summarizing was 7 minutes. Clinicians rated clinician-authored summaries higher than AI-authored summaries overall (7.39 vs 7.00 out of 10; P = .02), but with greater variability in the ratings of clinician-authored summaries. AI- and clinician-authored summaries had similar accuracy, and AI-authored summaries were less likely to omit important information and more likely to use patient-friendly language.
Discussion: Although AI-authored summaries were rated slightly lower overall than clinician-authored summaries, they demonstrated similar accuracy and greater consistency. This suggests potential applications for AI-generated summaries in primary care, particularly given the substantial time clinicians spend on this work.
Conclusion: The results suggest the feasibility, utility, and acceptability of integrating AI-authored summaries into EHRs to support clinicians in primary care. AI summarization tools have the potential to improve healthcare productivity, including by enabling clinicians to spend more time on direct patient care.