Large language model answers medical questions about standard pathology reports

IF 3.1 3区 医学 Q1 MEDICINE, GENERAL & INTERNAL
Anqi Wang, Jieli Zhou, Peng Zhang, Haotian Cao, Hongyi Xin, Xinyun Xu, Haiyang Zhou
{"title":"Large language model answers medical questions about standard pathology reports","authors":"Anqi Wang, Jieli Zhou, Peng Zhang, Haotian Cao, Hongyi Xin, Xinyun Xu, Haiyang Zhou","doi":"10.3389/fmed.2024.1402457","DOIUrl":null,"url":null,"abstract":"This study aims to evaluate the feasibility of large language model (LLM) in answering pathology questions based on pathology reports (PRs) of colorectal cancer (CRC). Four common questions (CQs) and corresponding answers about pathology were retrieved from public webpages. These questions were input as prompts for Chat Generative Pretrained Transformer (ChatGPT) (gpt-3.5-turbo). The quality indicators (understanding, scientificity, satisfaction) of all answers were evaluated by gastroenterologists. Standard PRs from 5 CRC patients who received radical surgeries in Shanghai Changzheng Hospital were selected. Six report questions (RQs) and corresponding answers were generated by a gastroenterologist and a pathologist. We developed an interactive PRs interpretation system which allows users to upload standard PRs as JPG images. Then the ChatGPT's responses to the RQs were generated. The quality indicators of all answers were evaluated by gastroenterologists and out-patients. As for CQs, gastroenterologists rated AI answers similarly to non-AI answers in understanding, scientificity, and satisfaction. As for RQ1-3, gastroenterologists and patients rated the AI mean scores higher than non-AI scores among the quality indicators. However, as for RQ4-6, gastroenterologists rated the AI mean scores lower than non-AI scores in understanding and satisfaction. In RQ4, gastroenterologists rated the AI scores lower than non-AI scores in scientificity (<jats:italic>P</jats:italic> = 0.011); patients rated the AI scores lower than non-AI scores in understanding (<jats:italic>P</jats:italic> = 0.004) and satisfaction (<jats:italic>P</jats:italic> = 0.011). In conclusion, LLM could generate credible answers to common pathology questions and conceptual questions on the PRs. It holds great potential in improving doctor-patient communication.","PeriodicalId":12488,"journal":{"name":"Frontiers in Medicine","volume":null,"pages":null},"PeriodicalIF":3.1000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fmed.2024.1402457","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
引用次数: 0

Abstract

This study aims to evaluate the feasibility of large language model (LLM) in answering pathology questions based on pathology reports (PRs) of colorectal cancer (CRC). Four common questions (CQs) and corresponding answers about pathology were retrieved from public webpages. These questions were input as prompts for Chat Generative Pretrained Transformer (ChatGPT) (gpt-3.5-turbo). The quality indicators (understanding, scientificity, satisfaction) of all answers were evaluated by gastroenterologists. Standard PRs from 5 CRC patients who received radical surgeries in Shanghai Changzheng Hospital were selected. Six report questions (RQs) and corresponding answers were generated by a gastroenterologist and a pathologist. We developed an interactive PRs interpretation system which allows users to upload standard PRs as JPG images. Then the ChatGPT's responses to the RQs were generated. The quality indicators of all answers were evaluated by gastroenterologists and out-patients. As for CQs, gastroenterologists rated AI answers similarly to non-AI answers in understanding, scientificity, and satisfaction. As for RQ1-3, gastroenterologists and patients rated the AI mean scores higher than non-AI scores among the quality indicators. However, as for RQ4-6, gastroenterologists rated the AI mean scores lower than non-AI scores in understanding and satisfaction. In RQ4, gastroenterologists rated the AI scores lower than non-AI scores in scientificity (P = 0.011); patients rated the AI scores lower than non-AI scores in understanding (P = 0.004) and satisfaction (P = 0.011). In conclusion, LLM could generate credible answers to common pathology questions and conceptual questions on the PRs. It holds great potential in improving doctor-patient communication.
大型语言模型解答有关标准病理报告的医学问题
本研究旨在评估大语言模型(LLM)根据结直肠癌(CRC)病理报告(PRs)回答病理问题的可行性。研究人员从公共网页中检索了四个常见病理问题(CQ)和相应的答案。这些问题作为提示输入到聊天生成预训练转换器(ChatGPT)(gpt-3.5-turbo)中。所有答案的质量指标(理解度、科学性、满意度)均由消化科医生进行评估。选取了在上海长征医院接受根治性手术的 5 名 CRC 患者的标准 PR。由一名消化内科医生和一名病理科医生生成六个报告问题(RQ)和相应的答案。我们开发了一个交互式PRs解读系统,用户可以上传标准PRs的JPG图片。然后生成 ChatGPT 对 RQs 的回答。所有答案的质量指标均由消化科医生和门诊患者进行评估。对于 CQ,消化科医生对人工智能回答的理解度、科学性和满意度的评价与非人工智能回答相似。至于 RQ1-3,在各项质量指标中,胃肠病学家和患者对人工智能平均分的评分均高于非人工智能得分。然而,对于问题 4-6,在理解度和满意度方面,胃肠病学家对人工智能平均分的评分低于非人工智能评分。在 RQ4 中,胃肠病医师对人工智能的科学性评分低于非人工智能评分(P = 0.011);患者对人工智能的理解度评分(P = 0.004)和满意度评分(P = 0.011)低于非人工智能评分。总之,LLM 可以为常见病理学问题和 PR 概念性问题生成可信的答案。它在改善医患沟通方面具有巨大潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Frontiers in Medicine
Frontiers in Medicine Medicine-General Medicine
CiteScore
5.10
自引率
5.10%
发文量
3710
审稿时长
12 weeks
期刊介绍: Frontiers in Medicine publishes rigorously peer-reviewed research linking basic research to clinical practice and patient care, as well as translating scientific advances into new therapies and diagnostic tools. Led by an outstanding Editorial Board of international experts, this multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians and the public worldwide. In addition to papers that provide a link between basic research and clinical practice, a particular emphasis is given to studies that are directly relevant to patient care. In this spirit, the journal publishes the latest research results and medical knowledge that facilitate the translation of scientific advances into new therapies or diagnostic tools. The full listing of the Specialty Sections represented by Frontiers in Medicine is as listed below. As well as the established medical disciplines, Frontiers in Medicine is launching new sections that together will facilitate - the use of patient-reported outcomes under real world conditions - the exploitation of big data and the use of novel information and communication tools in the assessment of new medicines - the scientific bases for guidelines and decisions from regulatory authorities - access to medicinal products and medical devices worldwide - addressing the grand health challenges around the world
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信