Using ChatGPT in a clinical setting: A case report
Yongqin Ye, Shuvam Sarkar, Anand Bhaskar, Brian Tomlinson, Olivia Monteiro
MedComm – Future Medicine, vol. 2, no. 2, 2023. DOI: 10.1002/mef2.51
Abstract
Large language models (LLMs) are rapidly becoming important foundation models that have infiltrated our daily lives in many ways. The release of GPT-3 and GPT-4, LLMs capable of natural language processing (NLP) that are trained on terabytes of text data and that use transfer learning to apply knowledge gained from a previous task to a different but related problem, immediately captured the attention of the medical field, prompting investigation of how LLMs could be used to process and interpret electronic health records and to streamline clinical writing.1 NLP models have traditionally been used mainly as diagnostic aids in healthcare. Their use generally requires supervised learning on manually labeled training datasets, demanding a huge investment of time from healthcare professionals.2 Such NLP models often lack precision and accuracy, and most are accessible only to their developers. Recent LLMs, built on the transformer architecture and trained with reinforcement learning from human feedback, generate text with far greater precision. The advance represented by GPT-3 (Generative Pre-trained Transformer), the model underlying ChatGPT, demonstrated that LLMs can rapidly adapt to new tasks, resulting in better generalization. ChatGPT also has a simple interface, which has enabled broad adoption and use. Having such a versatile and user-friendly tool at our fingertips means that we can use LLMs for basic tasks such as generating clinical reports, providing clinical support, or synthesizing patient data from multiple sources.
We have used this case report as an opportunity to demonstrate the practicality of ChatGPT for basic writing tasks in a clinical context. The case is drawn from two teaching videos uploaded to YouTube by TTMedcastTraining, Texas Tech University. The videos concern a patient called Jonathan, who presented with bilateral knee pain and a history of sickle cell disease. One video is a bedside presentation of the patient by a medical intern; the other is a group discussion of treatment plans for the patient. Since GPT-3 accepts only text input, we downloaded the transcript of each video. The transcripts sometimes contain overlapping speakers, filler words, mispronounced words, or incomplete sentences. The unaltered transcripts were submitted to ChatGPT separately for interpretation.
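As a minimal sketch of the transcript step, the snippet below uses the open-source youtube_transcript_api Python package to pull a video's captions. The video IDs are placeholders, and the authors may equally have copied the transcripts directly from YouTube's interface.

```python
# Minimal sketch, assuming the open-source youtube_transcript_api package;
# the video IDs below are placeholders, not the actual teaching videos.
from youtube_transcript_api import YouTubeTranscriptApi

def fetch_transcript(video_id: str) -> str:
    # Return the raw caption text, unaltered, as it was submitted to ChatGPT.
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(segment["text"] for segment in segments)

transcript_1 = fetch_transcript("VIDEO_ID_1")  # bedside presentation
transcript_2 = fetch_transcript("VIDEO_ID_2")  # treatment-plan discussion
```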
The workflow used to generate the case report with ChatGPT is summarized in Figure 1. We fed the transcript of Video 1 into ChatGPT and asked it to write a case report (Case Report 1); we then used the transcript of Video 2 to create Case Report 2. ChatGPT was asked to combine the two reports without summarizing them and to offer a diagnosis and a treatment plan. Finally, we asked ChatGPT to write the final case report in the style of the New England Journal of Medicine. The whole process took around 1.5 h, including the time the authors spent watching the videos. The full case report is found in Supporting Information, ChatGPT's Case Report.
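The following is a hedged sketch of this three-step workflow expressed as OpenAI API calls. The authors actually used the ChatGPT web interface, so the model name and the exact prompt wording here are assumptions, not the prompts reproduced in the Supplementary Materials.

```python
# Sketch of the Figure 1 workflow as OpenAI API calls; the authors used the
# ChatGPT web interface, and the model and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Steps 1-2: one case report per unaltered video transcript
# (transcript_1 and transcript_2 from the earlier sketch).
report_1 = ask("This is a transcript from a video. Write a case report "
               "from it:\n\n" + transcript_1)
report_2 = ask("This is a transcript from a video. Write a case report "
               "from it:\n\n" + transcript_2)

# Step 3: combine without summarizing, add a diagnosis and treatment plan,
# and request the target journal's style.
final_report = ask(
    "Combine the two case reports below without summarizing them. Offer a "
    "diagnosis and a treatment plan, and write the final case report in the "
    "style of the New England Journal of Medicine.\n\n"
    f"Report 1:\n{report_1}\n\nReport 2:\n{report_2}"
)
```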
The first author, an attending physician in pediatric surgery, was also asked to study the two videos and write a case report based on them. Owing to a heavy clinical workload and a variable schedule, this took several attempts and roughly 4 h in total. This report is found in Supporting Information, Physician's Case Report. He was also asked whether he agreed with the diagnosis and treatment plan provided by ChatGPT; after careful research on sickle cell disease and its presentation, he agreed with both.
To compare the quality of writing, we asked three physicians to rate the two case reports according to a modified version of the Joanna Briggs Institute Critical Appraisal Checklist for Case Reports.3 The three physicians gave the two reports similar scores: 5.7/8 for ChatGPT's report and 6/8 for the physician's. The most common comment on ChatGPT's report was the lack of the patient's past medical history, information that is more evident in the physician's report. Detailed comments are found in Supporting Information.
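For clarity on the arithmetic, the sketch below assumes each rater scores the report against the 8 checklist items and that per-report totals are averaged across the three physicians; the individual totals are invented to show how a mean of 5.7/8 can arise, and the actual ratings are not reproduced here.

```python
# Hypothetical illustration of the scoring arithmetic only; the per-rater
# totals below are invented, not the physicians' actual ratings.
chatgpt_totals = [6, 5, 6]     # illustrative totals out of 8, one per rater
physician_totals = [6, 6, 6]

mean_chatgpt = sum(chatgpt_totals) / len(chatgpt_totals)        # ~5.7
mean_physician = sum(physician_totals) / len(physician_totals)  # 6.0
```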
This study is a good example of how ChatGPT can be used effectively to perform simple writing tasks in a clinical setting, when clinicians record their cases to write up later, and of how easily ChatGPT can synthesize data from multiple sources and provide medical support. However, before ChatGPT can be used in day-to-day practice, several important points need to be raised. ChatGPT requires very detailed and specific information to write the final case report (the chat history detailing the prompts and the separate case reports generated is available in Supplementary Materials). Although the first report was not perfect, it was possible to refine it by asking ChatGPT to incorporate additional information, as in the sketch below. With practice, users become familiar with prompt strategies that generate the desired outputs. In this case report, our prompts were clear and simple. We specified that the original text was a “transcript from a video” so that the format of the input was understood. We employed strong, meaningful verbs such as “write,” “incorporate,” and “combine,” and qualified our request so as “not to summarize” the information given. In addition, we wanted to generate a case report in a format that most clinicians are familiar with reading; to this end, we prompted with the specific output format of “New England Journal of Medicine.”
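An illustrative refinement step, reusing the ask() helper and final_report from the earlier sketch, might look like this; the wording is an assumption, and the missing detail is left as a placeholder rather than actual chart data.

```python
# Illustrative refinement prompt, reusing ask() and final_report from the
# earlier sketch; missing_details is a placeholder, not real patient data.
missing_details = "..."  # e.g., details taken from the clinician's own notes

refined_report = ask(
    "Incorporate the following additional information into this case report. "
    "Do not summarize.\n\n"
    f"Case report:\n{final_report}\n\n"
    f"Additional information:\n{missing_details}"
)
```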
One important point to highlight is that ChatGPT can be used to interpret incomplete medical data, offering diagnostic assessments and treatment plans based on the medical history and patient symptoms available. We believe that ChatGPT will be a useful assistant for completing time-consuming writing tasks, even when clinical data are partially missing. It is important to note that ChatGPT can provide ideas, but it cannot replace a clinician's knowledge. In-depth knowledge of one's cases and of the diseases being treated is absolutely necessary to ensure the credibility of generated text and to avoid misinterpretation by the LLM. Clinicians must dedicate time to fact-check generated reports soon after attending a case, to avoid confusing generated material with actual medical records.4 It remains the sole responsibility of the attending clinician to ensure the accuracy and privacy of patient data, and to make the final diagnostic and treatment decisions, so as to avoid the legal and ethical issues that may arise from incorrect LLM-generated patient data.5
With the advance of fine-tuned medical LLMs such as GatorTron (a clinical LLM trained on deidentified clinical text to process and interpret electronic health records) and Google's Med-PaLM 2 (an LLM billed as answering medical questions more accurately and safely), we can expect to see much more AI-generated text in every area of medicine. We must, however, remember that medicine is an interpersonal process, and the interaction and communication between doctor and patient remain irreplaceable. We can use LLMs to enhance practitioners' accuracy and precision while empowering patients with greater autonomy, but we must be mindful of the ethical considerations raised by the integration of LLMs into medical practice.
In conclusion, ChatGPT has the potential to be a valuable tool for clinicians in generating case reports in a timely manner. However, its use requires practice and a willingness to dedicate the time needed to verify the accuracy of generated content against a clinician's expertise and knowledge.
Yongqin Ye: Data curation (equal); formal analysis (equal); methodology (equal); validation (equal); writing—review and editing (equal). Shuvam Sarkar: Data curation (equal); formal analysis (equal); validation (equal); writing—review and editing (equal). Anand Bhaskar: Data curation (equal); formal analysis (equal); validation (equal); writing—review and editing (equal). Brian Tomlinson: Data curation (equal); formal analysis (equal); validation (equal); writing—review and editing (equal). Olivia Monteiro: Conceptualization (equal); data curation (equal); formal analysis (equal); methodology (equal); project administration (equal); resources (equal); software (equal); supervision (equal); validation (equal); writing—original draft (equal); writing—review and editing (equal). All authors have read and approved the final manuscript.