ChatGPT-4 从电子健康记录中提取心衰症状和体征。

T Elizabeth Workman, Ali Ahmed, Helen M Sheriff, Venkatesh K Raman, Sijian Zhang, Yijun Shao, Charles Faselis, Gregg C Fonarow, Qing Zeng-Treitler
{"title":"ChatGPT-4 从电子健康记录中提取心衰症状和体征。","authors":"T Elizabeth Workman, Ali Ahmed, Helen M Sheriff, Venkatesh K Raman, Sijian Zhang, Yijun Shao, Charles Faselis, Gregg C Fonarow, Qing Zeng-Treitler","doi":"10.1016/j.pcad.2024.10.010","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Natural language processing (NLP) can facilitate research utilizing data from electronic health records (EHRs). Large language models can potentially improve NLP applications leveraging EHR notes. The objective of this study was to assess the performance of zero-shot learning using Chat Generative Pre-trained Transformer 4 (ChatGPT-4) for extraction of symptoms and signs, and compare its performance to baseline machine learning and rule-based methods developed using annotated data.</p><p><strong>Methods and results: </strong>From unstructured clinical notes of the national EHR data of the Veterans healthcare system, we extracted 1999 text snippets containing relevant keywords for heart failure symptoms and signs, which were then annotated by two clinicians. We also created 102 synthetic snippets that were semantically similar to snippets randomly selected from the original 1999 snippets. The authors applied zero-shot learning, using two different forms of prompt engineering in a symptom and sign extraction task with ChatGPT-4, utilizing the synthetic snippets. For comparison, baseline models using machine learning and rule-based methods were trained using the original 1999 annotated text snippets, and then used to classify the 102 synthetic snippets. The best zero-shot learning application achieved 90.6 % precision, 100 % recall, and 95 % F1 score, outperforming the best baseline method, which achieved 54.9 % precision, 82.4 % recall, and 65.5 % F1 score. Prompt style and temperature settings influenced zero-shot learning performance.</p><p><strong>Conclusions: </strong>Zero-shot learning utilizing ChatGPT-4 significantly outperformed traditional machine learning and rule-based NLP. Prompt type and temperature settings affected zero-shot learning performance. These findings suggest a more efficient means of symptoms and signs extraction than traditional machine learning and rule-based methods.</p>","PeriodicalId":94178,"journal":{"name":"Progress in cardiovascular diseases","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records.\",\"authors\":\"T Elizabeth Workman, Ali Ahmed, Helen M Sheriff, Venkatesh K Raman, Sijian Zhang, Yijun Shao, Charles Faselis, Gregg C Fonarow, Qing Zeng-Treitler\",\"doi\":\"10.1016/j.pcad.2024.10.010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Natural language processing (NLP) can facilitate research utilizing data from electronic health records (EHRs). Large language models can potentially improve NLP applications leveraging EHR notes. The objective of this study was to assess the performance of zero-shot learning using Chat Generative Pre-trained Transformer 4 (ChatGPT-4) for extraction of symptoms and signs, and compare its performance to baseline machine learning and rule-based methods developed using annotated data.</p><p><strong>Methods and results: </strong>From unstructured clinical notes of the national EHR data of the Veterans healthcare system, we extracted 1999 text snippets containing relevant keywords for heart failure symptoms and signs, which were then annotated by two clinicians. We also created 102 synthetic snippets that were semantically similar to snippets randomly selected from the original 1999 snippets. The authors applied zero-shot learning, using two different forms of prompt engineering in a symptom and sign extraction task with ChatGPT-4, utilizing the synthetic snippets. For comparison, baseline models using machine learning and rule-based methods were trained using the original 1999 annotated text snippets, and then used to classify the 102 synthetic snippets. The best zero-shot learning application achieved 90.6 % precision, 100 % recall, and 95 % F1 score, outperforming the best baseline method, which achieved 54.9 % precision, 82.4 % recall, and 65.5 % F1 score. Prompt style and temperature settings influenced zero-shot learning performance.</p><p><strong>Conclusions: </strong>Zero-shot learning utilizing ChatGPT-4 significantly outperformed traditional machine learning and rule-based NLP. Prompt type and temperature settings affected zero-shot learning performance. These findings suggest a more efficient means of symptoms and signs extraction than traditional machine learning and rule-based methods.</p>\",\"PeriodicalId\":94178,\"journal\":{\"name\":\"Progress in cardiovascular diseases\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-10-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Progress in cardiovascular diseases\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1016/j.pcad.2024.10.010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Progress in cardiovascular diseases","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.pcad.2024.10.010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

背景:自然语言处理(NLP)可以促进利用电子健康记录(EHR)数据的研究。大型语言模型有可能改善利用电子健康记录笔记的 NLP 应用。本研究的目的是评估使用 Chat Generative Pre-trained Transformer 4 (ChatGPT-4) 进行零镜头学习提取症状和体征的性能,并将其性能与使用注释数据开发的基线机器学习和基于规则的方法进行比较:我们从退伍军人医疗保健系统的国家电子病历数据的非结构化临床笔记中提取了 1999 个包含心衰症状和体征相关关键词的文本片段,然后由两名临床医生对这些片段进行了注释。我们还创建了 102 个合成片段,这些片段在语义上与从 1999 年原始片段中随机选取的片段相似。作者在 ChatGPT-4 的症状和体征提取任务中使用了两种不同形式的提示工程,并利用合成片段进行了零点学习。为了进行比较,使用机器学习和基于规则的方法对 1999 年原始注释文本片段进行了基线模型训练,然后用于对 102 个合成片段进行分类。最佳零点学习应用的精确度为 90.6%,召回率为 100%,F1 分数为 95%,优于最佳基线方法,后者的精确度为 54.9%,召回率为 82.4%,F1 分数为 65.5%。提示风格和温度设置影响了零点学习的性能:结论:利用 ChatGPT-4 进行的零点学习明显优于传统的机器学习和基于规则的 NLP。提示类型和温度设置影响了零点学习性能。这些研究结果表明,与传统的机器学习和基于规则的方法相比,零点学习是一种更有效的症状和体征提取方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records.

Background: Natural language processing (NLP) can facilitate research utilizing data from electronic health records (EHRs). Large language models can potentially improve NLP applications leveraging EHR notes. The objective of this study was to assess the performance of zero-shot learning using Chat Generative Pre-trained Transformer 4 (ChatGPT-4) for extraction of symptoms and signs, and compare its performance to baseline machine learning and rule-based methods developed using annotated data.

Methods and results: From unstructured clinical notes of the national EHR data of the Veterans healthcare system, we extracted 1999 text snippets containing relevant keywords for heart failure symptoms and signs, which were then annotated by two clinicians. We also created 102 synthetic snippets that were semantically similar to snippets randomly selected from the original 1999 snippets. The authors applied zero-shot learning, using two different forms of prompt engineering in a symptom and sign extraction task with ChatGPT-4, utilizing the synthetic snippets. For comparison, baseline models using machine learning and rule-based methods were trained using the original 1999 annotated text snippets, and then used to classify the 102 synthetic snippets. The best zero-shot learning application achieved 90.6 % precision, 100 % recall, and 95 % F1 score, outperforming the best baseline method, which achieved 54.9 % precision, 82.4 % recall, and 65.5 % F1 score. Prompt style and temperature settings influenced zero-shot learning performance.

Conclusions: Zero-shot learning utilizing ChatGPT-4 significantly outperformed traditional machine learning and rule-based NLP. Prompt type and temperature settings affected zero-shot learning performance. These findings suggest a more efficient means of symptoms and signs extraction than traditional machine learning and rule-based methods.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信