T Elizabeth Workman, Ali Ahmed, Helen M Sheriff, Venkatesh K Raman, Sijian Zhang, Yijun Shao, Charles Faselis, Gregg C Fonarow, Qing Zeng-Treitler
Title: ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records.
Journal: Progress in Cardiovascular Diseases. Published: 2024-10-21. DOI: 10.1016/j.pcad.2024.10.010 (https://doi.org/10.1016/j.pcad.2024.10.010)
Citations: 0
Abstract
Background: Natural language processing (NLP) can facilitate research utilizing data from electronic health records (EHRs). Large language models can potentially improve NLP applications leveraging EHR notes. The objective of this study was to assess the performance of zero-shot learning using Chat Generative Pre-trained Transformer 4 (ChatGPT-4) for extraction of heart failure symptoms and signs, and to compare its performance to baseline machine learning and rule-based methods developed using annotated data.
Methods and results: From unstructured clinical notes in the national EHR data of the Veterans healthcare system, we extracted 1,999 text snippets containing relevant keywords for heart failure symptoms and signs, which were then annotated by two clinicians. We also created 102 synthetic snippets that were semantically similar to snippets randomly selected from the original 1,999. We applied zero-shot learning with ChatGPT-4 to the synthetic snippets in a symptom and sign extraction task, using two different forms of prompt engineering. For comparison, baseline machine learning and rule-based models were trained on the original 1,999 annotated snippets and then used to classify the 102 synthetic snippets. The best zero-shot learning application achieved 90.6% precision, 100% recall, and a 95% F1 score, outperforming the best baseline method, which achieved 54.9% precision, 82.4% recall, and a 65.5% F1 score. Prompt style and temperature settings influenced zero-shot learning performance.
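The metrics reported above are related by the standard definition of F1 as the harmonic mean of precision and recall. The following is a minimal sketch of that relationship, not the authors' evaluation code; the function names are illustrative. Applied to the rounded figures in the abstract, it gives about 95.1% for the best zero-shot run and about 65.9% for the best baseline. The small gap from the reported 65.5% may reflect rounding of the inputs or a different averaging scheme, which the abstract does not specify.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the abstract (as fractions, not percentages).
zero_shot_f1 = f1_score(0.906, 1.000)
baseline_f1 = f1_score(0.549, 0.824)

print(round(zero_shot_f1 * 100, 1))  # 95.1
print(round(baseline_f1 * 100, 1))   # 65.9
```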
Conclusions: Zero-shot learning utilizing ChatGPT-4 significantly outperformed traditional machine learning and rule-based NLP. Prompt type and temperature settings affected zero-shot learning performance. These findings suggest that zero-shot learning is a more efficient means of extracting symptoms and signs than traditional machine learning and rule-based methods.
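To illustrate what varying prompt type and temperature can look like in practice, here is a hedged sketch of how a zero-shot request might be assembled. The abstract does not give the study's actual prompts or settings, so the two prompt styles ("question" and "instruction"), the wording, the `ZeroShotRequest` type, and the `build_request` helper are all hypothetical, and no model API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroShotRequest:
    """Hypothetical container for one zero-shot query: prompt text plus sampling temperature."""
    prompt: str
    temperature: float


def build_request(snippet: str, finding: str, style: str = "question",
                  temperature: float = 0.0) -> ZeroShotRequest:
    """Build a zero-shot prompt with no labeled examples (illustrative wording only)."""
    if style == "question":
        prompt = (
            f"Clinical note snippet: \"{snippet}\"\n"
            f"Does the patient have {finding}? Answer yes or no."
        )
    elif style == "instruction":
        prompt = (
            f"Read the clinical note snippet and state whether {finding} "
            f"is present. Reply with 'present' or 'absent'.\n"
            f"Snippet: \"{snippet}\""
        )
    else:
        raise ValueError(f"unknown prompt style: {style}")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    return ZeroShotRequest(prompt=prompt, temperature=temperature)


# Same snippet, two prompt styles and two temperatures (assumed values).
snippet = "Patient reports shortness of breath on exertion."
requests = [
    build_request(snippet, "dyspnea", style="question", temperature=0.0),
    build_request(snippet, "dyspnea", style="instruction", temperature=0.7),
]
for req in requests:
    print(req.temperature, req.prompt, sep="\n", end="\n\n")
```

Keeping the prompt style and temperature as explicit parameters makes it straightforward to score each combination separately on the same set of snippets.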