Exploring ChatGPT 3.5 for structured data extraction from oncological notes.
Ty J Skyles, Isaac J Freeman, Georgewilliam Kalibbala, David Davila-Garcia, Kendall Kiser, Silpa Raju, Adam Wilcox
AMIA Joint Summits on Translational Science proceedings, vol. 2025, pp. 518-526. Journal Article. Published 2025-06-10.
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150697/pdf/
Abstract
Large-scale clinical informatics needs to maximize the amount of usable data drawn from electronic health records. As large language models (LLMs) are adopted in medical research, they could be used to extract structured data from unstructured clinical notes. We explored how ChatGPT could improve data availability in cancer research by assessing how GPT used clinical notes to answer six relevant clinical questions. Four prompt engineering strategies were used: zero-shot, zero-shot with context, few-shot, and few-shot with context. Few-shot prompting often decreased the accuracy of GPT outputs, and context did not consistently improve accuracy. GPT extracted patients' Gleason scores and ages with an F1 score of 0.99. It also identified whether patients received palliative care and whether patients were in pain, the latter with an F1 score of 0.86. Effective use of LLMs has the potential to increase interoperability between healthcare and clinical research.
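As a rough illustration of the four prompting strategies and the F1 metric named in the abstract, the following Python sketch may help. It is not the authors' implementation: the abstract does not give the prompt wording, the context text, or the scoring procedure, so `build_prompt`, `f1_score`, and the "Context / Note / Question / Answer" layout are all hypothetical. The F1 function also assumes binary labels (for example, "patient is in pain: yes/no"), which may differ from how the study scored values such as Gleason scores and ages.

```python
from typing import Optional, Sequence, Tuple


def build_prompt(
    note: str,
    question: str,
    context: Optional[str] = None,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble one prompt string (hypothetical layout, not the study's wording).

    No examples gives a zero-shot prompt; one or more (note, answer) pairs
    gives a few-shot prompt. Passing `context` yields the "with context" variant.
    """
    parts = []
    if context:
        parts.append(f"Context: {context}")
    # Few-shot: show worked examples before the note to be answered.
    for example_note, example_answer in examples:
        parts.append(
            f"Note: {example_note}\nQuestion: {question}\nAnswer: {example_answer}"
        )
    parts.append(f"Note: {note}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 = 2 * precision * recall / (precision + recall), binary labels assumed."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)
    if true_pos == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Calling `build_prompt(note, question)` corresponds to zero-shot, adding `context=...` to zero-shot with context, adding `examples=[...]` to few-shot, and supplying both to few-shot with context. Model outputs would then be compared with manually annotated answers using `f1_score`.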