The Impact of Access to Clinical Guidelines on LLM-Based Treatment Recommendations for Chronic Hepatitis B
Robert Siepmann, Carolin Victoria Schneider, Marc Sebastian von der Stueck, Iakovos Amygdalos, Karsten Große, Kai Markus Schneider, Maike Rebecca Pollmanns, Mohamad Murad, Joel Joy, Elena Kabak, Marcella Ricardis May, Jan Clusmann, Christiane Kuhl, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn
Liver International, 45(10), published 2025-09-02. DOI: 10.1111/liv.70324 (https://onlinelibrary.wiley.com/doi/10.1111/liv.70324)
Abstract
Background and Aims
Large language models (LLMs) can potentially support clinicians in their daily routine by providing easy access to information. Yet they are prone to stating incorrect facts and hallucinating when queried. Providing external databases as additional context when prompting LLMs may decrease the risk of misinformation. This study compares the influence of such added context on the coherence of LLM-based treatment recommendations with the recently updated WHO guidelines for the treatment of chronic hepatitis B (CHB).
Methods
GPT-4 was queried with five clinical case vignettes in two configurations: with and without additional context. The vignettes were explicitly constructed so that the recommended treatment differed between the formerly applicable 2015 WHO guidelines and the updated 2024 guidelines. GPT-4 with context was given access to the updated guidelines, while GPT-4 without context had to rely on its internal knowledge. GPT-4 was accessed only a few days after the release of the new WHO guidelines. Seven physicians compared the treatment recommendations with regard to guideline coherence, inclusion of the vignette information, textual errors, and clarity and precision of wording.
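The abstract does not specify how the guideline text was supplied to the model. Purely as an illustration, the following minimal sketch shows one plausible way to run the two configurations via the OpenAI chat API; the model identifier, prompt wording, the helper recommend_treatment, the guideline_text parameter, and the example vignette are assumptions for illustration, not the authors' reported setup.

```python
# Illustrative sketch only: query GPT-4 with a clinical vignette in two
# configurations, without and with guideline text as additional context.
# Model name, prompt wording, and the guideline excerpt are assumptions,
# not the authors' exact protocol.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def recommend_treatment(vignette: str, guideline_text: str | None = None) -> str:
    """Ask the model for a CHB treatment recommendation, optionally with guideline context."""
    system = "You are a hepatology assistant. Recommend treatment for chronic hepatitis B."
    if guideline_text is not None:
        # "With context" configuration: prepend the updated guideline excerpt.
        system += "\n\nBase your recommendation strictly on these guidelines:\n" + guideline_text

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": vignette},
        ],
        temperature=0,  # deterministic output eases comparison between configurations
    )
    return response.choices[0].message.content


# Hypothetical usage (vignette and file name are placeholders):
# vignette = "54-year-old male, HBeAg-negative, HBV DNA 15,000 IU/mL, ALT 1.5x ULN, F2 fibrosis."
# baseline = recommend_treatment(vignette)
# with_context = recommend_treatment(vignette, guideline_text=open("who_chb_2024_excerpt.txt").read())
```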
Results
Using GPT-4 with context increased the coherence of the treatment recommendations with the new 2024 guidelines from 51% to 91% compared with GPT-4 without context. Similar trends were observed for all other categories, with increases of 54% in precision and clarity, 24% in completeness of incorporating the case vignette information, and 12% in textual correctness.
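The abstract does not describe the rating scale or how the percentages were derived. As a sketch only, the snippet below shows one way ratings from seven physicians across five vignettes could be aggregated into percentage scores of this kind; the 1-5 scale, the min-max rescaling, and the placeholder ratings are assumptions.

```python
# Illustration only: aggregate physician ratings into a single percentage score.
# The 1-5 rating scale and the normalisation are assumptions; the abstract does
# not report the actual scoring or aggregation method.
import numpy as np


def percentage_score(ratings: np.ndarray, scale_min: int = 1, scale_max: int = 5) -> float:
    """Mean rating across raters and vignettes, rescaled to 0-100%."""
    return 100.0 * (ratings.mean() - scale_min) / (scale_max - scale_min)


# Hypothetical ratings: rows = 7 physicians, columns = 5 case vignettes.
rng = np.random.default_rng(0)
without_context = rng.integers(2, 5, size=(7, 5))  # placeholder values
with_context = rng.integers(4, 6, size=(7, 5))     # placeholder values

print(f"coherence without context: {percentage_score(without_context):.0f}%")
print(f"coherence with context:    {percentage_score(with_context):.0f}%")
```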
Conclusions
If clinicians consult LLMs for medical advice, the models should be given access to external data sources to increase the chance of receiving factually correct advice.