Title: Danger, Danger, Gaston Labat! Does zero-shot artificial intelligence correlate with anticoagulation guidelines recommendations for neuraxial anesthesia?
Authors: Nathan C Hurley, Rajnish K Gupta, Kristopher M Schroeder, Aaron S Hess
Journal: Regional Anesthesia and Pain Medicine (Q1, Anesthesiology; impact factor 5.1)
DOI: 10.1136/rapm-2023-104868
Publication date: 2024-09-02
Pages: 661-667
Citations: 0
Abstract
Introduction: Artificial intelligence and large language models (LLMs) have emerged as potentially disruptive technologies in healthcare. In this study, GPT-3.5, an accessible LLM, was assessed for its accuracy and reliability in performing guideline-based evaluation of neuraxial bleeding risk in hypothetical patients on anticoagulation medication. The study also explored the impact of structured prompt guidance on the LLM's performance.
Methods: A dataset of 10 hypothetical patient stems and 26 anticoagulation profiles (260 unique combinations) was developed based on American Society of Regional Anesthesia and Pain Medicine guidelines. Five prompts were created for the LLM, ranging from minimal guidance to explicit instructions. The model's responses were compared with a "truth table" based on the guidelines. Performance metrics, including accuracy and area under the receiver operating characteristic curve (AUC), were used.
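The evaluation described in the Methods amounts to scoring model outputs against a guideline-derived truth table. A minimal sketch of that comparison is below; the six-row truth table and risk scores are invented for illustration, and `roc_auc_score`/`accuracy_score` from scikit-learn stand in for whatever tooling the authors actually used.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Miniature stand-in for the study's truth table:
# 1 = guidelines contraindicate neuraxial block (bleeding risk), 0 = acceptable.
truth = np.array([1, 1, 0, 0, 1, 0])

# Hypothetical risk scores for the same patient/anticoagulant combinations,
# e.g. parsed from the LLM's graded responses.
scores = np.array([0.9, 0.6, 0.3, 0.2, 0.7, 0.8])

acc = accuracy_score(truth, scores >= 0.5)  # accuracy at a 0.5 threshold
auc = roc_auc_score(truth, scores)          # threshold-free ranking metric

print(f"accuracy={acc:.3f}  AUC={auc:.3f}")
```

AUC is the natural headline metric here because it summarizes how well the model ranks contraindicated cases above acceptable ones across all thresholds, rather than at a single cutoff.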
Results: Baseline performance of GPT-3.5 was slightly above chance. With detailed prompts and explicit guidelines, performance improved significantly (AUC 0.70, 95% CI 0.64 to 0.77). Performance varied among medication classes.
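A 95% confidence interval like the one reported can be obtained by bootstrap resampling of the evaluation set; this sketch reuses invented data and is not the authors' stated procedure.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Invented labels and model scores for illustration only.
truth = np.array([1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.6, 0.3, 0.2, 0.7, 0.8])

rng = np.random.default_rng(42)
boot_aucs = []
for _ in range(2000):
    idx = rng.integers(0, len(truth), len(truth))  # resample with replacement
    if truth[idx].min() == truth[idx].max():
        continue  # AUC is undefined when a resample contains only one class
    boot_aucs.append(roc_auc_score(truth[idx], scores[idx]))

ci_lo, ci_hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"95% CI: {ci_lo:.2f} to {ci_hi:.2f}")
```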
Discussion: LLMs show potential for assisting in clinical decision making, but their performance depends on accurate, relevant, and precisely worded prompts. The tested LLM demonstrates potential in assessing neuraxial bleeding risk, yet integration into clinical workflows should be approached cautiously, with attention to safety, privacy, and the model's limitations. Future research should focus on optimizing performance, addressing complex scenarios, and better understanding LLM capabilities and limitations in healthcare.
Journal overview:
Regional Anesthesia & Pain Medicine, the official publication of the American Society of Regional Anesthesia and Pain Medicine (ASRA), is a monthly journal that publishes peer-reviewed scientific and clinical studies to advance the understanding and clinical application of regional techniques for surgical anesthesia and postoperative analgesia. Coverage includes intraoperative regional techniques, perioperative pain, chronic pain, obstetric anesthesia, pediatric anesthesia, outcome studies, and complications.
Published for over thirty years, this respected journal also serves as the official publication of the European Society of Regional Anaesthesia and Pain Therapy (ESRA), the Asian and Oceanic Society of Regional Anesthesia (AOSRA), the Latin American Society of Regional Anesthesia (LASRA), the African Society for Regional Anesthesia (AFSRA), and the Academy of Regional Anaesthesia of India (AORA).