Ara Khoylyan BS , Jason Salvato BS , Frank Vazquez BSAT , Mina Girgis MD , Alex Tang MD , Tan Chen MD FRCSC FACS
{"title":"Evaluation of GPT-4 concordance with north American spine society guidelines for lumbar fusion surgery","authors":"Ara Khoylyan BS , Jason Salvato BS , Frank Vazquez BSAT , Mina Girgis MD , Alex Tang MD , Tan Chen MD FRCSC FACS","doi":"10.1016/j.xnsj.2024.100580","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Concordance with evidence-based medicine (EBM) guidelines is associated with improved clinical outcomes in spine surgery. The North American Spine Society (NASS) has published coverage guidelines on indications for lumbar fusion surgery, with a recent survey demonstrating a 60% concordance rate across its members. GPT-4 is a popular deep learning model that receives knowledge training across public databases including those containing EBM guidelines. There is prior research exploring the potential utility of artificial intelligence (AI) software in adherence with spine surgery practices and guidelines, inviting opportunity to further investigate application in the setting of lumbar fusion surgery with current AI models.</div></div><div><h3>Methods</h3><div>Seventeen well-validated clinical vignettes with specific indications for or against lumbar fusion based on NASS criteria were obtained from a prior published research study. Each case was transcribed into a standardized prompt and entered into GPT-4 to obtain a decision whether fusion is indicated. Interquery reliability was assessed with serial identical queries utilizing the Fleiss’ Kappa statistic. Majority response among serial queries was considered as the final GPT-4 decision. Queries were all entered in separate strings. The investigator entering the prompts was blinded to the NASS-concordant decisions for the cases prior to complete data collection. Decisions by GPT-4 and NASS guidelines were compared with Chi-square analysis.</div></div><div><h3>Results</h3><div>GPT-4 responses for 15/17 (88.2%) of the clinical vignettes were in concordance with NASS EBM lumbar fusion guidelines. There was a significant association in clinical decision-making when determining indication for spine fusion surgery between GPT-4 and NASS guidelines (χ² = 9.75; p<.01). There was substantial agreement among the sets of responses generated by GPT-4 for each clinical case (K = 0.71; p<.001).</div></div><div><h3>Conclusions</h3><div>There is significant concordance between GPT-4 responses and NASS EBM indications for lumbar fusion surgery. AI and deep learning models may prove to be an effective adjunct tool for clinical decision-making within modern spine surgery practices.</div></div>","PeriodicalId":34622,"journal":{"name":"North American Spine Society Journal","volume":"21 ","pages":"Article 100580"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"North American Spine Society Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666548424002737","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
引用次数: 0
Abstract
Background
Concordance with evidence-based medicine (EBM) guidelines is associated with improved clinical outcomes in spine surgery. The North American Spine Society (NASS) has published coverage guidelines on indications for lumbar fusion surgery, with a recent survey demonstrating a 60% concordance rate across its members. GPT-4 is a popular deep learning model that receives knowledge training across public databases including those containing EBM guidelines. There is prior research exploring the potential utility of artificial intelligence (AI) software in adherence with spine surgery practices and guidelines, inviting opportunity to further investigate application in the setting of lumbar fusion surgery with current AI models.
Methods
Seventeen well-validated clinical vignettes with specific indications for or against lumbar fusion based on NASS criteria were obtained from a prior published research study. Each case was transcribed into a standardized prompt and entered into GPT-4 to obtain a decision whether fusion is indicated. Interquery reliability was assessed with serial identical queries utilizing the Fleiss’ Kappa statistic. Majority response among serial queries was considered as the final GPT-4 decision. Queries were all entered in separate strings. The investigator entering the prompts was blinded to the NASS-concordant decisions for the cases prior to complete data collection. Decisions by GPT-4 and NASS guidelines were compared with Chi-square analysis.
Results
GPT-4 responses for 15/17 (88.2%) of the clinical vignettes were in concordance with NASS EBM lumbar fusion guidelines. There was a significant association in clinical decision-making when determining indication for spine fusion surgery between GPT-4 and NASS guidelines (χ² = 9.75; p<.01). There was substantial agreement among the sets of responses generated by GPT-4 for each clinical case (K = 0.71; p<.001).
Conclusions
There is significant concordance between GPT-4 responses and NASS EBM indications for lumbar fusion surgery. AI and deep learning models may prove to be an effective adjunct tool for clinical decision-making within modern spine surgery practices.