Leveraging Large Language Models in the delivery of post-operative dental care: a comparison between an embedded GPT model and ChatGPT.

IF 2.5 Q2 DENTISTRY, ORAL SURGERY & MEDICINE

BDJ Open. Pub Date: 2024-06-12. DOI: 10.1038/s41405-024-00226-3

Itrat Batool, Nighat Naved, Syed Murtaza Raza Kazmi, Fahad Umer

BDJ Open 2024;10(1):48. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11169374/pdf/


Abstract

Objective: This study underscores the transformative role of Artificial Intelligence (AI) in healthcare, particularly the promising applications of Large Language Models (LLMs) in the delivery of post-operative dental care. The aim is to evaluate the performance of an embedded GPT model and to compare it with ChatGPT (GPT-3.5 turbo). The assessment focuses on response accuracy, clarity, relevance, and up-to-date knowledge in addressing patient concerns and facilitating informed decision-making.

Material and methods: An embedded GPT model, based on GPT-3.5-16k, was built with GPT-trainer to answer post-operative questions in four dental specialties: Operative Dentistry & Endodontics, Periodontics, Oral & Maxillofacial Surgery, and Prosthodontics. The generated responses were rated on a Likert scale by thirty-six dental experts, nine from each specialty, so that the embedded GPT model's performance could be compared with that of GPT-3.5 turbo. Content validity was quantified with the Content Validity Index (CVI), calculated at both the item level (I-CVI) and the scale level (S-CVI/Ave). To adjust I-CVI for chance agreement, a modified kappa statistic (K*) was computed.
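
The indices named above can be sketched as follows. This is a minimal illustration of the commonly used formulation (I-CVI as the proportion of experts in agreement, S-CVI/Ave as the mean of the I-CVIs, and K* as I-CVI corrected by the binomial probability of chance agreement), not the authors' own analysis code. The abstract does not state the width of the Likert scale or the agreement cutoff, so the 4-point scale with ratings of 3 or 4 counted as agreement is an assumption, and the ratings shown are invented purely for illustration.

```python
from math import comb
from statistics import mean

# Assumption: 4-point scale, ratings >= 3 count as "agree".
# The abstract does not specify the scale or the cutoff.
AGREE_MIN = 3


def i_cvi(ratings, agree_min=AGREE_MIN):
    """Item-level CVI: share of experts whose rating counts as agreement."""
    agree = sum(1 for r in ratings if r >= agree_min)
    return agree / len(ratings)


def modified_kappa(ratings, agree_min=AGREE_MIN):
    """K*: I-CVI adjusted for chance agreement.

    Pc = C(N, A) * 0.5**N is the probability that exactly A of N experts
    agree by chance; K* = (I-CVI - Pc) / (1 - Pc).
    """
    n = len(ratings)
    a = sum(1 for r in ratings if r >= agree_min)
    pc = comb(n, a) * 0.5 ** n
    return (a / n - pc) / (1 - pc)


def s_cvi_ave(items, agree_min=AGREE_MIN):
    """Scale-level CVI (averaging method): mean of the item-level CVIs."""
    return mean(i_cvi(ratings, agree_min) for ratings in items)


# Hypothetical ratings from nine experts for two responses (not study data).
example_items = [
    [4, 4, 3, 4, 3, 4, 4, 3, 4],  # all nine experts agree
    [4, 3, 2, 4, 3, 1, 4, 3, 2],  # six of nine experts agree
]

if __name__ == "__main__":
    for ratings in example_items:
        print(round(i_cvi(ratings), 3), round(modified_kappa(ratings), 3))
    print(round(s_cvi_ave(example_items), 3))
```

With nine raters per specialty, as in the study, the chance-agreement term is small when agreement is near unanimous, so K* stays close to I-CVI for well-rated items and falls below it as agreement becomes more evenly split.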

Results: The overall content validity of responses generated by the embedded GPT model and ChatGPT was 65.62% and 61.87%, respectively. The embedded GPT model outperformed ChatGPT on accuracy (62.5% versus 52.5%) and on clarity (72.5% versus 67.5%). However, both models performed equally well in terms of relevance and up-to-date knowledge.

Conclusion: The embedded GPT model showed better results than ChatGPT in providing post-operative dental care, highlighting the benefits of embedding and prompt engineering and paving the way for future advancements in healthcare applications.
