Leveraging Large Language Models in the delivery of post-operative dental care: a comparison between an embedded GPT model and ChatGPT.

IF 2.5 Q2 DENTISTRY, ORAL SURGERY & MEDICINE
Itrat Batool, Nighat Naved, Syed Murtaza Raza Kazmi, Fahad Umer
Citations: 0

Abstract



Objective: This study underscores the transformative role of Artificial Intelligence (AI) in healthcare, particularly the promising applications of Large Language Models (LLMs) in the delivery of post-operative dental care. The aim is to evaluate the performance of an embedded GPT model and to compare it with ChatGPT-3.5 Turbo. The assessment focuses on response accuracy, clarity, relevance, and up-to-date knowledge in addressing patient concerns and facilitating informed decision-making.

Material and methods: An embedded GPT model, employing GPT-3.5-16k, was built via GPT-trainer to answer post-operative questions in four dental specialties: Operative Dentistry & Endodontics, Periodontics, Oral & Maxillofacial Surgery, and Prosthodontics. The generated responses were validated by thirty-six dental experts, nine from each specialty, using a Likert scale, providing comprehensive insight into the embedded GPT model's performance and its comparison with GPT-3.5 Turbo. For content validation, a quantitative Content Validity Index (CVI) was used. The CVI was calculated both at the item level (I-CVI) and at the scale level (S-CVI/Ave). To adjust the I-CVI for chance agreement, a modified kappa statistic (K*) was computed.
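The content-validity statistics named above (I-CVI, S-CVI/Ave, and the chance-adjusted modified kappa K*) are standard quantities and can be computed as sketched below. This is an illustrative sketch, not the authors' code: the abstract does not state the rating cutoff, so treating ratings of 3 or 4 on a 4-point scale as "relevant" is an assumption based on common CVI practice.

```python
from math import comb

def i_cvi(ratings, relevant_threshold=3):
    """Item-level CVI: proportion of experts rating the item as relevant
    (assumed here to mean >= 3 on a 4-point scale)."""
    agree = sum(1 for r in ratings if r >= relevant_threshold)
    return agree / len(ratings)

def modified_kappa(ratings, relevant_threshold=3):
    """I-CVI adjusted for chance agreement (modified kappa, K*).
    Pc is the binomial probability that exactly `agree` of n experts
    rate the item relevant by chance (p = 0.5)."""
    n = len(ratings)
    agree = sum(1 for r in ratings if r >= relevant_threshold)
    pc = comb(n, agree) * 0.5 ** n
    icvi = agree / n
    return (icvi - pc) / (1 - pc)

def s_cvi_ave(all_item_ratings):
    """Scale-level CVI: average of the I-CVI values across all items."""
    return sum(i_cvi(item) for item in all_item_ratings) / len(all_item_ratings)

# Example: one item rated by nine experts (as in this study's panels).
ratings = [4, 4, 3, 4, 3, 4, 4, 3, 2]
print(i_cvi(ratings), modified_kappa(ratings))
```

With nine experts per specialty, an item rated relevant by eight of nine yields I-CVI = 8/9 ≈ 0.89, and K* discounts the small probability that such agreement arises by chance.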

Results: The overall content validity of responses generated by the embedded GPT model and ChatGPT was 65.62% and 61.87%, respectively. Moreover, the embedded GPT model outperformed ChatGPT, with an accuracy of 62.5% and clarity of 72.5%. In contrast, the responses generated by ChatGPT scored slightly lower, with an accuracy of 52.5% and clarity of 67.5%. However, both models performed equally well in terms of relevance and up-to-date knowledge.

Conclusion: The embedded GPT model showed better results than ChatGPT in providing post-operative dental care, underscoring the benefits of embedding and prompt engineering and paving the way for future advancements in healthcare applications.

Source journal: BDJ Open (Dentistry, all)
CiteScore: 3.70
Self-citation rate: 3.30%
Articles per year: 34
Review time: 30 weeks