ScholarGPT's performance in oral and maxillofacial surgery
Yunus Balel
Journal of Stomatology Oral and Maxillofacial Surgery, Vol. 126, Issue 4, Article 102114 (published 2024-10-09)
DOI: 10.1016/j.jormas.2024.102114
URL: https://www.sciencedirect.com/science/article/pii/S2468785524004038
Impact Factor: 1.8 | JCR: Q2 (Dentistry, Oral Surgery & Medicine)
Citations: 0
Abstract
Objective
This study evaluates the performance of Scholar GPT in answering technical questions in the field of oral and maxillofacial surgery and compares the results with those of a previous study that assessed the performance of ChatGPT.
Materials and Methods
Scholar GPT was accessed via ChatGPT (www.chatgpt.com) on March 20, 2024. A total of 60 technical questions (15 each on impacted teeth, dental implants, temporomandibular joint disorders, and orthognathic surgery) from our previous study were used. Scholar GPT's responses were evaluated using a modified Global Quality Scale (GQS). The question order was randomized before scoring using an online randomizer (www.randomizer.org). A single researcher performed the evaluations at three different times, three weeks apart, with a new randomization preceding each evaluation. In cases of score discrepancies, a fourth evaluation was conducted to determine the final score.
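The randomization step described above can be sketched in a few lines of Python. The question identifiers below are illustrative placeholders, not the actual study questions; the study itself used www.randomizer.org rather than code.

```python
# Hypothetical sketch of the question-randomization step: 60 questions,
# 15 per topic, reshuffled into a new order before each scoring round.
import random

topics = ["impacted teeth", "dental implants",
          "temporomandibular joint disorders", "orthognathic surgery"]

# 15 placeholder questions per topic, 60 in total, matching the study design
questions = [(topic, i) for topic in topics for i in range(1, 16)]

rng = random.Random(42)  # fixed seed only so this sketch is reproducible
order = questions[:]     # copy so the master list is preserved
rng.shuffle(order)       # new presentation order for one evaluation round
```

Re-running the shuffle before each of the three evaluation rounds (as the study does) reduces the chance that scores drift with question position rather than content.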
Results
Scholar GPT performed well across all technical questions, with an average GQS score of 4.48 (SD = 0.93). By comparison, ChatGPT's average GQS score in the previous study was 3.1 (SD = 1.492). The Wilcoxon Signed-Rank Test indicated a statistically significantly higher average score for Scholar GPT than for ChatGPT (Mean Difference = 2.00, SE = 0.163, p < 0.001). The Kruskal-Wallis Test showed no statistically significant differences among the topic groups (χ² = 0.799, df = 3, p = 0.850, ε² = 0.0135).
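The two tests reported above can be reproduced in form with `scipy.stats`. The scores below are synthetic stand-ins (the per-question GQS data are not published here), so the resulting statistics will not match the paper's values; the sketch only shows how a paired Wilcoxon test across models and a Kruskal-Wallis test across the four topic groups would be run.

```python
# Illustrative re-creation of the statistical analysis, on synthetic data.
import numpy as np
from scipy.stats import wilcoxon, kruskal

rng = np.random.default_rng(0)
scholar = rng.integers(3, 6, size=60)  # hypothetical Scholar GPT GQS scores (1-5 scale)
chatgpt = rng.integers(1, 6, size=60)  # hypothetical ChatGPT GQS scores (1-5 scale)

# Paired comparison of the two models on the same 60 questions
w_stat, p_paired = wilcoxon(scholar, chatgpt)

# Scholar GPT scores split into the four topic groups (15 questions each)
groups = [scholar[i * 15:(i + 1) * 15] for i in range(4)]
h_stat, p_topics = kruskal(*groups)
```

The Wilcoxon test is the appropriate paired nonparametric choice here because the same 60 questions were answered by both models, while Kruskal-Wallis compares independent topic groups without assuming normally distributed GQS scores.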
Conclusion
Scholar GPT demonstrated generally high performance on technical questions in oral and maxillofacial surgery and produced more consistent, higher-quality responses than ChatGPT. The findings suggest that GPT models grounded in academic databases can provide more accurate and reliable information. Additionally, developing a specialized GPT model for oral and maxillofacial surgery could ensure higher quality and consistency in artificial intelligence-generated information.