Evaluating ChatGPT-4's performance on oral and maxillofacial queries: Chain of Thought and standard method.

Impact factor: 3.0 · JCR Q1 · Dentistry, Oral Surgery & Medicine

Frontiers in oral health · Pub Date: 2025-02-12 · eCollection Date: 2025-01-01 · DOI: 10.3389/froh.2025.1541976

Kaiyuan Ji, Zhihan Wu, Jing Han, Guangtao Zhai, Jiannan Liu

Frontiers in oral health, volume 6, article 1541976. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11860867/pdf/


Abstract

Objectives: Oral and maxillofacial diseases affect approximately 3.5 billion people worldwide. With the continuous advancement of Artificial Intelligence technologies, particularly the application of generative pre-trained transformers like ChatGPT-4, there is potential to enhance public awareness of the prevention and early detection of these diseases. This study evaluated the performance of ChatGPT-4 in addressing oral and maxillofacial disease questions using standard approaches and the Chain of Thought (CoT) method, aiming to gain a deeper understanding of its capabilities, potential, and limitations.

Materials and methods: Three experts, drawing on their extensive experience and the most common questions in clinical settings, selected 130 open-ended questions and 1,805 multiple-choice questions from the national dental licensing examination. These questions cover 12 areas of oral and maxillofacial surgery, including Prosthodontics, Pediatric Dentistry, Maxillofacial Tumors and Salivary Gland Diseases, and Maxillofacial Infections.

Results: With the CoT approach, ChatGPT-4 showed marked improvements in accuracy, structure, completeness, professionalism, and overall impression on open-ended questions, with statistically significant differences compared to its performance on general oral and maxillofacial inquiries. On multiple-choice questions, the CoT method raised ChatGPT-4's accuracy across all major subjects, for an overall accuracy increase of 3.1%.

Conclusions: When employing ChatGPT-4 to address questions in oral and maxillofacial surgery, incorporating CoT as a querying method can enhance its performance and help the public improve their understanding and awareness of such issues. However, it is not advisable to consider it a substitute for doctors.
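
The abstract does not reproduce the prompts used in the study, so the following is only a minimal sketch of how a standard query and a CoT query for one multiple-choice item might differ, and how accuracy under each could be scored. The prompt wording, the `MultipleChoiceQuestion` class, the helper names, and the sample question are all illustrative assumptions, not material from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MultipleChoiceQuestion:
    """Hypothetical container for one exam item (not the study's data format)."""
    stem: str
    options: Dict[str, str]  # option letter -> option text
    answer: str              # letter of the correct option


def format_question(q: MultipleChoiceQuestion) -> str:
    """Render the stem followed by one line per option, in letter order."""
    lines = [q.stem] + [f"{letter}. {text}" for letter, text in sorted(q.options.items())]
    return "\n".join(lines)


def standard_prompt(q: MultipleChoiceQuestion) -> str:
    """Assumed 'standard' query: ask directly for the answer letter."""
    return format_question(q) + "\nAnswer with the letter of the correct option."


def cot_prompt(q: MultipleChoiceQuestion) -> str:
    """Assumed CoT query: ask for step-by-step reasoning before the answer."""
    return (
        format_question(q)
        + "\nThink through the question step by step, "
        + "then give the letter of the correct option on the final line."
    )


def accuracy(predicted: List[str], questions: List[MultipleChoiceQuestion]) -> float:
    """Fraction of questions whose predicted letter matches the answer key."""
    if not questions:
        return 0.0
    correct = sum(p == q.answer for p, q in zip(predicted, questions))
    return correct / len(questions)


# Illustrative item only; it is not taken from the licensing examination set.
sample = MultipleChoiceQuestion(
    stem="How many teeth are in a complete primary dentition?",
    options={"A": "16", "B": "20", "C": "28", "D": "32"},
    answer="B",
)
print(standard_prompt(sample))
print(cot_prompt(sample))
```

Scoring the same question set once with each prompt style and subtracting the two `accuracy` values gives the kind of overall difference the Results report; the model call itself is deliberately left out of the sketch.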


