Comparative Analysis of ChatGPT-3.5 and GPT-4 in Open-Ended Clinical Reasoning Across Dental Specialties
Yasamin Babaee Hemmati, Morteza Rasouli, Mehran Falahchai
European Journal of Dental Education (JCR Q3, Dentistry, Oral Surgery & Medicine; IF 1.7), published 2025-06-13
DOI: 10.1111/eje.13144
Citation count: 0
Abstract
Purpose: The integration of large language models (LLMs) such as ChatGPT into health care has garnered increasing interest. While previous studies have assessed these models using structured multiple-choice questions, limited research has evaluated their performance on open-ended, scenario-based clinical tasks, particularly in dentistry. This study aimed to evaluate and compare the clinical reasoning capabilities of ChatGPT-3.5 and GPT-4 in formulating treatment plans across seven dental specialties using realistic, open-ended clinical scenarios.
Methods: A cross-sectional analytical study, reported in accordance with the STROBE guidelines, was conducted using 70 dental cases spanning endodontics, oral and maxillofacial surgery, oral medicine, orthodontics, paediatric dentistry, periodontology, and radiology. Each case was submitted to both ChatGPT-3.5 and GPT-4 (paid version, November 2024). Responses were evaluated by specialty-specific expert panels using a three-level rubric (poor, average, good). Statistical analyses included chi-square tests and Fisher-Freeman-Halton exact tests (α = 0.05).
Results: GPT-4 significantly outperformed GPT-3.5 in overall response quality (67.1% vs. 44.3% rated as 'good'; p = 0.016). Although no significant differences were observed across most specialties, GPT-4 showed a statistically superior performance in oral and maxillofacial surgery. Its advantage was more pronounced in complex cases, aligning with the model's enhanced contextual reasoning.
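As a rough sanity check on the reported overall comparison, the 'good' ratings can be framed as a 2x2 contingency table (model x good/not-good) and tested with a Pearson chi-square. The counts below are reconstructed from the reported percentages (67.1% and 44.3% of 70 cases, approximately 47 and 31 'good' ratings) and are illustrative assumptions, not the study's actual data; the paper also used Fisher-Freeman-Halton exact tests for the full three-level rubric, which this sketch does not reproduce.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, row, col in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        stat += (obs - expected) ** 2 / expected
    return stat

# Assumed counts: GPT-4 47 'good' / 23 not; GPT-3.5 31 'good' / 39 not
stat = chi_square_2x2(47, 23, 31, 39)
print(round(stat, 2))   # chi-square statistic for the reconstructed table
print(stat > 3.841)     # 3.841 = critical value at alpha = 0.05, df = 1
```

With these reconstructed counts the statistic clears the df = 1 critical value, consistent in direction with the reported significant difference (p = 0.016).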
Conclusion: GPT-4 demonstrated superior accuracy and consistency compared to GPT-3.5, particularly in clinically complex and integrative tasks. These findings support the potential of advanced LLMs as adjunct tools in dental education and decision-making, though specialty-specific applications and expert oversight remain essential.
About the Journal:
The aim of the European Journal of Dental Education is to publish original, topical and review articles of the highest quality in the field of dental education. The Journal seeks to disseminate widely the latest information on curriculum development, teaching methodologies, assessment techniques and quality assurance in the fields of dental undergraduate and postgraduate education and dental auxiliary personnel training. The scope includes the dental educational aspects of the basic medical sciences, the behavioural sciences, the interface with medical education, information technology and distance learning, and educational audit. Papers embodying the results of high-quality educational research of relevance to dentistry are particularly encouraged, as are evidence-based reports of novel and established educational programmes and their outcomes.