{"title":"谁最懂解剖学?chatgpt - 40、DeepSeek、Gemini和Claude的比较研究。","authors":"Melek Tassoker","doi":"10.1002/ca.70012","DOIUrl":null,"url":null,"abstract":"<p><p>This study evaluates the performance of ChatGPT-4o (OpenAI), DeepSeek-v3 (DeepSeek), Gemini 2.0 (Google DeepMind), and Claude 3.7 Sonnet (Anthropic) in answering anatomy questions from the Turkish Dental Specialty Admission Exam (DUS). The study aims to compare their accuracy, response times, and answer lengths. A total of 74 text-based multiple choice anatomy questions from the Turkish Dental Specialty Admission Exam (DUS) administered between 2012 and 2021 were analyzed in this study. The questions varied in difficulty and included both basic anatomical identification and clinically oriented scenarios, with a majority focusing on head and neck anatomy, followed by thorax, neuroanatomy, and musculoskeletal regions, which are particularly relevant to dental education. The accuracy of answers was evaluated against official sources, and response times and word counts were recorded. Statistical analyses, including the Kruskal-Wallis and Cochran's Q tests, were used to compare performance differences. ChatGPT-4o demonstrated the highest accuracy (98.6%), while the other models achieved the same rate of 89.2%. Gemini produced the fastest responses (mean: 4.47 s), whereas DeepSeek generated the shortest answers and Gemini the longest (p = 0.000). The differences in accuracy, response times, and word count were statistically significant (p < 0.05). ChatGPT-4o outperformed other models in accuracy for DUS anatomy questions, suggesting its superior potential as a tool for dental education. Future research should explore the integration of LLMs into structured learning programs.</p>","PeriodicalId":50687,"journal":{"name":"Clinical Anatomy","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Who Knows Anatomy Best? A Comparative Study of ChatGPT-4o, DeepSeek, Gemini, and Claude.\",\"authors\":\"Melek Tassoker\",\"doi\":\"10.1002/ca.70012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>This study evaluates the performance of ChatGPT-4o (OpenAI), DeepSeek-v3 (DeepSeek), Gemini 2.0 (Google DeepMind), and Claude 3.7 Sonnet (Anthropic) in answering anatomy questions from the Turkish Dental Specialty Admission Exam (DUS). The study aims to compare their accuracy, response times, and answer lengths. A total of 74 text-based multiple choice anatomy questions from the Turkish Dental Specialty Admission Exam (DUS) administered between 2012 and 2021 were analyzed in this study. The questions varied in difficulty and included both basic anatomical identification and clinically oriented scenarios, with a majority focusing on head and neck anatomy, followed by thorax, neuroanatomy, and musculoskeletal regions, which are particularly relevant to dental education. The accuracy of answers was evaluated against official sources, and response times and word counts were recorded. Statistical analyses, including the Kruskal-Wallis and Cochran's Q tests, were used to compare performance differences. ChatGPT-4o demonstrated the highest accuracy (98.6%), while the other models achieved the same rate of 89.2%. Gemini produced the fastest responses (mean: 4.47 s), whereas DeepSeek generated the shortest answers and Gemini the longest (p = 0.000). 
The differences in accuracy, response times, and word count were statistically significant (p < 0.05). ChatGPT-4o outperformed other models in accuracy for DUS anatomy questions, suggesting its superior potential as a tool for dental education. Future research should explore the integration of LLMs into structured learning programs.</p>\",\"PeriodicalId\":50687,\"journal\":{\"name\":\"Clinical Anatomy\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Clinical Anatomy\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1002/ca.70012\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ANATOMY & MORPHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Anatomy","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/ca.70012","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ANATOMY & MORPHOLOGY","Score":null,"Total":0}
Who Knows Anatomy Best? A Comparative Study of ChatGPT-4o, DeepSeek, Gemini, and Claude.
This study evaluates the performance of ChatGPT-4o (OpenAI), DeepSeek-v3 (DeepSeek), Gemini 2.0 (Google DeepMind), and Claude 3.7 Sonnet (Anthropic) in answering anatomy questions from the Turkish Dental Specialty Admission Exam (DUS). The study aims to compare their accuracy, response times, and answer lengths. A total of 74 text-based multiple-choice anatomy questions from DUS exams administered between 2012 and 2021 were analyzed. The questions varied in difficulty and included both basic anatomical identification and clinically oriented scenarios, with the majority focusing on head and neck anatomy, followed by the thorax, neuroanatomy, and musculoskeletal regions, which are particularly relevant to dental education. The accuracy of answers was evaluated against official sources, and response times and word counts were recorded. Statistical analyses, including the Kruskal-Wallis and Cochran's Q tests, were used to compare performance differences. ChatGPT-4o demonstrated the highest accuracy (98.6%), while the other three models each achieved 89.2%. Gemini produced the fastest responses (mean: 4.47 s), whereas DeepSeek generated the shortest answers and Gemini the longest (p < 0.001). The differences in accuracy, response times, and word counts were statistically significant (p < 0.05). ChatGPT-4o outperformed the other models in accuracy on DUS anatomy questions, suggesting its superior potential as a tool for dental education. Future research should explore the integration of LLMs into structured learning programs.
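As a minimal sketch of how the reported comparisons could be reproduced (this is not the author's actual analysis code): Cochran's Q tests whether the four models differ in the proportion of the 74 questions answered correctly, and Kruskal-Wallis tests whether their response times differ. The file name "dus_results.csv" and the column naming scheme are hypothetical assumptions.

```python
import pandas as pd
from scipy.stats import kruskal
from statsmodels.stats.contingency_tables import cochrans_q

models = ["ChatGPT-4o", "DeepSeek-v3", "Gemini 2.0", "Claude 3.7 Sonnet"]

# Hypothetical layout: one row per question, with "<model>_correct" coded 0/1
# and "<model>_time" in seconds.
df = pd.read_csv("dus_results.csv")

# Cochran's Q: compares correct/incorrect outcomes across the four models
# on the same set of questions (related binary samples).
correct = df[[f"{m}_correct" for m in models]].to_numpy()
q_res = cochrans_q(correct)
print(f"Cochran's Q = {q_res.statistic:.2f}, p = {q_res.pvalue:.4f}")

# Kruskal-Wallis: compares response-time distributions across the models
# without assuming normality.
times = [df[f"{m}_time"] for m in models]
h_stat, p_val = kruskal(*times)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_val:.4f}")
```

A post hoc pairwise comparison (e.g., Dunn's test with a multiplicity correction) would typically follow a significant Kruskal-Wallis result, though the abstract does not specify which, if any, was used.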
Journal description:
Clinical Anatomy is the Official Journal of the American Association of Clinical Anatomists and the British Association of Clinical Anatomists. The goal of Clinical Anatomy is to provide a medium for the exchange of current information between anatomists and clinicians. This journal embraces anatomy in all its aspects as applied to medical practice. Furthermore, the journal assists physicians and other health care providers in keeping abreast of new methodologies for patient management and informs educators of new developments in clinical anatomy and teaching techniques. Clinical Anatomy publishes original and review articles of scientific, clinical, and educational interest. Papers covering the application of anatomic principles to the solution of clinical problems and/or the application of clinical observations to expand anatomic knowledge are welcomed.