{"title":"谁最懂解剖学?chatgpt - 40、DeepSeek、Gemini和Claude的比较研究。","authors":"Melek Tassoker","doi":"10.1002/ca.70012","DOIUrl":null,"url":null,"abstract":"<p><p>This study evaluates the performance of ChatGPT-4o (OpenAI), DeepSeek-v3 (DeepSeek), Gemini 2.0 (Google DeepMind), and Claude 3.7 Sonnet (Anthropic) in answering anatomy questions from the Turkish Dental Specialty Admission Exam (DUS). The study aims to compare their accuracy, response times, and answer lengths. A total of 74 text-based multiple choice anatomy questions from the Turkish Dental Specialty Admission Exam (DUS) administered between 2012 and 2021 were analyzed in this study. The questions varied in difficulty and included both basic anatomical identification and clinically oriented scenarios, with a majority focusing on head and neck anatomy, followed by thorax, neuroanatomy, and musculoskeletal regions, which are particularly relevant to dental education. The accuracy of answers was evaluated against official sources, and response times and word counts were recorded. Statistical analyses, including the Kruskal-Wallis and Cochran's Q tests, were used to compare performance differences. ChatGPT-4o demonstrated the highest accuracy (98.6%), while the other models achieved the same rate of 89.2%. Gemini produced the fastest responses (mean: 4.47 s), whereas DeepSeek generated the shortest answers and Gemini the longest (p = 0.000). The differences in accuracy, response times, and word count were statistically significant (p < 0.05). ChatGPT-4o outperformed other models in accuracy for DUS anatomy questions, suggesting its superior potential as a tool for dental education. Future research should explore the integration of LLMs into structured learning programs.</p>","PeriodicalId":50687,"journal":{"name":"Clinical Anatomy","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Who Knows Anatomy Best? A Comparative Study of ChatGPT-4o, DeepSeek, Gemini, and Claude.\",\"authors\":\"Melek Tassoker\",\"doi\":\"10.1002/ca.70012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>This study evaluates the performance of ChatGPT-4o (OpenAI), DeepSeek-v3 (DeepSeek), Gemini 2.0 (Google DeepMind), and Claude 3.7 Sonnet (Anthropic) in answering anatomy questions from the Turkish Dental Specialty Admission Exam (DUS). The study aims to compare their accuracy, response times, and answer lengths. A total of 74 text-based multiple choice anatomy questions from the Turkish Dental Specialty Admission Exam (DUS) administered between 2012 and 2021 were analyzed in this study. The questions varied in difficulty and included both basic anatomical identification and clinically oriented scenarios, with a majority focusing on head and neck anatomy, followed by thorax, neuroanatomy, and musculoskeletal regions, which are particularly relevant to dental education. The accuracy of answers was evaluated against official sources, and response times and word counts were recorded. Statistical analyses, including the Kruskal-Wallis and Cochran's Q tests, were used to compare performance differences. ChatGPT-4o demonstrated the highest accuracy (98.6%), while the other models achieved the same rate of 89.2%. Gemini produced the fastest responses (mean: 4.47 s), whereas DeepSeek generated the shortest answers and Gemini the longest (p = 0.000). 
The differences in accuracy, response times, and word count were statistically significant (p < 0.05). ChatGPT-4o outperformed other models in accuracy for DUS anatomy questions, suggesting its superior potential as a tool for dental education. Future research should explore the integration of LLMs into structured learning programs.</p>\",\"PeriodicalId\":50687,\"journal\":{\"name\":\"Clinical Anatomy\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Clinical Anatomy\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1002/ca.70012\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ANATOMY & MORPHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Anatomy","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/ca.70012","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ANATOMY & MORPHOLOGY","Score":null,"Total":0}
Who Knows Anatomy Best? A Comparative Study of ChatGPT-4o, DeepSeek, Gemini, and Claude.
This study evaluates the performance of ChatGPT-4o (OpenAI), DeepSeek-v3 (DeepSeek), Gemini 2.0 (Google DeepMind), and Claude 3.7 Sonnet (Anthropic) in answering anatomy questions from the Turkish Dental Specialty Admission Exam (DUS). The study aims to compare their accuracy, response times, and answer lengths. A total of 74 text-based multiple-choice anatomy questions from DUS exams administered between 2012 and 2021 were analyzed. The questions varied in difficulty and included both basic anatomical identification and clinically oriented scenarios, with the majority focusing on head and neck anatomy, followed by the thorax, neuroanatomy, and musculoskeletal regions, which are particularly relevant to dental education. The accuracy of answers was evaluated against official sources, and response times and word counts were recorded. Statistical analyses, including the Kruskal-Wallis and Cochran's Q tests, were used to compare performance differences. ChatGPT-4o demonstrated the highest accuracy (98.6%), while the other three models each achieved 89.2%. Gemini produced the fastest responses (mean: 4.47 s), whereas DeepSeek generated the shortest answers and Gemini the longest (p < 0.001). The differences in accuracy, response times, and word counts were statistically significant (p < 0.05). ChatGPT-4o outperformed the other models in accuracy on DUS anatomy questions, suggesting its superior potential as a tool for dental education. Future research should explore the integration of LLMs into structured learning programs.
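As a minimal sketch of how the reported comparisons could be reproduced (this is not the author's actual analysis code): Cochran's Q tests whether the four models differ in the proportion of the 74 questions answered correctly, and Kruskal-Wallis tests whether their response times differ. The file name "dus_results.csv" and the column naming scheme are hypothetical assumptions.

```python
import pandas as pd
from scipy.stats import kruskal
from statsmodels.stats.contingency_tables import cochrans_q

models = ["ChatGPT-4o", "DeepSeek-v3", "Gemini 2.0", "Claude 3.7 Sonnet"]

# Hypothetical layout: one row per question, with "<model>_correct" coded 0/1
# and "<model>_time" in seconds.
df = pd.read_csv("dus_results.csv")

# Cochran's Q: compares correct/incorrect outcomes across the four models
# on the same set of questions (related binary samples).
correct = df[[f"{m}_correct" for m in models]].to_numpy()
q_res = cochrans_q(correct)
print(f"Cochran's Q = {q_res.statistic:.2f}, p = {q_res.pvalue:.4f}")

# Kruskal-Wallis: compares response-time distributions across the models
# without assuming normality.
times = [df[f"{m}_time"] for m in models]
h_stat, p_val = kruskal(*times)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_val:.4f}")
```

A post hoc pairwise comparison (e.g., Dunn's test with a multiplicity correction) would typically follow a significant Kruskal-Wallis result, though the abstract does not specify which, if any, was used.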
Journal description:
Clinical Anatomy is the Official Journal of the American Association of Clinical Anatomists and the British Association of Clinical Anatomists. The goal of Clinical Anatomy is to provide a medium for the exchange of current information between anatomists and clinicians. This journal embraces anatomy in all its aspects as applied to medical practice. Furthermore, the journal assists physicians and other health care providers in keeping abreast of new methodologies for patient management and informs educators of new developments in clinical anatomy and teaching techniques. Clinical Anatomy publishes original and review articles of scientific, clinical, and educational interest. Papers covering the application of anatomic principles to the solution of clinical problems and/or the application of clinical observations to expand anatomic knowledge are welcomed.