Can deepseek and ChatGPT be used in the diagnosis of oral pathologies?

IF 2.6 | CAS Tier 2 (Medicine) | JCR Q1 DENTISTRY, ORAL SURGERY & MEDICINE
Ömer Faruk Kaygisiz, Mehmet Turhan Teke
{"title":"deepseek和ChatGPT能否用于口腔病理的诊断?","authors":"Ömer Faruk Kaygisiz, Mehmet Turhan Teke","doi":"10.1186/s12903-025-06034-x","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Artificial intelligence (AI) has been widely used in various medical fields to support diagnostic development. The development of different AI techniques has made important contributions to early diagnoses. This research compares and evaluates the diagnostic accuracy of ChatGPT-4o and Deepseek-v3 AI applications in 16 clinical case scenarios in oral pathologies.</p><p><strong>Methodology: </strong>Clinical case scenarios of 16 imaginary oral pathologies were prepared by the authors. The cases were asked to provide 3 possible preliminary diagnoses to two different AI applications, DeepSeek-V3 and ChatGPT-4o, and to reference the literature for these diagnoses. The diagnoses of both AI applications were evaluated with Likert scale by 20 different specialists from two different specialties.</p><p><strong>Results: </strong>The mean score for DeepSeek-v3 was 4.02 ± 0.36. For ChatGPT-4o it was 3.15 ± 0.41. According to the average scores, both models performed at a moderate to high level. Also, between the two AI models. DeepSeek-v3 was statistically better in 9 out of 16 clinical scenarios, while ChatGPT-4o was statistically better in 1 question. In general, DeepSeek-v3 was statistically more successful in the comparison of the two models (p = 0.024). In terms of references, ChatGPT-4o showed 62 references and 50 of them were fake, while 8 out of 48 references were fake in DeepSeek-v3.</p><p><strong>Conclusions: </strong>Chatbot applications have the potential to become a valuable consultant for clinicians in the future thanks to its fast-processing ability. It is clear that it can help healthcare services by reducing the workload of clinicians. It can be said that the Deepseek-v3 model produces better results compared to ChatGPT-4o, but both applications need to be improved for routine use. It is thought that the release of versions of AI models that can only perform scans in the medical field and respond to clinicians by providing more reliable resources may make these models more valuable.</p>","PeriodicalId":9072,"journal":{"name":"BMC Oral Health","volume":"25 1","pages":"638"},"PeriodicalIF":2.6000,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12023442/pdf/","citationCount":"0","resultStr":"{\"title\":\"Can deepseek and ChatGPT be used in the diagnosis of oral pathologies?\",\"authors\":\"Ömer Faruk Kaygisiz, Mehmet Turhan Teke\",\"doi\":\"10.1186/s12903-025-06034-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>Artificial intelligence (AI) has been widely used in various medical fields to support diagnostic development. The development of different AI techniques has made important contributions to early diagnoses. This research compares and evaluates the diagnostic accuracy of ChatGPT-4o and Deepseek-v3 AI applications in 16 clinical case scenarios in oral pathologies.</p><p><strong>Methodology: </strong>Clinical case scenarios of 16 imaginary oral pathologies were prepared by the authors. The cases were asked to provide 3 possible preliminary diagnoses to two different AI applications, DeepSeek-V3 and ChatGPT-4o, and to reference the literature for these diagnoses. 
The diagnoses of both AI applications were evaluated with Likert scale by 20 different specialists from two different specialties.</p><p><strong>Results: </strong>The mean score for DeepSeek-v3 was 4.02 ± 0.36. For ChatGPT-4o it was 3.15 ± 0.41. According to the average scores, both models performed at a moderate to high level. Also, between the two AI models. DeepSeek-v3 was statistically better in 9 out of 16 clinical scenarios, while ChatGPT-4o was statistically better in 1 question. In general, DeepSeek-v3 was statistically more successful in the comparison of the two models (p = 0.024). In terms of references, ChatGPT-4o showed 62 references and 50 of them were fake, while 8 out of 48 references were fake in DeepSeek-v3.</p><p><strong>Conclusions: </strong>Chatbot applications have the potential to become a valuable consultant for clinicians in the future thanks to its fast-processing ability. It is clear that it can help healthcare services by reducing the workload of clinicians. It can be said that the Deepseek-v3 model produces better results compared to ChatGPT-4o, but both applications need to be improved for routine use. It is thought that the release of versions of AI models that can only perform scans in the medical field and respond to clinicians by providing more reliable resources may make these models more valuable.</p>\",\"PeriodicalId\":9072,\"journal\":{\"name\":\"BMC Oral Health\",\"volume\":\"25 1\",\"pages\":\"638\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-04-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12023442/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Oral Health\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1186/s12903-025-06034-x\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Oral Health","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12903-025-06034-x","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract


Objective: Artificial intelligence (AI) has been widely used in various medical fields to support diagnosis, and the development of different AI techniques has made important contributions to early diagnosis. This study compares and evaluates the diagnostic accuracy of the ChatGPT-4o and DeepSeek-v3 AI applications across 16 clinical case scenarios involving oral pathologies.

Methodology: The authors prepared clinical case scenarios for 16 hypothetical oral pathologies. Each case was presented to two AI applications, DeepSeek-V3 and ChatGPT-4o, which were asked to provide three possible preliminary diagnoses and to cite literature supporting them. The diagnoses from both AI applications were rated on a Likert scale by 20 specialists from two different specialties.
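The abstract does not state how the two models were queried (web chat interface or API). Purely as an illustration, the sketch below shows how a case scenario could be sent to both models programmatically with the same request for three preliminary diagnoses and supporting references; the prompt wording, the `ask_model` helper, and the client setup are assumptions, not the authors' method.

```python
# Hypothetical sketch: send one case scenario to both models and request
# three preliminary diagnoses with literature references.
# The study's actual querying method is not described in the abstract.
from openai import OpenAI

PROMPT = (
    "Here is a clinical case scenario of an oral pathology:\n{case}\n\n"
    "Provide 3 possible preliminary diagnoses and cite literature "
    "references supporting each diagnosis."
)

def ask_model(client: OpenAI, model: str, case_text: str) -> str:
    """Send the case scenario and return the model's free-text answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(case=case_text)}],
    )
    return response.choices[0].message.content

# Placeholder for one of the 16 scenarios prepared by the authors.
case = "<clinical case scenario text>"

gpt_client = OpenAI(api_key="OPENAI_KEY")  # ChatGPT-4o
deepseek_client = OpenAI(api_key="DEEPSEEK_KEY",
                         base_url="https://api.deepseek.com")  # DeepSeek-V3 (OpenAI-compatible API)

answers = {
    "ChatGPT-4o": ask_model(gpt_client, "gpt-4o", case),
    "DeepSeek-v3": ask_model(deepseek_client, "deepseek-chat", case),
}
```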

Results: The mean score for DeepSeek-v3 was 4.02 ± 0.36, and for ChatGPT-4o it was 3.15 ± 0.41; according to the average scores, both models performed at a moderate to high level. Comparing the two models, DeepSeek-v3 was statistically better in 9 of the 16 clinical scenarios, while ChatGPT-4o was statistically better in 1. Overall, DeepSeek-v3 was statistically more successful than ChatGPT-4o (p = 0.024). In terms of references, ChatGPT-4o provided 62 references, 50 of which were fabricated, whereas 8 of the 48 references provided by DeepSeek-v3 were fabricated.
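The abstract reports means ± SD and a p-value but does not name the statistical test. As a rough illustration only, the sketch below computes per-model summary statistics and a paired non-parametric comparison across the 16 scenarios; the Wilcoxon signed-rank test is an assumption about the analysis, and the scores are dummy values, not the study's data.

```python
# Illustrative sketch with dummy data: mean ± SD of Likert ratings per model
# and a paired non-parametric test across the 16 scenarios.
import numpy as np
from scipy.stats import wilcoxon

# Per-scenario mean Likert score (1-5), averaged over the 20 raters (dummy values).
deepseek_scores = np.array([4.1, 3.9, 4.3, 4.0, 4.5, 3.8, 4.2, 4.1,
                            3.7, 4.4, 4.0, 3.9, 4.2, 4.1, 3.8, 4.3])
chatgpt_scores = np.array([3.2, 3.0, 3.5, 2.9, 3.8, 3.1, 3.3, 3.2,
                           2.8, 3.6, 3.1, 3.0, 3.4, 3.2, 2.9, 3.5])

print(f"DeepSeek-v3: {deepseek_scores.mean():.2f} ± {deepseek_scores.std(ddof=1):.2f}")
print(f"ChatGPT-4o:  {chatgpt_scores.mean():.2f} ± {chatgpt_scores.std(ddof=1):.2f}")

# Paired comparison of the two models across the same 16 scenarios.
stat, p_value = wilcoxon(deepseek_scores, chatgpt_scores)
print(f"Wilcoxon signed-rank: statistic={stat:.1f}, p={p_value:.3f}")
```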

Conclusions: Chatbot applications have the potential to become valuable consultants for clinicians in the future thanks to their fast processing ability, and they can clearly support healthcare services by reducing clinicians' workload. DeepSeek-v3 produced better results than ChatGPT-4o in this comparison, but both applications need improvement before routine use. Releasing versions of AI models that search only the medical literature and respond to clinicians with more reliable sources may make these models more valuable.

Source journal
BMC Oral Health (DENTISTRY, ORAL SURGERY & MEDICINE)
CiteScore: 3.90
Self-citation rate: 6.90%
Articles published: 481
Review time: 6-12 weeks
Journal description: BMC Oral Health is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of disorders of the mouth, teeth and gums, as well as related molecular genetics, pathophysiology, and epidemiology.