Punching Above Its Weight: A Head-to-Head Comparison of DeepSeek-R1 and OpenAI-o1 on Pancreatic Adenocarcinoma-Related Questions

IF 3.2 · JCR Region 3 (Medicine) · Q1 MEDICINE, GENERAL & INTERNAL
International Journal of Medical Sciences Pub Date : 2025-08-22 eCollection Date: 2025-01-01 DOI:10.7150/ijms.118887
Cheng-Peng Li, Yuan Chu, Wei-Wei Jia, Priska Hakenberg, Flavius Șandra-Petrescu, Christoph Reißfelder, Cui Yang
International Journal of Medical Sciences 22(15): 3868-3877. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12492379/pdf/
Citations: 0

Abstract


Punching Above Its Weight: A Head-to-Head Comparison of Deepseek-R1 and OpenAI-o1 on Pancreatic Adenocarcinoma-Related Questions.

Objective: This study aimed to compare the performance of DeepSeek-R1 and OpenAI-o1 in addressing complex pancreatic ductal adenocarcinoma (PDAC)-related clinical questions, focusing on accuracy, comprehensiveness, safety, and reasoning quality.

Methods: Twenty PDAC-related questions derived from the current NCCN guidelines for PDAC were posed to both models. Responses were evaluated for accuracy, comprehensiveness, and safety, and chain-of-thought (CoT) outputs were rated for logical coherence and error handling by blinded clinical experts using 5-point Likert scales. Inter-rater reliability, evaluation scores, and the character counts of both models' responses were compared.

Results: Both models demonstrated high accuracy (median score: 5 vs. 5, p=0.527) and safety (5 vs. 5, p=0.285). DeepSeek-R1 outperformed OpenAI-o1 in comprehensiveness (median: 5 vs. 4.5, p=0.015) and generated significantly longer responses (median characters: 544 vs. 248, p<0.001). For reasoning quality, DeepSeek-R1 achieved superior scores in logical coherence (median: 5 vs. 4, p<0.001) and error handling (5 vs. 4, p<0.001), with 75% of its responses scoring full points compared to OpenAI-o1's 5%.

Conclusion: While both models exhibit high clinical utility, DeepSeek-R1's enhanced reasoning capabilities, open-source nature, and cost-effectiveness position it as a promising tool for complex oncology decision support. Further validation in real-world multimodal clinical scenarios is warranted.
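The paired medians and p-values reported above are characteristic of a rank-based comparison of ordinal Likert ratings, such as a Mann-Whitney U test. The abstract does not state which test was used or publish the raw ratings, so the sketch below uses a hypothetical rating set purely to illustrate how two models' 5-point scores might be compared:

```python
from statistics import median

def mann_whitney_u(a, b):
    # Rank-based U statistic: over all (x, y) pairs, count how often a
    # value from `a` exceeds one from `b`; ties contribute one half.
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical 5-point Likert ratings for one criterion (illustrative
# only -- not the study's data, which the abstract does not report).
model_a = [5, 5, 4, 5, 5, 4, 5, 5]
model_b = [4, 4, 5, 4, 3, 4, 4, 5]

print(median(model_a), median(model_b))   # → 5.0 4.0
print(mann_whitney_u(model_a, model_b))   # → 49.0
```

In practice one would obtain the p-value from a library routine such as `scipy.stats.mannwhitneyu` rather than hand-rolling the statistic; the loop above only shows what the U statistic counts.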

Source journal: International Journal of Medical Sciences (Medicine, General & Internal)
CiteScore: 7.20
Self-citation rate: 0.00%
Articles per year: 185
Review time: 2.7 months
Journal description: Original research papers, reviews, and short research communications in any medicine-related area can be submitted to the Journal on the understanding that the work has not been published previously in whole or in part and is not under consideration for publication elsewhere. Manuscripts in both basic science and clinical medicine are considered. There is no restriction on the length of research papers and reviews, although authors are encouraged to be concise. Short research communications are limited to 2,500 words.