Advancing dental diagnostics with OpenAI's o1-preview

IF 3.5 | Zone 2, Medicine | Q1 DENTISTRY, ORAL SURGERY & MEDICINE

Journal of the American Dental Association Pub Date : 2025-07-01 DOI:10.1016/j.adaj.2025.04.003

Arman Danesh BMSc, Arsalan Danesh DDS, Farzad Danesh DDS, MSC

Volume 156, Issue 7, Pages 555-562.e3 | https://www.sciencedirect.com/science/article/pii/S0002817725002223

Citations: 0

Abstract

Background

The introduction of o1-preview (OpenAI) has stirred discussions surrounding its potential applications for diagnosing complex patient cases. The authors gauged changes in o1-preview’s capacity to diagnose complex cases compared with its predecessors ChatGPT-3.5 (OpenAI) and ChatGPT-4 (legacy) (OpenAI).

Methods

The authors applied 2 different approaches to diagnostic challenges retrieved from the literature to elucidate o1-preview’s capacity to produce plausible differential diagnoses (DDs) and final diagnoses (FDs). The first approach instructed the chatbot to independently construct a DD before selecting a final diagnosis. The second approach instructed the chatbot to rely on DDs retrieved from the literature accompanying the diagnostic challenge. A 2-tailed t test was used to compare sample means, and a 2-tailed χ² test was used to compare sample proportions. A P value < .05 was considered statistically significant.
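As an illustration of the proportion comparison described above, the following is a minimal sketch of a Pearson χ² test on a 2 × 2 table (1 degree of freedom). It is not the authors' analysis code: the abstract does not report the number of cases per model or whether a continuity correction was applied, so the function name `chi_square_two_proportions`, the absence of a correction, and the counts in the usage lines are all assumptions made for illustration only.

```python
import math


def chi_square_two_proportions(success_a, n_a, success_b, n_b):
    """Pearson chi-square test (1 df, no continuity correction).

    Compares the proportion success_a / n_a with success_b / n_b.
    Returns (chi2 statistic, 2-tailed P value).
    """
    # Observed 2 x 2 table: rows are groups, columns are success / failure.
    table = [
        [success_a, n_a - success_a],
        [success_b, n_b - success_b],
    ]
    row_totals = [n_a, n_b]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    total = n_a + n_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("expected count of zero; test is undefined")
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, NOT taken from the study:
# group A is correct on 18 of 20 cases, group B on 12 of 20.
chi2, p = chi_square_two_proportions(18, 20, 12, 20)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

The result would be read against the abstract's threshold: a P value below .05 counts as statistically significant.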

Results

The o1-preview model produced a plausible DD for 94% of cases and a correct diagnosis for 80% of cases when relying on an independent diagnostic approach, a significant increase from ChatGPT-3.5 (DD: difference, 32%; P = .001; FD: difference, 40%; P < .001) and ChatGPT-4 (legacy) (DD: difference, 18%; P = .012; FD: difference, 18%; P = .048). When relying on DDs retrieved from the literature, the model achieved a diagnostic accuracy of 86%, outperforming its predecessors, although the differences were not statistically significant (ChatGPT-3.5: difference, 16%; P = .055; ChatGPT-4 (legacy): difference, 6%; P = .427).

Conclusions

Although further validation is required, the transformative findings of this investigation shift the discussion surrounding ChatGPT’s integration as a diagnostic tool to be not a question of if, but instead a matter of when.

Practical Implications

Although o1-preview has yet to achieve a proficient diagnostic accuracy, the model served well in generating DDs for complex cases.


Source journal

Journal of the American Dental Association (Medicine: Dentistry and Oral Surgery)

CiteScore: 5.30

Self-citation rate: 10.30%

Articles published: 221

Review time: 34 days

About the journal: There is not a single source or solution to help dentists in their quest for lifelong learning, improving dental practice, and dental well-being. JADA+, along with The Journal of the American Dental Association, is striving to do just that, bringing together practical content covering dentistry topics and procedures to help dentists (both general dentists and specialists) provide better patient care and improve oral health and well-being. This is a work in progress; as we add more content, covering more topics of interest, it will continue to expand, becoming an ever-more essential source of oral health knowledge.