Can OpenAI's New o1 Model Outperform Its Predecessors in Common Eye Care Queries?

Impact Factor: 3.2 · Q1, Ophthalmology
Krithi Pushpanathan MSc , Minjie Zou MMed , Sahana Srinivasan BEng , Wendy Meihua Wong MMed , Erlangga Ariadarma Mangunkusumo MD , George Naveen Thomas MMed , Yien Lai MMed , Chen-Hsin Sun MD , Janice Sing Harn Lam MMed , Marcus Chun Jin Tan MMed , Hazel Anne Hui'En Lin MMed , Weizhi Ma PhD , Victor Teck Chang Koh MMed , David Ziyou Chen MMed , Yih-Chung Tham PhD
DOI: 10.1016/j.xops.2025.100745
Journal: Ophthalmology Science, Volume 5, Issue 4, Article 100745
Publication date: February 22, 2025 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S2666914525000430
Citations: 0

Abstract

Can OpenAI's New o1 Model Outperform Its Predecessors in Common Eye Care Queries?

Objective

The newly launched OpenAI o1 is said to offer improved reasoning, potentially providing higher quality responses to eye care queries. However, its performance remains unassessed. We evaluated the performance of o1, ChatGPT-4o, and ChatGPT-4 in addressing ophthalmic-related queries, focusing on correctness, completeness, and readability.

Design

Cross-sectional study.

Subjects

Sixteen queries that ChatGPT-4 had answered suboptimally in prior studies were used, covering 3 subtopics: myopia (6 questions), ocular symptoms (4 questions), and retinal conditions (6 questions).

Methods

For each subtopic, 3 attending-level ophthalmologists, masked to the model sources, evaluated the responses based on correctness, completeness, and readability (on a 5-point scale for each metric).

Main Outcome Measures

Mean summed scores of each model for correctness, completeness, and readability, rated on a 5-point scale (maximum score: 15).
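The scoring scheme above can be sketched in a few lines: each of the 3 masked graders rates a response on a 5-point scale, and the per-metric score is the sum across graders (maximum 3 × 5 = 15). This is a minimal illustration of the arithmetic only; the grader values shown are hypothetical, not data from the study.

```python
def summed_score(grader_ratings):
    """Sum the 5-point ratings from three graders (maximum 15)."""
    assert len(grader_ratings) == 3, "scheme assumes three graders per subtopic"
    assert all(1 <= r <= 5 for r in grader_ratings), "ratings are on a 5-point scale"
    return sum(grader_ratings)

# Hypothetical ratings for one response on the correctness metric:
correctness = summed_score([4, 5, 4])  # -> 13 out of a possible 15
```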

Results

O1 scored highest in correctness (12.6) and readability (14.2), outperforming ChatGPT-4, which scored 10.3 (P = 0.010) and 12.4 (P < 0.001), respectively. No significant difference was found between o1 and ChatGPT-4o. When stratified by subtopics, o1 consistently demonstrated superior correctness and readability. In completeness, ChatGPT-4o achieved the highest score of 12.4, followed by o1 (10.8), though the difference was not statistically significant. o1 showed notable limitations in completeness for ocular symptom queries, scoring 5.5 out of 15.

Conclusions

While o1 is marketed as offering improved reasoning capabilities, its performance in addressing eye care queries does not significantly differ from its predecessor, ChatGPT-4o. Nevertheless, it surpasses ChatGPT-4, particularly in correctness and readability.

Financial Disclosure(s)

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Source Journal
Ophthalmology Science
CiteScore: 3.40
Self-citation rate: 0.00%
Average review time: 89 days