{"title":"人工智能模型在风湿病学委员会级问题中的性能比较:评估 Google Gemini 和 ChatGPT-4o。","authors":"Enes Efe Is, Ahmet Kivanc Menekseoglu","doi":"10.1007/s10067-024-07154-5","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>This study evaluates the performance of AI models, ChatGPT-4o and Google Gemini, in answering rheumatology board-level questions, comparing their effectiveness, reliability, and applicability in clinical practice.</p><p><strong>Method: </strong>A cross-sectional study was conducted using 420 rheumatology questions from the BoardVitals question bank, excluding 27 visual data questions. Both artificial intelligence models categorized the questions according to difficulty (easy, medium, hard) and answered them. In addition, the reliability of the answers was assessed by asking the questions a second time. The accuracy, reliability, and difficulty categorization of the AI models' response to the questions were analyzed.</p><p><strong>Results: </strong>ChatGPT-4o answered 86.9% of the questions correctly, significantly outperforming Google Gemini's 60.2% accuracy (p < 0.001). When the questions were asked a second time, the success rate was 86.7% for ChatGPT-4o and 60.5% for Google Gemini. Both models mainly categorized questions as medium difficulty. ChatGPT-4o showed higher accuracy in various rheumatology subfields, notably in Basic and Clinical Science (p = 0.028), Osteoarthritis (p = 0.023), and Rheumatoid Arthritis (p < 0.001).</p><p><strong>Conclusions: </strong>ChatGPT-4o significantly outperformed Google Gemini in rheumatology board-level questions. This demonstrates the success of ChatGPT-4o in situations requiring complex and specialized knowledge related to rheumatological diseases. The performance of both AI models decreased as the question difficulty increased. This study demonstrates the potential of AI in clinical applications and suggests that its use as a tool to assist clinicians may improve healthcare efficiency in the future. Future studies using real clinical scenarios and real board questions are recommended. Key Points •ChatGPT-4o significantly outperformed Google Gemini in answering rheumatology board-level questions, achieving 86.9% accuracy compared to Google Gemini's 60.2%. •For both AI models, the correct answer rate decreased as the question difficulty increased. •The study demonstrates the potential for AI models to be used in clinical practice as a tool to assist clinicians and improve healthcare efficiency.</p>","PeriodicalId":10482,"journal":{"name":"Clinical Rheumatology","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.\",\"authors\":\"Enes Efe Is, Ahmet Kivanc Menekseoglu\",\"doi\":\"10.1007/s10067-024-07154-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>This study evaluates the performance of AI models, ChatGPT-4o and Google Gemini, in answering rheumatology board-level questions, comparing their effectiveness, reliability, and applicability in clinical practice.</p><p><strong>Method: </strong>A cross-sectional study was conducted using 420 rheumatology questions from the BoardVitals question bank, excluding 27 visual data questions. 
Both artificial intelligence models categorized the questions according to difficulty (easy, medium, hard) and answered them. In addition, the reliability of the answers was assessed by asking the questions a second time. The accuracy, reliability, and difficulty categorization of the AI models' response to the questions were analyzed.</p><p><strong>Results: </strong>ChatGPT-4o answered 86.9% of the questions correctly, significantly outperforming Google Gemini's 60.2% accuracy (p < 0.001). When the questions were asked a second time, the success rate was 86.7% for ChatGPT-4o and 60.5% for Google Gemini. Both models mainly categorized questions as medium difficulty. ChatGPT-4o showed higher accuracy in various rheumatology subfields, notably in Basic and Clinical Science (p = 0.028), Osteoarthritis (p = 0.023), and Rheumatoid Arthritis (p < 0.001).</p><p><strong>Conclusions: </strong>ChatGPT-4o significantly outperformed Google Gemini in rheumatology board-level questions. This demonstrates the success of ChatGPT-4o in situations requiring complex and specialized knowledge related to rheumatological diseases. The performance of both AI models decreased as the question difficulty increased. This study demonstrates the potential of AI in clinical applications and suggests that its use as a tool to assist clinicians may improve healthcare efficiency in the future. Future studies using real clinical scenarios and real board questions are recommended. Key Points •ChatGPT-4o significantly outperformed Google Gemini in answering rheumatology board-level questions, achieving 86.9% accuracy compared to Google Gemini's 60.2%. •For both AI models, the correct answer rate decreased as the question difficulty increased. •The study demonstrates the potential for AI models to be used in clinical practice as a tool to assist clinicians and improve healthcare efficiency.</p>\",\"PeriodicalId\":10482,\"journal\":{\"name\":\"Clinical Rheumatology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Clinical Rheumatology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s10067-024-07154-5\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/9/28 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"RHEUMATOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Rheumatology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s10067-024-07154-5","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/9/28 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"RHEUMATOLOGY","Score":null,"Total":0}
Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.
Objectives: This study evaluates the performance of two AI models, ChatGPT-4o and Google Gemini, in answering rheumatology board-level questions, and compares their effectiveness, reliability, and applicability in clinical practice.
Method: A cross-sectional study was conducted using 420 rheumatology questions from the BoardVitals question bank, excluding 27 questions that contained visual data. Both artificial intelligence models categorized each question by difficulty (easy, medium, or hard) and then answered it. To assess the reliability of the answers, every question was asked a second time. The accuracy, reliability, and difficulty categorization of the AI models' responses were analyzed.
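To make the two-pass protocol above concrete, here is a minimal sketch of how such an evaluation could be scripted. Everything in it is an assumption for illustration: the abstract does not describe the authors' tooling, and `query_model`, the `Question` shape, and the pass structure are hypothetical placeholders, not the study's actual pipeline.

```python
# Minimal sketch of the two-pass evaluation protocol described in the
# Method section. All names and data shapes here are hypothetical; the
# abstract does not specify how the models were queried.
from dataclasses import dataclass


@dataclass
class Question:
    stem: str
    choices: dict[str, str]  # e.g. {"A": "...", "B": "...", ...}
    correct: str             # key of the correct choice


def query_model(model_name: str, q: Question) -> tuple[str, str]:
    """Return (chosen_answer_key, difficulty_label) for one question.

    Placeholder for a call to ChatGPT-4o or Google Gemini; substitute
    the relevant API client or manual chat-interface entry.
    """
    raise NotImplementedError


def evaluate(model_name: str, questions: list[Question]) -> dict:
    first_pass, second_pass, difficulty = [], [], []
    for q in questions:
        ans1, diff = query_model(model_name, q)  # pass 1: answer + difficulty label
        ans2, _ = query_model(model_name, q)     # pass 2: repeat for reliability
        first_pass.append(ans1 == q.correct)
        second_pass.append(ans2 == q.correct)
        difficulty.append(diff)                  # easy / medium / hard
    n = len(questions)
    return {
        "accuracy_pass1": sum(first_pass) / n,
        "accuracy_pass2": sum(second_pass) / n,
        "difficulty_labels": difficulty,
    }
```

Per-question agreement between the two passes (here, comparing `first_pass` against `second_pass`) is one simple way to operationalize the "reliability" the abstract describes.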
Results: ChatGPT-4o answered 86.9% of the questions correctly, significantly outperforming Google Gemini's 60.2% accuracy (p < 0.001). When the questions were asked a second time, the success rate was 86.7% for ChatGPT-4o and 60.5% for Google Gemini. Both models mainly categorized questions as medium difficulty. ChatGPT-4o showed higher accuracy in various rheumatology subfields, notably in Basic and Clinical Science (p = 0.028), Osteoarthritis (p = 0.023), and Rheumatoid Arthritis (p < 0.001).
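The abstract reports p-values for the between-model accuracy gap but does not name the statistical test used. As a hedged illustration only, the sketch below applies a chi-square test of independence to the correct/incorrect counts implied by the reported accuracies, assuming all 420 questions per model; the authors may have used a different test (e.g., Fisher's exact or McNemar's).

```python
# One plausible reconstruction of the reported significance test; the
# actual test used in the study is not stated in the abstract.
from scipy.stats import chi2_contingency

n = 420  # questions per model, per the abstract
acc_gpt4o, acc_gemini = 0.869, 0.602  # reported first-pass accuracies

correct_gpt4o = round(acc_gpt4o * n)       # ~365 correct
correct_gemini = round(acc_gemini * n)     # ~253 correct
table = [
    [correct_gpt4o, n - correct_gpt4o],    # ChatGPT-4o: correct, incorrect
    [correct_gemini, n - correct_gemini],  # Gemini: correct, incorrect
]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2e}")  # p well below 0.001
```

A p-value this small is consistent with the reported p < 0.001, though the exact value depends on the test and the denominator actually used.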
Conclusions: ChatGPT-4o significantly outperformed Google Gemini on rheumatology board-level questions, demonstrating its strength in tasks requiring complex, specialized knowledge of rheumatological diseases. The performance of both AI models decreased as question difficulty increased. This study demonstrates the potential of AI in clinical applications and suggests that its use as a tool to assist clinicians may improve healthcare efficiency in the future. Future studies using real clinical scenarios and real board questions are recommended.
Key Points
• ChatGPT-4o significantly outperformed Google Gemini in answering rheumatology board-level questions, achieving 86.9% accuracy compared with Google Gemini's 60.2%.
• For both AI models, the correct answer rate decreased as question difficulty increased.
• The study demonstrates the potential for AI models to be used in clinical practice as a tool to assist clinicians and improve healthcare efficiency.
Journal introduction:
Clinical Rheumatology is an international English-language journal devoted to publishing original clinical investigation and research in the general field of rheumatology, with an emphasis on clinical aspects at the postgraduate level.
The journal succeeds Acta Rheumatologica Belgica, originally founded in 1945 as the official journal of the Belgian Rheumatology Society. Clinical Rheumatology aims to cover all modern trends in clinical and experimental research as well as the management and evaluation of diagnostic and treatment procedures connected with the inflammatory, immunologic, metabolic, genetic and degenerative soft and hard connective tissue diseases.