Title: Comparing the performances of a fifty-four-year-old computer-based consultation to ChatGPT-4o
Authors: Elvan Burak Verdi, Oguz Akbilgic
Journal: Applied Clinical Informatics (Journal Article, published 2025-06-06)
DOI: 10.1055/a-2628-8408 (https://doi.org/10.1055/a-2628-8408)
Abstract
Objective: To evaluate and compare the diagnostic responses generated by two artificial intelligence models developed 54 years apart and to encourage physicians to explore the use of large language models (LLMs) like GPT-4o in clinical practice.
Methods: A clinical case of metabolic acidosis was presented to GPT-4o, and the model's diagnostic reasoning, data interpretation, and management recommendations were recorded. These outputs were then compared to the responses from Schwartz's 1970 AI model built with a decision-tree algorithm using Conversational Algebraic Language (CAL). Both models were given the same patient data to ensure a fair comparison.
Results: GPT-4o generated an advanced analysis of the patient's acid-base disturbance, correctly identifying likely causes and suggesting relevant diagnostic tests and treatments. It provided a detailed, narrative explanation of the metabolic acidosis. The 1970 CAL model, while correctly recognizing the metabolic acidosis and flagging implausible inputs, was constrained by its rule-based design. CAL offered only basic stepwise guidance and required sequential prompts for each data point, reflecting a limited capacity to handle complex or unanticipated information. GPT-4o, by contrast, integrated the data more holistically, although it occasionally ventured beyond the provided information.
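The stepwise, rule-based behavior attributed to the CAL system can be illustrated with a minimal decision-tree sketch. This is a hypothetical reconstruction for illustration only, not Schwartz's actual rules: the function name, thresholds, and plausibility bounds are assumptions, though the anion gap formula (Na − [Cl + HCO3]) and the pH/bicarbonate cutoffs are standard clinical conventions.

```python
def classify_acid_base(ph: float, hco3: float, na: float, cl: float) -> str:
    """Toy rule-based acid-base classifier in the spirit of a 1970s
    decision-tree consultation system. Illustrative only."""
    # Plausibility check, analogous to CAL flagging implausible inputs.
    if not 6.8 <= ph <= 7.8:
        return "implausible pH"
    # Standard anion gap: Na - (Cl + HCO3); normal roughly 8-12 mEq/L.
    anion_gap = na - (cl + hco3)
    # Metabolic acidosis: acidemic pH with a low bicarbonate.
    if ph < 7.35 and hco3 < 22:
        if anion_gap > 12:
            return "high anion gap metabolic acidosis"
        return "normal anion gap metabolic acidosis"
    return "no metabolic acidosis detected"
```

Each branch fires only on the specific values it was written to expect, which mirrors the limitation noted above: such a system must be fed each data point in sequence and cannot reason about inputs its rules never anticipated.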
Conclusion: This comparison illustrates substantial advances in AI capabilities over five decades. GPT-4o's performance demonstrates the transformative potential of modern LLMs in clinical decision-making, showcasing an ability to synthesize complex data and assist diagnosis without specialized training, while still requiring further validation, rigorous clinical trials, and adaptation to clinical contexts. Although innovative for its era and offering certain advantages over GPT-4o, the rule-based CAL system had technical limitations. Rather than declaring one system simply "better," this study offers perspective on how far AI in medicine has progressed while acknowledging that current AI tools remain supplements to, not replacements for, physician judgment.
About the journal:
ACI is the third Schattauer journal dealing with biomedical and health informatics. It complements our other journals, Methods of Information in Medicine and the Yearbook of Medical Informatics. With the Yearbook of Medical Informatics serving as the "Milestone" or state-of-the-art journal and Methods of Information in Medicine as the "Science and Research" journal of IMIA, ACI intends to be the "Practical" journal of IMIA.