Exploring the Potential of Claude 3 Opus in Renal Pathological Diagnosis: Performance Evaluation
Xingyuan Li, Ke Liu, Yanlin Lang, Zhonglin Chai, Fang Liu
JMIR Medical Informatics, volume 12, e65033. Published 2024-11-15. DOI: 10.2196/65033 (https://doi.org/10.2196/65033)
Abstract
Background: Artificial intelligence (AI) has shown great promise in assisting medical diagnosis, but its application in renal pathology remains limited.
Objective: We evaluated the performance of an advanced AI language model, Claude 3 Opus (Anthropic), in generating diagnostic descriptions for renal pathological images.
Methods: We carefully curated a dataset of 100 representative renal pathological images from the Diagnostic Atlas of Renal Pathology (3rd edition). The image selection aimed to cover a wide spectrum of common renal diseases, ensuring a balanced and comprehensive dataset. Claude 3 Opus generated diagnostic descriptions for each image, which were scored by 2 pathologists on clinical relevance, accuracy, fluency, completeness, and overall value.
Results: Claude 3 Opus achieved a high mean score in language fluency (3.86) but lower scores in clinical relevance (1.75), accuracy (1.55), completeness (2.01), and overall value (1.75). Performance varied across disease types. Interrater agreement was substantial for relevance (κ=0.627) and overall value (κ=0.589) and moderate for accuracy (κ=0.485) and completeness (κ=0.458).
Conclusions: Claude 3 Opus shows potential in generating fluent renal pathology descriptions but needs improvement in accuracy and clinical value. The AI's performance varied across disease types. Addressing the limitations of single-source data and incorporating comparative analyses with other AI approaches are essential steps for future research. Further optimization and validation are needed for clinical applications.
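The interrater agreement figures reported in the Results are kappa (κ) statistics. The abstract does not say which variant was used (weighted or unweighted) or give the raw scores, so the following is only a minimal sketch of unweighted Cohen's kappa for two raters, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the sample scores are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical scores.

    Illustrative only: the study may have used a weighted variant.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same score by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal frequencies per score.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if p_expected == 1.0:
        # Both raters gave one identical score to every item.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores from two pathologists for six images (not study data).
pathologist_1 = [1, 2, 2, 3, 1, 4]
pathologist_2 = [1, 2, 3, 3, 1, 5]
print(round(cohen_kappa(pathologist_1, pathologist_2), 3))
```

For ordinal rating scales, a weighted kappa that gives partial credit to near-misses is a common alternative; whether the authors used one cannot be determined from the abstract.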
About the journal:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, industry, and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope: it places more emphasis on applications for clinicians and health professionals than on consumers/citizens, who are the focus of JMIR. It also publishes even faster and accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.