Criteria and Protocol: Assessing Generative AI Efficacy in Perceiving EULAR 2019 Lupus Classification
Gerald H Lushington, Sandeep Nair, Eldon R Jupe, Bernard Rubin, Mohan Purushothaman
Diagnostics 15(18), published 2025-09-22. Journal Article. DOI: 10.3390/diagnostics15182409 (https://doi.org/10.3390/diagnostics15182409)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468409/pdf/
Journal impact factor: 3.3 (JCR Q1, Medicine, General & Internal)
Citations: 0
Abstract
Background/Objectives: In clinical informatics, the term 'information overload' is increasingly used to describe the operational impediments of excessive documentation. While electronic health records (EHRs) are growing in abundance, many medical records (MRs) remain in legacy formats that impede efficient, systematic processing, contributing to the challenges of care fragmentation. Thus, there is growing interest in using generative AI (genAI) for automated MR summarization and characterization.

Methods: MRs for a set of 78 individuals were digitized. Some were known systemic lupus erythematosus (SLE) cases, while others were under evaluation for possible SLE classification. A two-pass genAI assessment strategy was implemented using the Claude 3.5 large language model (LLM) to mine MRs for information relevant to classifying each individual as SLE, undifferentiated connective tissue disorder (UCTD), or neither via the 22-criteria EULAR 2019 model.

Results: Compared to clinical determination, the antinuclear antibody (ANA) criterion (whose results are crucial for classifying SLE-negative cases) exhibited a favorable sensitivity of 0.78 ± 0.09 (95% confidence interval) and a positive predictive value of 0.85 ± 0.08, but a marginal specificity of 0.60 ± 0.11 and an uncertain negative predictive value of 0.48 ± 0.11. Averaged over the remaining 21 criteria, these four performance metrics were 0.69 ± 0.11, 0.87 ± 0.04, 0.54 ± 0.10, and 0.93 ± 0.03.

Conclusions: ANA performance statistics imply that genAI yields confident assessments of SLE negativity (per high sensitivity) but weaker assessments of positivity. The remaining genAI criterial determinations support (per specificity) confident assertions of SLE positivity but tend to misclassify a significant fraction of clinical positives as UCTD.
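The four metrics reported in the abstract all derive from a 2x2 comparison of genAI determinations against clinical determinations. The sketch below shows one way to compute them with a 95% interval half-width. It is only an illustration: the abstract does not say how its intervals were obtained, so the normal-approximation (Wald) interval used here is an assumption, the function names `proportion` and `diagnostic_metrics` are hypothetical, and the counts in the usage example are invented rather than taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class Proportion:
    value: float       # point estimate, between 0 and 1
    half_width: float  # 95% half-width (Wald approximation, an assumption)


def proportion(successes: int, total: int, z: float = 1.96) -> Proportion:
    """Estimate a proportion and its normal-approximation half-width."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    return Proportion(p, z * math.sqrt(p * (1.0 - p) / total))


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the four metrics from confusion-matrix counts.

    tp/fn: clinically positive cases the model called positive/negative.
    tn/fp: clinically negative cases the model called negative/positive.
    """
    return {
        "sensitivity": proportion(tp, tp + fn),  # TP / all clinical positives
        "specificity": proportion(tn, tn + fp),  # TN / all clinical negatives
        "ppv": proportion(tp, tp + fp),          # TP / all model positives
        "npv": proportion(tn, tn + fn),          # TN / all model negatives
    }


# Hypothetical counts for a single criterion (not the study's data).
metrics = diagnostic_metrics(tp=40, fp=8, tn=20, fn=10)
for name, m in metrics.items():
    print(f"{name}: {m.value:.2f} ± {m.half_width:.2f}")
```

With samples as small as the 78 individuals described here, a Wilson or exact interval may behave better than the Wald form near 0 or 1; the Wald form is used only because it is the simplest to read.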
Diagnostics (Biochemistry, Genetics and Molecular Biology: Clinical Biochemistry)
CiteScore: 4.70
Self-citation rate: 8.30%
Articles published: 2699
Review time: 19.64 days
About the journal:
Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.