Generalizability of electroencephalographic interpretation using artificial intelligence: An external validation study

Impact factor: 6.6 · CAS Tier 1 (Medicine) · JCR Q1 (Clinical Neurology)
Epilepsia · Publication date: 2024-08-14 · DOI: 10.1111/epi.18082
Daniel Mansilla, Jesper Tveit, Harald Aurlien, Tamir Avigdor, Victoria Ros-Castello, Alyssa Ho, Chifaou Abdallah, Jean Gotman, Sándor Beniczky, Birgit Frauscher
{"title":"Generalizability of electroencephalographic interpretation using artificial intelligence: An external validation study","authors":"Daniel Mansilla,&nbsp;Jesper Tveit,&nbsp;Harald Aurlien,&nbsp;Tamir Avigdor,&nbsp;Victoria Ros-Castello,&nbsp;Alyssa Ho,&nbsp;Chifaou Abdallah,&nbsp;Jean Gotman,&nbsp;Sándor Beniczky,&nbsp;Birgit Frauscher","doi":"10.1111/epi.18082","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Objective</h3>\n \n <p>The automated interpretation of clinical electroencephalograms (EEGs) using artificial intelligence (AI) holds the potential to bridge the treatment gap in resource-limited settings and reduce the workload at specialized centers. However, to facilitate broad clinical implementation, it is essential to establish generalizability across diverse patient populations and equipment. We assessed whether SCORE-AI demonstrates diagnostic accuracy comparable to that of experts when applied to a geographically different patient population, recorded with distinct EEG equipment and technical settings.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>We assessed the diagnostic accuracy of a “fixed-and-frozen” AI model, using an independent dataset and external gold standard, and benchmarked it against three experts blinded to all other data. The dataset comprised 50% normal and 50% abnormal routine EEGs, equally distributed among the four major classes of EEG abnormalities (focal epileptiform, generalized epileptiform, focal nonepileptiform, and diffuse nonepileptiform). To assess diagnostic accuracy, we computed sensitivity, specificity, and accuracy of the AI model and the experts against the external gold standard.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>We analyzed EEGs from 104 patients (64 females, median age = 38.6 [range = 16–91] years). SCORE-AI performed equally well compared to the experts, with an overall accuracy of 92% (95% confidence interval [CI] = 90%–94%) versus 94% (95% CI = 92%–96%). There was no significant difference between SCORE-AI and the experts for any metric or category. SCORE-AI performed well independently of the vigilance state (false classification during awake: 5/41 [12.2%], false classification during sleep: 2/11 [18.2%]; <i>p</i> = .63) and normal variants (false classification in presence of normal variants: 4/14 [28.6%], false classification in absence of normal variants: 3/38 [7.9%]; <i>p</i> = .07).</p>\n </section>\n \n <section>\n \n <h3> Significance</h3>\n \n <p>SCORE-AI achieved diagnostic performance equal to human experts in an EEG dataset independent of the development dataset, in a geographically distinct patient population, recorded with different equipment and technical settings than the development dataset.</p>\n </section>\n </div>","PeriodicalId":11768,"journal":{"name":"Epilepsia","volume":"65 10","pages":"3028-3037"},"PeriodicalIF":6.6000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/epi.18082","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Epilepsia","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/epi.18082","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CLINICAL NEUROLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Objective

The automated interpretation of clinical electroencephalograms (EEGs) using artificial intelligence (AI) holds the potential to bridge the treatment gap in resource-limited settings and reduce the workload at specialized centers. However, to facilitate broad clinical implementation, it is essential to establish generalizability across diverse patient populations and equipment. We assessed whether SCORE-AI demonstrates diagnostic accuracy comparable to that of experts when applied to a geographically different patient population, recorded with distinct EEG equipment and technical settings.

Methods

We assessed the diagnostic accuracy of a “fixed-and-frozen” AI model, using an independent dataset and external gold standard, and benchmarked it against three experts blinded to all other data. The dataset comprised 50% normal and 50% abnormal routine EEGs, equally distributed among the four major classes of EEG abnormalities (focal epileptiform, generalized epileptiform, focal nonepileptiform, and diffuse nonepileptiform). To assess diagnostic accuracy, we computed sensitivity, specificity, and accuracy of the AI model and the experts against the external gold standard.
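
The abstract does not state how the metrics or their confidence intervals were computed. As a minimal sketch of the kind of per-category calculation involved (the function names, and the choice of the Wilson score interval for the 95% CIs, are illustrative assumptions, not the authors' method):

```python
# Minimal sketch of diagnostic metrics against a gold standard.
# The paper does not specify its computation or CI method; the Wilson
# interval and all names here are illustrative assumptions.
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion (assumed CI method)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)

def diagnostic_metrics(predicted: list[bool], gold: list[bool]) -> dict:
    """Sensitivity, specificity, and accuracy of binary calls vs. the gold standard.

    Assumes both positive and negative cases are present (true for a 50/50 dataset).
    """
    tp = sum(p and g for p, g in zip(predicted, gold))
    tn = sum((not p) and (not g) for p, g in zip(predicted, gold))
    fp = sum(p and (not g) for p, g in zip(predicted, gold))
    fn = sum((not p) and g for p, g in zip(predicted, gold))
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "accuracy": ((tp + tn) / len(gold), wilson_ci(tp + tn, len(gold))),
    }
```

In practice this would be evaluated once per abnormality class (focal epileptiform, generalized epileptiform, focal nonepileptiform, diffuse nonepileptiform), treating each class as a binary call against the gold standard.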

Results

We analyzed EEGs from 104 patients (64 females, median age = 38.6 [range = 16–91] years). SCORE-AI performed as well as the experts, with an overall accuracy of 92% (95% confidence interval [CI] = 90%–94%) versus 94% (95% CI = 92%–96%). There was no significant difference between SCORE-AI and the experts for any metric or category. SCORE-AI performed well independently of vigilance state (false classifications during wakefulness: 5/41 [12.2%], during sleep: 2/11 [18.2%]; p = .63) and of normal variants (false classifications in the presence of normal variants: 4/14 [28.6%], in their absence: 3/38 [7.9%]; p = .07).
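
The abstract does not name the statistical test behind these subgroup p-values. For 2×2 tables with cell counts this small, Fisher's exact test is a plausible choice; the sketch below shows the wakefulness-versus-sleep comparison under that assumption (the test choice and variable names are assumptions, not taken from the paper).

```python
# Sketch of the subgroup comparison reported above (awake vs. sleep misclassifications).
# The abstract does not name the test used; Fisher's exact test is an assumption,
# reasonable for 2x2 tables with small cell counts such as these.
from scipy.stats import fisher_exact

# Rows: misclassified / correctly classified; columns: awake / asleep.
table = [[5, 2],    # false classifications: 5 of 41 awake, 2 of 11 asleep
         [36, 9]]   # correct classifications: 41 - 5 and 11 - 2
statistic, p_value = fisher_exact(table)
print(f"p = {p_value:.2f}")  # illustrative; the paper reports p = .63 for this comparison
```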

Significance

SCORE-AI achieved diagnostic performance equal to that of human experts on an EEG dataset independent of the development dataset: a geographically distinct patient population, recorded with different equipment and technical settings.


Source journal: Epilepsia (Medicine, Clinical Neurology)
CiteScore: 10.90
Self-citation rate: 10.70%
Articles published: 319
Review time: 2-4 weeks
Aims and scope: Epilepsia is the leading, authoritative source for innovative clinical and basic science research for all aspects of epilepsy and seizures. In addition, Epilepsia publishes critical reviews, opinion pieces, and guidelines that foster understanding and aim to improve the diagnosis and treatment of people with seizures and epilepsy.