Evaluating crowdsourcing for ICU EEG annotation: A comparison with expert performance

Wan-Yee Kong, Fábio A Nascimento, Aaron Struck, Erik Duhaime, Srishti Kapur, Edilberto Amorim, Gregory Kapinos, Andres Rodriguez, Brendan Thomas, Masoom Desai, Jong Woo Lee, M Brandon Westover, Jin Jing

Epilepsia, published 2025-08-06. DOI: 10.1111/epi.18547
Abstract
Objective: Detection of seizures and rhythmic or periodic patterns (SRPPs) on electroencephalography (EEG) is crucial for the diagnosis and management of patients with neurological critical illness. Although automated detection methods are advancing, they require large, high-quality, expert-annotated datasets for training. However, expert annotation is limited by the availability of trained neurophysiologists. Crowdsourcing, or soliciting contributions from a large group of people, may present a potential solution. This study evaluates the feasibility of crowdsourcing annotations of short epochs of EEG recordings by comparing the performance of experts and non-experts in identifying six SRPPs.
Methods: We conducted an EEG scoring contest using a mobile app, involving expert and non-expert participants. Non-experts in our study included physicians (MDs), medical students, nurse practitioners (NPs), physician assistants (PAs), pharmacists, students in those professions and other healthcare fields, and others. Performance was assessed using pairwise agreement and Fleiss' kappa between experts, and by comparing the accuracy of experts and of the crowd using individual and weighted majority votes.
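The abstract names Fleiss' kappa as the inter-expert agreement statistic but does not show the computation. As a minimal illustrative sketch (not the authors' code), kappa can be computed from a matrix where each row is one EEG epoch and each column counts how many raters assigned it to a given SRPP category:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a ratings count matrix.

    ratings[i][j] = number of raters who assigned subject i to category j.
    Every row must sum to the same number of raters n.
    """
    N = len(ratings)          # number of subjects (e.g., EEG epochs)
    n = sum(ratings[0])       # raters per subject
    k = len(ratings[0])       # number of categories (e.g., six SRPPs)

    # Observed per-subject agreement P_i, averaged over subjects
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N

    # Chance agreement P_e from the marginal category proportions
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)

    return (P_bar - P_e) / (1 - P_e)
```

Kappa is 1 when all raters agree on every subject, 0 when agreement is no better than chance, and negative when it is worse than chance.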
Results: A total of 1542 participants (8 experts and 1534 non-experts) answered 478 834 questions across six SRPPs: seizures, generalized and lateralized periodic discharges (GPDs and LPDs), generalized and lateralized rhythmic delta activity (GRDA and LRDA), and "Other." Using individual, non-weighted votes, the crowd's performance was inferior to that of experts, both overall and for each of the six SRPPs. Using weighted majority votes, the crowd was non-inferior to experts for overall SRPP identification, with an accuracy of .70 (95% confidence interval [CI]: .69-.70) compared to the experts' accuracy of .68 (95% CI: .68-.70). The crowd performed comparably to or better than experts in identifying most SRPPs, except for LPDs and "Other." No individual expert outperformed the crowd on overall metrics.
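The weighted-majority result above depends on how crowd votes are aggregated. The abstract does not specify the weighting scheme; a minimal sketch, assuming each annotator carries a scalar weight (for instance, accuracy on calibration epochs) and the label with the largest total weight wins, would be:

```python
from collections import defaultdict

def weighted_majority_vote(votes, weights):
    """Aggregate crowd labels for one EEG epoch.

    votes:   {annotator_id: label} — each annotator's chosen SRPP label
    weights: {annotator_id: weight} — e.g., calibration accuracy (hypothetical)
    Returns the label with the largest total weight; unknown annotators count 0.
    """
    totals = defaultdict(float)
    for annotator, label in votes.items():
        totals[label] += weights.get(annotator, 0.0)
    return max(totals, key=totals.get)
```

Under this scheme a single well-calibrated annotator can outvote several unreliable ones, which is what lets the weighted crowd reach expert-level accuracy while the unweighted crowd does not.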
Significance: This proof-of-concept highlights the promise of crowd reviewers for obtaining expert-level annotations of SRPPs, which could potentially accelerate the development of large, diverse datasets for training automated detection algorithms. Challenges remain to be addressed, such as the varying calibration/test splits across crowd participants in this study and the absence of gold-standard labels in real-world settings.
Journal description:
Epilepsia is the leading, authoritative source for innovative clinical and basic science research for all aspects of epilepsy and seizures. In addition, Epilepsia publishes critical reviews, opinion pieces, and guidelines that foster understanding and aim to improve the diagnosis and treatment of people with seizures and epilepsy.