Evaluation of automated pediatric sleep stage classification using U-Sleep: a convolutional neural network.
Ajay Kevat, Rylan Steinkey, Sadasivam Suresh, Warren R Ruehland, Jasneek Chawla, Philip I Terrill, Andrew Collaro, Kartik Iyer
Journal of Clinical Sleep Medicine (IF 3.5, JCR Q1, Clinical Neurology). Journal Article, published 2024-09-26. DOI: 10.5664/jcsm.11362 (https://doi.org/10.5664/jcsm.11362)
Abstract
Study objectives: U-Sleep is a publicly available automated sleep stager, but it has not been independently validated using pediatric data. We aimed to a) test the hypothesis that U-Sleep performance is equivalent to that of trained human scorers, using a concordance dataset of 50 pediatric polysomnogram excerpts scored by multiple trained scorers, and b) identify clinical and demographic characteristics that affect U-Sleep accuracy, using a clinical dataset of 3114 polysomnograms from a tertiary center.
Methods: Agreement between U-Sleep and 'gold' 30-second epoch sleep staging was determined across both datasets. Utilizing the concordance dataset, the hypothesis of equivalence between human scorers and U-Sleep was tested using a Wilcoxon two one-sided test (TOST). Multivariable regression and generalized additive modelling were used on the clinical dataset to estimate the effects of age, comorbidities and polysomnographic findings on U-Sleep performance.
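As an illustration of the agreement measure named above, the following is a minimal sketch of Cohen's kappa computed over two epoch-by-epoch hypnograms. The function name `cohens_kappa`, the stage labels, and the example data are assumptions made for illustration; they are not taken from the study or from the U-Sleep software.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equal-length sequences of stage labels.

    Each element is the label of one 30-second epoch, e.g. "W", "N1",
    "N2", "N3" or "R" (label names are illustrative assumptions).
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(reference)
    # Observed agreement: fraction of epochs given the same label.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: product of the two marginal label frequencies,
    # summed over every label used by either scorer.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    labels = set(ref_counts) | set(pred_counts)
    expected = sum(ref_counts[s] * pred_counts[s] for s in labels) / (n * n)

    if expected == 1.0:
        # Both sequences use one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ten-epoch excerpt, for illustration only.
gold = ["W", "W", "N1", "N2", "N2", "N3", "N3", "R", "R", "N2"]
auto = ["W", "N1", "N1", "N2", "N2", "N3", "N2", "R", "R", "N2"]
print(round(cohens_kappa(gold, auto), 3))
```

Kappa discounts the agreement expected by chance from each scorer's stage proportions, which is why it is usually preferred to raw percentage agreement when some stages (such as N2) dominate a recording.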
Results: For 5-stage sleep staging in the concordance dataset, the median (interquartile range) Cohen's kappa agreement with "gold" scoring was similar for U-Sleep and for individual trained humans, kappa=0.79 (0.19) vs 0.78 (0.13) respectively, and satisfied statistical equivalence (TOST p < 0.01). Median (interquartile range) kappa agreement between U-Sleep 2.0 and clinical sleep staging was kappa=0.69 (0.22). Modelling indicated lower performance for children < 2 years, those with medical comorbidities possibly altering sleep electroencephalography (kappa reduction=0.07-0.15) and those with decreased sleep efficiency or sleep-disordered breathing (kappa reduction=0.1).
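The equivalence result rests on a two one-sided test (TOST). The abstract does not state the equivalence margin or whether the comparison was paired, so the sketch below assumes a paired design (one U-Sleep kappa and one human kappa per excerpt), an exact Wilcoxon signed-rank test, and a hypothetical margin. The function names and data are illustrative and are not the authors' implementation.

```python
def signed_rank_p_greater(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value for H1: shift > 0."""
    d = [x for x in diffs if x != 0]  # zero differences are discarded
    n = len(d)
    if n == 0:
        return 1.0

    # Rank |d| with average ranks for ties. Ranks are stored doubled so
    # that tied (half-integer) ranks remain integers.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        doubled_avg = (i + 1) + (j + 1)  # twice the mean of ranks i+1..j+1
        for k in range(i, j + 1):
            ranks2[order[k]] = doubled_avg
        i = j + 1

    w_plus = sum(r for r, x in zip(ranks2, d) if x > 0)
    total = sum(ranks2)

    # counts[s] = number of sign assignments whose doubled W+ equals s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    return sum(counts[w_plus:]) / 2 ** n


def wilcoxon_tost(auto, human, margin):
    """TOST p-value: the larger of the two one-sided p-values.

    Equivalence within +/- margin is supported when the result is small.
    """
    diffs = [a - h for a, h in zip(auto, human)]
    p_lower = signed_rank_p_greater([x + margin for x in diffs])  # diff > -margin
    p_upper = signed_rank_p_greater([margin - x for x in diffs])  # diff < +margin
    return max(p_lower, p_upper)


# Hypothetical per-excerpt kappas and margin, for illustration only.
auto_kappa = [0.81, 0.76, 0.70, 0.85, 0.79, 0.66, 0.88, 0.74]
human_kappa = [0.78, 0.77, 0.73, 0.82, 0.80, 0.69, 0.85, 0.75]
print(wilcoxon_tost(auto_kappa, human_kappa, margin=0.1))
```

Taking the maximum of the two one-sided p-values means equivalence is claimed only when both "U-Sleep is worse by more than the margin" and "U-Sleep is better by more than the margin" are rejected.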
Conclusions: While U-Sleep algorithms showed performance statistically equivalent to that of trained scorers, accuracy was lower in children < 2 years and those with sleep-disordered breathing or comorbidities affecting electroencephalography. U-Sleep is suitable for pediatric clinical use provided automated staging is followed by expert clinician review.
About the journal:
Journal of Clinical Sleep Medicine focuses on clinical sleep medicine. Its emphasis is publication of papers with direct applicability and/or relevance to the clinical practice of sleep medicine. This includes clinical trials, clinical reviews, clinical commentary and debate, medical economic/practice perspectives, case series and novel/interesting case reports. In addition, the journal will publish proceedings from conferences, workshops and symposia sponsored by the American Academy of Sleep Medicine or other organizations related to improving the practice of sleep medicine.