Artificial intelligence or sleep experts: Comparing polysomnographic sleep staging in children and adolescents
Annemette L Moeller, Mathias Perslev, Cecilie Paulsrud, Steffen U Thorsen, Helle Leonthin, Nanette M Debes, Jannet Svensson, Poul Jennum
Sleep, published 2025-02-28. DOI: 10.1093/sleep/zsaf053
Abstract
Study objectives: Manual annotation of polysomnography (PSG) hypnograms is difficult and time-consuming. U-Sleep is a fast, publicly available, automated sleep staging system that has been evaluated on adult PSGs. In this study, we compare sleep staging by sleep experts with that of U-Sleep in a pediatric sample.
Methods: Manually annotated PSGs from 56 children aged 6-17 years (healthy or with a chronic disease) were compared with the output of U-Sleep. The two sets of hypnograms were compared using F1 overlap scores, accuracy, Cohen's kappa, and correlation coefficients. A qualitative analysis of the most significant systematic differences between manual and automated scoring was also performed.
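The agreement metrics named above can all be computed from two epoch-by-epoch hypnograms of equal length. The sketch below is a minimal, dependency-free illustration, assuming both hypnograms use the five stage labels W, N1, N2, N3, and REM on aligned epochs, with the manual scoring treated as the reference. The function names and the ten-epoch toy hypnograms are hypothetical; they are not taken from the study or from U-Sleep's code, and the study may have pooled or averaged the metrics across recordings differently.

```python
from collections import Counter

STAGES = ["W", "N1", "N2", "N3", "REM"]

def confusion(manual, auto, stages=STAGES):
    """Count epochs for each (manual stage, automatic stage) pair."""
    if len(manual) != len(auto):
        raise ValueError("hypnograms must have the same number of epochs")
    counts = Counter(zip(manual, auto))
    return [[counts[(m, a)] for a in stages] for m in stages]

def stage_f1(manual, auto, stages=STAGES):
    """Per-stage F1 (harmonic mean of precision and recall), manual = reference."""
    cm = confusion(manual, auto, stages)
    scores = {}
    for i, s in enumerate(stages):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(len(stages))) - tp  # auto says s, manual disagrees
        fn = sum(cm[i]) - tp                                   # manual says s, auto disagrees
        denom = 2 * tp + fp + fn
        scores[s] = 2 * tp / denom if denom else float("nan")  # stage absent in both
    return scores

def accuracy(manual, auto):
    """Fraction of epochs given the same stage by both scorers."""
    return sum(m == a for m, a in zip(manual, auto)) / len(manual)

def cohens_kappa(manual, auto, stages=STAGES):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(manual)
    p_o = accuracy(manual, auto)
    mc, ac = Counter(manual), Counter(auto)
    p_e = sum(mc[s] * ac[s] for s in stages) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ten-epoch example (not study data).
manual = ["W", "W", "N1", "N2", "N2", "N3", "N3", "REM", "REM", "N2"]
auto   = ["W", "N1", "N1", "N2", "N2", "N3", "N2", "REM", "REM", "N2"]

print(accuracy(manual, auto))                    # 0.8
print(round(cohens_kappa(manual, auto), 3))      # 0.744
print({s: round(v, 3) for s, v in stage_f1(manual, auto).items()})
# {'W': 0.667, 'N1': 0.667, 'N2': 0.857, 'N3': 0.667, 'REM': 1.0}
```

Kappa is lower than accuracy because it subtracts the agreement two scorers would reach by chance given how often each assigns every stage; this is why a common stage such as N2 inflates accuracy more than kappa.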
Results: U-Sleep matched the manually scored hypnograms with an overall mean F1 score (predicted performance) of 0.75, an accuracy of 83.9%, and an overall kappa value of 0.77. Stage-wise, U-Sleep achieved F1 scores of 0.79 for Wake, 0.40 for N1, 0.86 for N2, 0.84 for N3, and 0.86 for REM. The correlation between U-Sleep and the manual scorer was moderate to very strong in all sleep stages (r = 0.57-0.81).
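The overall mean F1 of 0.75 is consistent with an unweighted (macro) average of the five reported stage-wise scores: (0.79 + 0.40 + 0.86 + 0.84 + 0.86) / 5 = 3.75 / 5 = 0.75. The abstract does not say what the correlation coefficients were computed on; the sketch below assumes, hypothetically, that r is a Pearson correlation between manual and automatic per-participant time spent in one stage. The minute values are invented for illustration and are not study data.

```python
from math import sqrt
from statistics import mean

# Stage-wise F1 scores as reported in the abstract.
REPORTED_F1 = {"W": 0.79, "N1": 0.40, "N2": 0.86, "N3": 0.84, "REM": 0.86}

def macro_f1(per_stage):
    """Unweighted mean of per-stage F1 scores: every stage counts equally."""
    return mean(per_stage.values())

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(round(macro_f1(REPORTED_F1), 2))  # 0.75

# Hypothetical minutes of N3 for five participants (invented, not study data).
n3_manual = [80, 95, 60, 110, 70]
n3_auto = [85, 90, 65, 100, 80]
print(round(pearson_r(n3_manual, n3_auto), 2))  # 0.96
```

Because the macro average weights N1 as heavily as N2, the low N1 score (0.40) pulls the overall F1 well below the accuracy of 83.9%, which is dominated by the far more frequent N2 epochs.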
Conclusions: Overall, there is a high degree of agreement between manual and automatic scoring. This suggests that U-Sleep is a valid and effective method for identifying sleep stages in normal PSGs from a pediatric population. The disagreement was within the range expected from interscorer variability. Further evaluation of AI sleep scoring models should include analysis of outliers and of sleep staging in pathological sleep, which is also a challenge in manual annotation.
Journal description:
SLEEP® publishes findings from studies conducted at any level of analysis, including:
Genes
Molecules
Cells
Physiology
Neural systems and circuits
Behavior and cognition
Self-report
SLEEP® publishes articles that use a wide variety of scientific approaches and address a broad range of topics. These may include, but are not limited to:
Basic and neuroscience studies of sleep and circadian mechanisms
In vitro and animal models of sleep, circadian rhythms, and human disorders
Pre-clinical human investigations, including the measurement and manipulation of sleep and circadian rhythms
Studies in clinical or population samples. These may address factors influencing sleep and circadian rhythms (e.g., development and aging, and social and environmental influences) and relationships between sleep, circadian rhythms, health, and disease
Clinical trials, epidemiology studies, implementation, and dissemination research.