Bridging AI and Clinical Practice: Integrating Automated Sleep Scoring Algorithm with Uncertainty-Guided Physician Review
Michal Bechny, Giuliana Monachino, Luigi Fiorillo, Julia van der Meer, Markus H Schmidt, Claudio LA Bassetti, Athina Tzovara, Francesca D Faraci
Nature and Science of Sleep, published 2024-05-27. DOI: 10.2147/nss.s455649 (https://doi.org/10.2147/nss.s455649)
Abstract
Purpose: This study aims to enhance the clinical use of automated sleep-scoring algorithms by incorporating uncertainty estimation to help clinicians efficiently review predicted hypnograms manually. Such review is necessary because of the notable inter-scorer variability inherent in polysomnography (PSG) databases. We quantify the extent of review required to achieve predefined agreement levels, examining both in-domain (ID) and out-of-domain (OOD) data and taking subjects’ diagnoses into account.
Patients and Methods: A total of 19,578 PSGs from 13 open-access databases were used to train U-Sleep, a state-of-the-art sleep-scoring algorithm. We used a comprehensive clinical database of an additional 8832 PSGs, covering a full spectrum of ages (0–91 years) and sleep disorders, to refine U-Sleep and to evaluate different uncertainty-quantification approaches, including our novel confidence network. The ID data consisted of PSGs scored by over 50 physicians, and the two OOD sets comprised recordings each scored by a unique senior physician.
Results: U-Sleep demonstrated robust performance, with Cohen’s kappa (κ) of 76.2% on ID and 73.8–78.8% on OOD data. The confidence network excelled at identifying uncertain predictions, achieving AUROC scores of 85.7% on ID and 82.5–85.6% on OOD data. Independently of sleep-disorder status, statistical evaluations revealed significant differences in confidence scores between predictions that agreed with the physician's scoring and those that disagreed, and significant correlations between confidence scores and classification performance metrics. To achieve κ ≥ 90% with physician intervention, less than 29.0% of uncertain epochs needed to be examined, substantially reducing physicians’ workload and facilitating near-perfect agreement.
Conclusion: Inter-scorer variability limits the accuracy of scoring algorithms to ~80%. By integrating uncertainty estimation with U-Sleep, we improve the review of predicted hypnograms so that they align with the scoring preferences of the responsible physician. Validated across ID and OOD data and various sleep disorders, our approach offers a strategy to boost the usability of automated scoring tools in clinical settings.
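The review procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy labels, and the confidence values are hypothetical, and the sketch assumes that a reviewed epoch always ends up matching the physician's reference label. It computes Cohen’s kappa, a rank-based AUROC of confidence scores for separating correct from incorrect predictions, and the fraction of epochs that must be reviewed, from least to most confident, to reach a target kappa.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from the two marginal label distributions.
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


def auroc(conf, is_correct):
    """Probability that a correct prediction has higher confidence than an
    incorrect one (ties count as 0.5)."""
    pos = [c for c, ok in zip(conf, is_correct) if ok]
    neg = [c for c, ok in zip(conf, is_correct) if not ok]
    if not pos or not neg:
        raise ValueError("need both correct and incorrect predictions")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def review_fraction_for_target(pred, truth, conf, target=0.90):
    """Smallest fraction of epochs to review, least confident first, so that
    kappa between the corrected hypnogram and the reference reaches target.
    Assumption: a reviewed epoch is replaced by the reference label."""
    corrected = list(pred)
    if cohen_kappa(corrected, truth) >= target:
        return 0.0
    order = sorted(range(len(pred)), key=lambda i: conf[i])
    for k, i in enumerate(order, start=1):
        corrected[i] = truth[i]
        if cohen_kappa(corrected, truth) >= target:
            return k / len(pred)
    return 1.0


# Hypothetical toy hypnogram of ten epochs (W, N1, N2, N3, R).
truth = ["W", "W", "N1", "N2", "N2", "N3", "R", "R", "N2", "W"]
pred = ["W", "N1", "N1", "N2", "N3", "N3", "R", "N2", "N2", "W"]
conf = [0.95, 0.40, 0.80, 0.90, 0.55, 0.85, 0.92, 0.35, 0.88, 0.97]

kappa_before = cohen_kappa(pred, truth)
separation = auroc(conf, [p == t for p, t in zip(pred, truth)])
fraction = review_fraction_for_target(pred, truth, conf, target=0.90)
print(f"kappa={kappa_before:.3f} auroc={separation:.3f} review={fraction:.0%}")
```

In this toy case the three misclassified epochs happen to carry the three lowest confidence scores, so reviewing them alone is enough. With real data the confidence ranking is imperfect, which is why the required review fraction depends on how well the confidence scores separate correct from incorrect predictions.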
About the journal:
Nature and Science of Sleep is an international, peer-reviewed, open access journal covering all aspects of sleep science and sleep medicine, including the neurophysiology and functions of sleep, the genetics of sleep, sleep and society, biological rhythms, dreaming, sleep disorders and therapy, and strategies to optimize healthy sleep.
Specific topics covered in the journal include:
The functions of sleep in humans and other animals
Physiological and neurophysiological changes with sleep
The genetics of sleep and sleep differences
The neurotransmitters, receptors and pathways involved in controlling both sleep and wakefulness
Behavioral and pharmacological interventions aimed at improving sleep, and improving wakefulness
Sleep changes with development and with age
Sleep and reproduction (e.g., changes across the menstrual cycle, with pregnancy and menopause)
The science and nature of dreams
Sleep disorders
Impact of sleep and sleep disorders on health, daytime function and quality of life
Sleep problems secondary to clinical disorders
Interaction of society with sleep (e.g., consequences of shift work, occupational health, public health)
The microbiome and sleep
Chronotherapy
Impact of circadian rhythms on sleep, physiology, cognition and health
Mechanisms controlling circadian rhythms, centrally and peripherally
Impact of circadian rhythm disruptions (including night shift work, jet lag and social jet lag) on sleep, physiology, cognition and health
Behavioral and pharmacological interventions aimed at reducing adverse effects of circadian-related sleep disruption
Assessment of technologies and biomarkers for measuring sleep and/or circadian rhythms
Epigenetic markers of sleep or circadian disruption.