Direct Quantification of Uncertainty in Deep Learning-Based Automatic Sleep Staging.

IF 4.5, CAS Zone 2 (Medicine), JCR Q2, ENGINEERING, BIOMEDICAL

IEEE Transactions on Biomedical Engineering Pub Date : 2025-10-20 DOI:10.1109/TBME.2025.3623380

Miika Vainikka, Riku Huttunen, Samu Kainulainen, Henri Korkalainen, Matias Rusanen

Citations: 0

Abstract

Objective: To evaluate and compare different methods for quantifying uncertainty in deep learning-based automatic sleep staging, thereby enhancing transparency and supporting clinical adoption.

Methods: Three models trained on the STAGES dataset were analyzed. One model used a traditional hypnodensity threshold to assess uncertainty. Two additional models employed Monte Carlo (MC) dropout with different dropout architectures to sample hypnodensities. For these, uncertainty was quantified using thresholds on the mean hypnodensity and the standard deviation. A novel Hypnodensity Interval (HI) method was also introduced, combining the sample mean and standard deviation for uncertainty assessment. All methods were evaluated on the independent DOD dataset by measuring performance improvements after removing uncertain epochs.
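
The sampling-based quantities described in the Methods can be illustrated with a minimal sketch. The abstract does not give the exact definition of the Hypnodensity Interval, so the rule below is an assumption made for illustration only: each stage gets the interval mean ± k·std over the MC dropout samples, and an epoch is flagged as uncertain when the interval of the most probable stage overlaps that of the runner-up. The function names, the parameter `k`, and the toy sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev

STAGES = ["W", "N1", "N2", "N3", "REM"]

def summarize_samples(samples):
    """Per-stage sample mean and sample standard deviation for ONE epoch.

    samples: list of T sampled hypnodensities, each a list of
    per-stage probabilities (as MC dropout would produce with T passes).
    """
    per_stage = list(zip(*samples))  # transpose: one tuple of T values per stage
    means = [mean(values) for values in per_stage]
    stds = [stdev(values) for values in per_stage]
    return means, stds

def interval_uncertain(means, stds, k=1.0):
    """ASSUMED interval rule (not the paper's definition).

    Flags the epoch as uncertain when the lower bound of the top stage,
    mean - k*std, does not exceed the upper bound of the runner-up,
    mean + k*std.
    """
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    top, runner_up = order[0], order[1]
    return means[top] - k * stds[top] <= means[runner_up] + k * stds[runner_up]

# Toy samples (made up): three stochastic forward passes per epoch.
confident_epoch = [
    [0.02, 0.03, 0.90, 0.03, 0.02],
    [0.01, 0.04, 0.88, 0.05, 0.02],
    [0.03, 0.02, 0.91, 0.02, 0.02],
]
ambiguous_epoch = [
    [0.0, 0.0, 0.55, 0.45, 0.0],
    [0.0, 0.0, 0.45, 0.55, 0.0],
    [0.0, 0.0, 0.60, 0.40, 0.0],
]

for name, samples in [("confident", confident_epoch), ("ambiguous", ambiguous_epoch)]:
    means, stds = summarize_samples(samples)
    predicted = STAGES[means.index(max(means))]
    print(name, predicted, interval_uncertain(means, stds))
```

In this toy input both epochs are scored as N2 by the mean hypnodensity, but only the second one, where the samples alternate between N2 and N3, is flagged by the assumed interval rule.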

Results: All models achieved at least 83% accuracy in sleep staging. Among the MC dropout models, the mean hypnodensity threshold yielded the greatest performance improvement (92% accuracy with 20% of the most uncertain data removed), while the standard deviation threshold was less effective. The HI method performed comparably to traditional hypnodensity thresholding and effectively identified uncertainty, particularly in misclassifications between N2 and N3 sleep stages.
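
The evaluation idea behind these figures, measuring accuracy again after the most uncertain epochs are removed, can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: the function name `accuracy_after_rejection`, the use of the maximum (mean) hypnodensity value as the confidence score, and all toy labels and scores are assumptions, and the resulting numbers are unrelated to the 83% and 92% reported above.

```python
def accuracy_after_rejection(y_true, y_pred, confidence, reject_fraction):
    """Accuracy on the epochs kept after dropping the least confident ones.

    confidence: one score per epoch, higher means more certain
    (assumed here to be the maximum of the hypnodensity).
    reject_fraction: share of epochs to remove, in [0, 1).
    """
    if not 0.0 <= reject_fraction < 1.0:
        raise ValueError("reject_fraction must be in [0, 1)")
    n = len(y_true)
    n_reject = int(n * reject_fraction)
    # Epoch indices ordered from least to most confident.
    order = sorted(range(n), key=lambda i: confidence[i])
    kept = order[n_reject:]
    correct = sum(1 for i in kept if y_true[i] == y_pred[i])
    return correct / len(kept)

# Toy data (made up): 10 epochs with 3 errors, all at low confidence.
y_true = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "N2", "N2", "W"]
y_pred = ["W", "N2", "N2", "N2", "N2", "N3", "REM", "N3", "N2", "W"]
confidence = [0.95, 0.48, 0.90, 0.85, 0.52, 0.80, 0.92, 0.55, 0.88, 0.97]

baseline = accuracy_after_rejection(y_true, y_pred, confidence, 0.0)
after_20 = accuracy_after_rejection(y_true, y_pred, confidence, 0.2)
print(baseline, after_20)  # 0.7 0.875
```

In the toy data, removing the two least confident epochs discards two of the three errors, so accuracy on the remaining eight epochs rises from 0.7 to 0.875. In a review workflow the removed epochs would be the ones referred for manual scoring.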

Conclusion: Uncertainty in automatic sleep staging can be reliably quantified directly from hypnodensity outputs. The HI method offers a viable alternative to existing approaches, providing both justifiable thresholding and greater flexibility.

Significance: These findings support the clinical integration of automatic sleep staging systems by improving decision-making transparency and enabling targeted manual review of uncertain epochs without sacrificing overall accuracy.

Source journal

IEEE Transactions on Biomedical Engineering (Engineering & Technology; Engineering: Biomedical)

CiteScore

9.40

Self-citation rate

4.30%

Articles published

880

Review time

2.5 months

About the journal: IEEE Transactions on Biomedical Engineering contains basic and applied papers dealing with biomedical engineering. Papers range from engineering development in methods and techniques with biomedical applications to experimental and clinical investigations with engineering contributions.