Direct Quantification of Uncertainty in Deep Learning-Based Automatic Sleep Staging
Miika Vainikka, Riku Huttunen, Samu Kainulainen, Henri Korkalainen, Matias Rusanen
IEEE Transactions on Biomedical Engineering, published 2025-10-20. DOI: https://doi.org/10.1109/TBME.2025.3623380
Abstract
Objective: To evaluate and compare different methods for quantifying uncertainty in deep learning-based automatic sleep staging, thereby enhancing transparency and supporting clinical adoption.
Methods: Three models trained on the STAGES dataset were analyzed. One model used a traditional hypnodensity threshold to assess uncertainty. Two additional models employed Monte Carlo (MC) dropout with different dropout architectures to sample hypnodensities. For these, uncertainty was quantified using thresholds on the mean hypnodensity and the standard deviation. A novel Hypnodensity Interval (HI) method was also introduced, combining the sample mean and standard deviation for uncertainty assessment. All methods were evaluated on the independent DOD dataset by measuring performance improvements after removing uncertain epochs.
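As a rough illustration of the three uncertainty criteria described in the Methods, the sketch below summarizes Monte Carlo dropout samples for a single epoch and applies a mean threshold, a standard deviation threshold, and an interval check. The abstract does not give the exact definition of the Hypnodensity Interval, so the overlap rule used here (mean ± k·SD of the top stage against the runner-up) is an assumption for illustration only, as are the function names, the threshold values, and the example numbers.

```python
from statistics import mean, stdev

STAGES = ["W", "N1", "N2", "N3", "REM"]


def summarize_epoch(samples):
    """Return per-stage sample mean and sample standard deviation.

    `samples` is a list of hypnodensity vectors for ONE epoch, one vector
    per stochastic forward pass (MC dropout). At least two passes are needed.
    """
    per_stage = list(zip(*samples))  # regroup values by sleep stage
    mu = [mean(values) for values in per_stage]
    sd = [stdev(values) for values in per_stage]
    return mu, sd


def uncertainty_flags(samples, mean_thr=0.6, sd_thr=0.1, k=1.0):
    """Flag one epoch as uncertain under three illustrative criteria.

    All thresholds are hypothetical defaults, not values from the paper.
    """
    mu, sd = summarize_epoch(samples)
    order = sorted(range(len(mu)), key=lambda i: mu[i], reverse=True)
    top, second = order[0], order[1]
    return {
        "stage": STAGES[top],
        # Mean threshold: the winning stage is not probable enough.
        "mean_uncertain": mu[top] < mean_thr,
        # SD threshold: the winning stage varies too much across passes.
        "sd_uncertain": sd[top] > sd_thr,
        # Assumed interval rule: the top stage's lower bound does not clear
        # the runner-up's upper bound, so the intervals overlap.
        "interval_uncertain": mu[top] - k * sd[top] <= mu[second] + k * sd[second],
    }


# Made-up hypnodensity samples (three MC passes each), for illustration only.
confident_epoch = [
    [0.02, 0.03, 0.90, 0.04, 0.01],
    [0.01, 0.04, 0.88, 0.06, 0.01],
    [0.02, 0.02, 0.92, 0.03, 0.01],
]
ambiguous_epoch = [
    [0.01, 0.02, 0.55, 0.41, 0.01],
    [0.01, 0.02, 0.45, 0.51, 0.01],
    [0.01, 0.02, 0.52, 0.44, 0.01],
]

print(uncertainty_flags(confident_epoch))
print(uncertainty_flags(ambiguous_epoch))
```

In this toy case the ambiguous epoch sits between N2 and N3: it is flagged by the mean threshold and by the assumed interval rule, but not by the standard deviation threshold, which shows how the criteria can disagree on the same epoch.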
Results: All models achieved at least 83% accuracy in sleep staging. Among the MC dropout models, the mean hypnodensity threshold yielded the greatest performance improvement (92% accuracy with 20% of the most uncertain data removed), while the standard deviation threshold was less effective. The HI method performed comparably to traditional hypnodensity thresholding and effectively identified uncertainty, particularly in misclassifications between N2 and N3 sleep stages.
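The evaluation idea behind these results, accuracy measured after removing the most uncertain epochs, can be sketched as follows. This is a minimal sketch, not the authors' evaluation code: the function name, the uncertainty score (one minus the maximum mean hypnodensity), and the toy labels are all assumptions, and the numbers it produces are unrelated to the figures reported in the paper.

```python
def accuracy_after_rejection(y_true, y_pred, uncertainty, reject_fraction):
    """Accuracy on the epochs kept after dropping the most uncertain ones.

    `uncertainty` holds one score per epoch (higher means less certain);
    `reject_fraction` is the share of epochs to remove, between 0 and 1.
    """
    n = len(y_true)
    if not (len(y_pred) == n and len(uncertainty) == n):
        raise ValueError("all inputs must have the same length")
    n_reject = int(round(reject_fraction * n))
    # Sort epochs from most to least certain and keep the certain ones.
    keep = sorted(range(n), key=lambda i: uncertainty[i])[: n - n_reject]
    if not keep:
        raise ValueError("reject_fraction removes every epoch")
    correct = sum(y_true[i] == y_pred[i] for i in keep)
    return correct / len(keep)


# Toy data for illustration only: two N2/N3 confusions with low confidence.
y_true = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "REM", "N2", "W"]
y_pred = ["W", "N1", "N2", "N3", "N3", "N2", "REM", "REM", "N2", "W"]
max_mean_prob = [0.95, 0.70, 0.90, 0.52, 0.85, 0.55, 0.92, 0.88, 0.80, 0.97]
# Assumed uncertainty score: one minus the top mean hypnodensity value.
uncertainty = [1.0 - p for p in max_mean_prob]

baseline = accuracy_after_rejection(y_true, y_pred, uncertainty, 0.0)
filtered = accuracy_after_rejection(y_true, y_pred, uncertainty, 0.2)
print(baseline, filtered)
```

Sweeping `reject_fraction` over a range of values gives an accuracy-versus-rejection curve, which is one way to compare uncertainty scores: a more informative score raises accuracy faster as uncertain epochs are removed.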
Conclusion: Uncertainty in automatic sleep staging can be reliably quantified directly from hypnodensity outputs. The HI method offers a viable alternative to existing approaches, providing both justifiable thresholding and greater flexibility.
Significance: These findings support the clinical integration of automatic sleep staging systems by improving decision-making transparency and enabling targeted manual review of uncertain epochs without sacrificing overall accuracy.
About the journal:
IEEE Transactions on Biomedical Engineering contains basic and applied papers dealing with biomedical engineering. Papers range from engineering development in methods and techniques with biomedical applications to experimental and clinical investigations with engineering contributions.