Deciphering Insomnia: Benchmarking Automated Sleep Staging Algorithms for Complex Sleep Disorders
Umaer Hanif, Anis Aloulou, Flynn Crosbie, Paul Bouchequet, Mounir Chennaoui, Thomas Andrillon, Damien Leger
Journal of Sleep Research, e70048. Published 2025-03-27. DOI: 10.1111/jsr.70048 (https://doi.org/10.1111/jsr.70048)
Polysomnography (PSG) is essential for diagnosing sleep disorders, but its manual interpretation is labor-intensive. Automated sleep staging algorithms are promising, yet their utility in complex sleep disorders such as insomnia remains uncertain. This study evaluates five of the most recognised sleep staging classifiers (U-Sleep, STAGES, GSSC, Luna and YASA) on PSG data from 904 patients with chronic insomnia. Performance was assessed using F1 scores, confusion matrices and predicted sleep metrics. The effect of demographics, sleepiness and PSG metrics on each classifier's performance was assessed using linear regression. Across all sleep stages, GSSC performed best (macro F1 score = 0.66), followed by U-Sleep (0.62), Luna (0.56), STAGES (0.54) and YASA (0.52). GSSC achieved the highest F1 scores in Wake (0.83), N1 (0.22), N2 (0.80), N3 (0.71) and REM (0.76), while U-Sleep matched its performance in N1 and REM, and Luna matched it in N3. STAGES performed worst in N3 (0.39) and YASA in REM (0.35). Common misclassifications included N1 vs. Wake/N2 and N3 vs. N2, with REM misclassified as Wake/N1/N2 by STAGES, Luna and YASA. GSSC and U-Sleep exhibited minimal demographic bias, while STAGES and Luna showed more. No performance difference was observed between chronic insomnia patients with and without abnormal PSG. Sleep metric accuracy was highest for U-Sleep (TST, R² = 0.88), STAGES (SOL, R² = 0.82) and GSSC (WASO, R² = 0.82). These findings underscore the solid yet variable performance of the classifiers and highlight GSSC and U-Sleep as leading tools for sleep staging in patients with chronic insomnia.
About the journal:
The Journal of Sleep Research is dedicated to basic and clinical sleep research. The Journal publishes original research papers and invited reviews in all areas of sleep research (including biological rhythms). The Journal aims to promote the exchange of ideas between basic and clinical sleep researchers coming from a wide range of backgrounds and disciplines. The Journal will achieve this by publishing papers which use multidisciplinary and novel approaches to answer important questions about sleep, as well as its disorders and the treatment thereof.