{"title":"Data driven phenotyping and longitudinal feature modeling of sleep apnea subtypes using interpretable machine learning","authors":"Shireen Fathima, Maaz Ahmed","doi":"10.1016/j.sleepe.2025.100113","journal":{"name":"Sleep epidemiology","volume":"5","pages":"Article 100113"},"publicationDate":"2025-09-09","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667343625000083"}
Data driven phenotyping and longitudinal feature modeling of sleep apnea subtypes using interpretable machine learning
Sleep apnea is a heterogeneous disorder with distinct physiological mechanisms, including obstructive sleep apnea (OSA), central sleep apnea (CSA), and mixed forms, yet many subjects exhibit diagnostically ambiguous events that do not fit these categories. We define such cases as a novel Borderline (BL) Apnea phenotype, which our longitudinal analysis showed often behaves as a transitional stage between normal breathing and pathological subtypes. Most machine learning (ML) studies adopt binary classification frameworks, overlooking phenotypic diversity, risk stratification, and longitudinal patterns. This study proposes a comprehensive framework integrating rule-based phenotyping, observational and statistical profiling, multiclass ML, and interpretable modeling to classify subjects from the Sleep Heart Health Study (SHHS) cohort into Normal, OSA, CSA, Both (mixed), or BL Apnea types using apnea–hypopnea index (AHI) thresholds: total (AHI_A), obstructive (AHI_O), and central (AHI_C). The BL group captures individuals with elevated total AHI but subthreshold OSA and CSA components, representing a diagnostically ambiguous, underexplored phenotype. Demographic, anthropometric, and lifestyle traits were compared across subtypes to enable risk stratification. Dimensionality reduction (PCA, t-SNE) revealed substantial overlap between subtypes, justifying non-linear modeling. Among nine classifiers, Gradient Boosting and LightGBM performed best (macro AUC > 0.83, accuracy > 84%, specificity > 88%). SHAP interpretation consistently identified neck circumference, minimum O2 saturation, Epworth Sleepiness Score, and arousal index as top predictors. Longitudinal analysis using SHHS Visit 2 showed heterogeneous outcomes for BL Apnea: 44% reverted to Normal and 22% progressed to the 'Both' type, highlighting its transitional nature and potential clinical utility for risk stratification, disease monitoring, and personalized management.
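The rule-based phenotyping and the Visit 1 to Visit 2 comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the AHI cut-offs, so the three threshold constants (5 events/hour each) are assumptions chosen only for illustration, and the function names `phenotype` and `transition_shares` and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# ASSUMPTION: illustrative cut-offs in events/hour. The abstract does not
# state the thresholds the authors applied to the three AHI measures.
TOTAL_THRESHOLD = 5.0
OBSTRUCTIVE_THRESHOLD = 5.0
CENTRAL_THRESHOLD = 5.0


def phenotype(ahi_total: float, ahi_obstructive: float, ahi_central: float) -> str:
    """Assign one of Normal, OSA, CSA, Both, BL from three AHI values."""
    obstructive = ahi_obstructive >= OBSTRUCTIVE_THRESHOLD
    central = ahi_central >= CENTRAL_THRESHOLD
    if obstructive and central:
        return "Both"
    if obstructive:
        return "OSA"
    if central:
        return "CSA"
    # Neither component reaches its cut-off: an elevated total AHI is Borderline.
    if ahi_total >= TOTAL_THRESHOLD:
        return "BL"
    return "Normal"


def transition_shares(pairs: Iterable[Tuple[str, str]], start: str) -> Dict[str, float]:
    """Share of subjects labelled `start` at visit 1 that land in each visit-2 label."""
    counts = Counter(v2 for v1, v2 in pairs if v1 == start)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Made-up (total, obstructive, central) AHI values, not SHHS data.
    visit1 = [(7.0, 3.0, 2.5), (6.0, 2.0, 3.0), (12.0, 10.0, 1.0), (2.0, 1.0, 0.5)]
    visit2 = [(3.0, 1.5, 1.0), (15.0, 8.0, 6.0), (14.0, 11.0, 2.0), (2.5, 1.0, 1.0)]
    pairs = [(phenotype(*a), phenotype(*b)) for a, b in zip(visit1, visit2)]
    print(pairs)
    print(transition_shares(pairs, "BL"))
```

Checking the component indices before the total keeps the five labels mutually exclusive: BL is reached only when the total AHI is elevated while both the obstructive and central components stay below their cut-offs, which matches the definition given in the abstract.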