Data driven phenotyping and longitudinal feature modeling of sleep apnea subtypes using interpretable machine learning

Sleep Epidemiology, Vol. 5, Article 100113. Published: 2025-09-09. DOI: 10.1016/j.sleepe.2025.100113

Shireen Fathima, Maaz Ahmed


Citations: 0

Abstract


Sleep apnea is a heterogeneous disorder with distinct physiological mechanisms such as obstructive sleep apnea (OSA), central sleep apnea (CSA), and mixed forms, yet many subjects exhibit diagnostically ambiguous events that do not fit these categories. We define such cases as a novel Borderline (BL) Apnea phenotype, which our longitudinal analysis revealed often behaves as a transitional stage between normal breathing and pathological subtypes. Most machine learning (ML) studies adopt binary classification frameworks, overlooking phenotypic diversity, risk stratification, and longitudinal patterns. This study proposes a comprehensive framework integrating rule-based phenotyping, observational and statistical profiling, multiclass ML, and interpretable modeling to classify subjects from the Sleep Heart Health Study (SHHS) cohort into Normal, OSA, CSA, Both (mixed), or BL Apnea types using thresholds on three apnea–hypopnea indices (AHI): total (AHI_A), obstructive (AHI_O), and central (AHI_C). The BL group captures individuals with elevated total AHI but subthreshold obstructive and central components, representing a diagnostically ambiguous, underexplored phenotype. Demographic, anthropometric, and lifestyle traits were compared across subtypes to enable risk stratification. Dimensionality reduction (PCA, t-SNE) revealed substantial overlap between subtypes, justifying non-linear modeling. Among nine classifiers, Gradient Boosting and LightGBM performed best (macro AUC > 0.83, accuracy > 84%, specificity > 88%). SHAP interpretation consistently identified neck circumference, minimum O₂ saturation, Epworth Sleepiness Score, and arousal index as top predictors. Longitudinal analysis using SHHS Visit 2 showed heterogeneous outcomes for BL Apnea: 44% reverted to Normal and 22% progressed to the 'Both' type, highlighting its transitional nature and potential clinical utility for risk stratification, disease monitoring, and personalized management.
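The abstract describes rule-based phenotyping from three AHI values but does not state the cutoffs. The sketch below shows one way such rules could be expressed; the 5 events/h defaults, the `AhiThresholds` class, and the `assign_phenotype` function are hypothetical and are not taken from the paper. Components are checked first, and total AHI only separates Normal from BL Apnea when neither component reaches its cutoff, which matches the abstract's description of BL as "elevated total AHI but subthreshold" components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AhiThresholds:
    """Hypothetical cutoffs in events per hour (assumed, not from the paper)."""
    total: float = 5.0        # applied to AHI_A
    obstructive: float = 5.0  # applied to AHI_O
    central: float = 5.0      # applied to AHI_C


def assign_phenotype(ahi_a: float, ahi_o: float, ahi_c: float,
                     t: AhiThresholds = AhiThresholds()) -> str:
    """Map total, obstructive, and central AHI to one of five phenotypes."""
    obstructive = ahi_o >= t.obstructive
    central = ahi_c >= t.central
    if obstructive and central:
        return "Both"
    if obstructive:
        return "OSA"
    if central:
        return "CSA"
    # Neither component reaches its cutoff: total AHI decides Normal vs. BL.
    if ahi_a >= t.total:
        return "BL Apnea"
    return "Normal"


# A subject with elevated total AHI but low obstructive and central indices.
print(assign_phenotype(12.0, 3.0, 2.0))  # BL Apnea
```

Keeping the thresholds in a frozen dataclass makes it easy to rerun the labeling under alternative cutoffs, for example a stricter total-AHI threshold of 15 events/h, and compare how many subjects move into or out of the BL group.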
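The reported "macro AUC" and specificity are multiclass summaries. A common reading, assumed here since the abstract does not define them, is one-vs-rest: compute a binary metric for each phenotype against all others, then take the unweighted mean. The stdlib-only sketch below illustrates that idea (similar in spirit to scikit-learn's `roc_auc_score(..., multi_class="ovr", average="macro")`); the function names are illustrative, and the toy inputs are not study data.

```python
from typing import Sequence


def binary_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[str], proba: Sequence[Sequence[float]],
                  classes: Sequence[str]) -> float:
    """Unweighted mean of one-vs-rest AUCs; proba[i][k] scores class k."""
    aucs = [binary_auc([row[k] for row in proba], [y == c for y in y_true])
            for k, c in enumerate(classes)]
    return sum(aucs) / len(aucs)


def macro_specificity(y_true: Sequence[str], y_pred: Sequence[str],
                      classes: Sequence[str]) -> float:
    """Mean over classes of TN / (TN + FP), each class taken one-vs-rest."""
    specs = []
    for c in classes:
        tn = sum(1 for t, p in zip(y_true, y_pred) if t != c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        specs.append(tn / (tn + fp))
    return sum(specs) / len(specs)


print(binary_auc([0.1, 0.4, 0.35, 0.8], [False, False, True, True]))  # 0.75
```

Macro averaging weights every phenotype equally, so a small group such as CSA or BL Apnea influences the summary as much as the Normal majority, which is why it is a reasonable choice for an imbalanced five-class problem.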
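The longitudinal finding (share of BL Apnea subjects who reverted to Normal or progressed to Both by Visit 2) is a row of a transition table between the two visits. A minimal sketch, assuming phenotypes have already been assigned at both visits and subjects lost to follow-up were dropped beforehand; `transition_proportions` is a hypothetical helper and the labels in the usage line are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def transition_proportions(visit1: Sequence[str],
                           visit2: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Row-normalised table: for each Visit 1 phenotype, the share of
    subjects carrying each phenotype at Visit 2 (paired by position)."""
    if len(visit1) != len(visit2):
        raise ValueError("visits must be paired subject by subject")
    counts: Dict[str, Counter] = defaultdict(Counter)
    for before, after in zip(visit1, visit2):
        counts[before][after] += 1
    return {
        before: {after: n / sum(row.values()) for after, n in row.items()}
        for before, row in counts.items()
    }


# Illustrative labels only, not SHHS data.
table = transition_proportions(
    ["BL Apnea"] * 4 + ["OSA"] * 2,
    ["Normal", "Normal", "Both", "BL Apnea", "OSA", "Both"],
)
print(table["BL Apnea"])  # {'Normal': 0.5, 'Both': 0.25, 'BL Apnea': 0.25}
```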


Source journal: Sleep Epidemiology (Dentistry, Oral Surgery and Medicine; Clinical Neurology; Pulmonary and Respiratory Medicine)

CiteScore: 1.80

Self-citation rate: 0.00%

Articles published: