Matthew A Shew, Cole Pavelchek, Andrew Michelson, Amanda Ortmann, Shannon Lefler, Amit Walia, Nedim Durakovic, Alisa Phillips, Ayna Rejepova, Jacques A Herzog, Phillip Payne, Jay F Piccirillo, Craig A Buchman
{"title":"机器学习在人工耳蜗语音感知结果中的可行性——超越人工耳蜗性能预测的单一生物标志物。","authors":"Matthew A Shew, Cole Pavelchek, Andrew Michelson, Amanda Ortmann, Shannon Lefler, Amit Walia, Nedim Durakovic, Alisa Phillips, Ayna Rejepova, Jacques A Herzog, Phillip Payne, Jay F Piccirillo, Craig A Buchman","doi":"10.1097/AUD.0000000000001664","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>Machine learning (ML) is an emerging discipline centered around complex pattern matching and large data-based prediction modeling and can improve precision medicine healthcare. Cochlear implants (CI) are highly effective, however, outcomes vary widely, and accurately predicting speech perception performance outcomes between patients remains a challenge. This study aims to evaluate the ability of ML to predict speech perception performance among CI recipients at 6-month post-implantation using only preoperative variables on one of the largest CI datasets to date, with an emphasis placed on identification of poor performers.</p><p><strong>Design: </strong>All patients enrolled in the national CI outcome tracking database, HERMES, and the institutional CI registry. Data were split 90/10 training/testing with hyperparameter tuning designed to optimize AUPRC performed during 10-fold cross-validation within 100 iterations. Multiple models were developed to predict final and delta (Δ) in consonant-nucleus-consonant (CNC) words and AzBio sentences at 6-month post-implantation. Two metrics, (1) final performance scores and (2) equally distributed 20th percentile performance ranking were used as primary outcomes. All models were compared with currently used \"gold standard,\" defined as linear or logistic regression models leveraging Lazard features (LF). 
Final metrics for comparison included mean absolute error (MAE), calibration curves, heat accuracy maps, area under the receiver operating curve (AUROC), and F1 score.</p><p><strong>Results: </strong>A total of 1877 patients were assessed through an ML pipeline. (1) XGBoost (XGB) predicted CNC with MAE of 17.4% (95% confidence interval [CI]: 17.34 to 17.53%) and AzBio with MAE of 20.39% (95% CI: 20.28 to 20.50%) and consistently outperformed linear regression with LF (CNC MAE 18.36% [95% CI: 18.25 to 18.47]; AzBio 21.62 [95% CI: 21.49 to 21.74]). Although statistically significant, the 1 to 2% boost of performance is clinically insignificant. (2) Predicting quintiles/20th percentile categories for CI performance, XGB outperformed logistic regression (Log-LF) across all metrics. XGB demonstrated superior calibration compared with Log-LF and provided a larger proportion of predicted probabilities predictions at the extremes (e.g., 0.1 or 0.9). XGB outperformed Log-LF predicting ≤40th percentile for CNC (AUROC: 0.708 versus 0.594; precision: 0.708 versus 0.596; F1 score: 0.708 versus 0.592) and AzBio (AUROC: 0.709 versus 0.572; precision: 0.710 versus 0.572; F1 score: 0.709 versus 0.572). This was consistent for ΔCNC and ΔAzBio. Last, accuracy heat maps demonstrated superior performance of XGB in stratifying sub-phenotypes/categories of CI performance compared with Log-LF.</p><p><strong>Conclusions: </strong>This study demonstrates how ML models can offer superior performance in CI speech perception outcomes prediction modeling compared with current gold standard (Lazard-linear or logistic regression). ML offers novel insights capable of capturing nonlinear complex relationships and can identify novel sub-phenotypes at the extremes of CI performance using preoperative clinical variables alone. 
This is the first study to our knowledge to offer any type of meaningful preoperative stratification for CI speech perception outcomes and may have significant implications that need to be carefully explored when it comes to patient counseling, auditory rehabilitation, and future CI clinical trials. While prospective validation is a necessary next step and performance is still limited based on current traditional CI variables, these results highlight the potential of artificial intelligence (AI) in CI care, the critical need to integrate novel variables that better account for CI performance, and the need for improved data collaboration and standardized registries moving forward.</p>","PeriodicalId":55172,"journal":{"name":"Ear and Hearing","volume":" ","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Machine Learning Feasibility in Cochlear Implant Speech Perception Outcomes-Moving Beyond Single Biomarkers for Cochlear Implant Performance Prediction.\",\"authors\":\"Matthew A Shew, Cole Pavelchek, Andrew Michelson, Amanda Ortmann, Shannon Lefler, Amit Walia, Nedim Durakovic, Alisa Phillips, Ayna Rejepova, Jacques A Herzog, Phillip Payne, Jay F Piccirillo, Craig A Buchman\",\"doi\":\"10.1097/AUD.0000000000001664\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>Machine learning (ML) is an emerging discipline centered around complex pattern matching and large data-based prediction modeling and can improve precision medicine healthcare. Cochlear implants (CI) are highly effective, however, outcomes vary widely, and accurately predicting speech perception performance outcomes between patients remains a challenge. 
This study aims to evaluate the ability of ML to predict speech perception performance among CI recipients at 6-month post-implantation using only preoperative variables on one of the largest CI datasets to date, with an emphasis placed on identification of poor performers.</p><p><strong>Design: </strong>All patients enrolled in the national CI outcome tracking database, HERMES, and the institutional CI registry. Data were split 90/10 training/testing with hyperparameter tuning designed to optimize AUPRC performed during 10-fold cross-validation within 100 iterations. Multiple models were developed to predict final and delta (Δ) in consonant-nucleus-consonant (CNC) words and AzBio sentences at 6-month post-implantation. Two metrics, (1) final performance scores and (2) equally distributed 20th percentile performance ranking were used as primary outcomes. All models were compared with currently used \\\"gold standard,\\\" defined as linear or logistic regression models leveraging Lazard features (LF). Final metrics for comparison included mean absolute error (MAE), calibration curves, heat accuracy maps, area under the receiver operating curve (AUROC), and F1 score.</p><p><strong>Results: </strong>A total of 1877 patients were assessed through an ML pipeline. (1) XGBoost (XGB) predicted CNC with MAE of 17.4% (95% confidence interval [CI]: 17.34 to 17.53%) and AzBio with MAE of 20.39% (95% CI: 20.28 to 20.50%) and consistently outperformed linear regression with LF (CNC MAE 18.36% [95% CI: 18.25 to 18.47]; AzBio 21.62 [95% CI: 21.49 to 21.74]). Although statistically significant, the 1 to 2% boost of performance is clinically insignificant. (2) Predicting quintiles/20th percentile categories for CI performance, XGB outperformed logistic regression (Log-LF) across all metrics. XGB demonstrated superior calibration compared with Log-LF and provided a larger proportion of predicted probabilities predictions at the extremes (e.g., 0.1 or 0.9). 
XGB outperformed Log-LF predicting ≤40th percentile for CNC (AUROC: 0.708 versus 0.594; precision: 0.708 versus 0.596; F1 score: 0.708 versus 0.592) and AzBio (AUROC: 0.709 versus 0.572; precision: 0.710 versus 0.572; F1 score: 0.709 versus 0.572). This was consistent for ΔCNC and ΔAzBio. Last, accuracy heat maps demonstrated superior performance of XGB in stratifying sub-phenotypes/categories of CI performance compared with Log-LF.</p><p><strong>Conclusions: </strong>This study demonstrates how ML models can offer superior performance in CI speech perception outcomes prediction modeling compared with current gold standard (Lazard-linear or logistic regression). ML offers novel insights capable of capturing nonlinear complex relationships and can identify novel sub-phenotypes at the extremes of CI performance using preoperative clinical variables alone. This is the first study to our knowledge to offer any type of meaningful preoperative stratification for CI speech perception outcomes and may have significant implications that need to be carefully explored when it comes to patient counseling, auditory rehabilitation, and future CI clinical trials. 
While prospective validation is a necessary next step and performance is still limited based on current traditional CI variables, these results highlight the potential of artificial intelligence (AI) in CI care, the critical need to integrate novel variables that better account for CI performance, and the need for improved data collaboration and standardized registries moving forward.</p>\",\"PeriodicalId\":55172,\"journal\":{\"name\":\"Ear and Hearing\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ear and Hearing\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1097/AUD.0000000000001664\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ear and Hearing","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1097/AUD.0000000000001664","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
引用次数: 0
摘要
目标:机器学习(ML)是一门新兴学科,以复杂模式匹配和基于大数据的预测建模为中心,可以改善精准医疗保健。人工耳蜗(CI)是非常有效的,然而,结果差异很大,准确预测患者之间的语音感知表现结果仍然是一个挑战。本研究旨在评估ML在植入后6个月预测CI受者语音感知表现的能力,仅使用迄今为止最大的CI数据集之一的术前变量,重点是识别表现不佳的人。设计:所有患者均纳入国家CI结果跟踪数据库HERMES和机构CI登记处。数据分为90/10训练/测试,超参数调优旨在优化AUPRC,在100次迭代中进行10次交叉验证。开发了多个模型来预测植入后6个月时辅音-核-辅音(CNC)单词和AzBio句子的final和delta (Δ)。两个指标,(1)最终绩效得分和(2)平均分布的第二十百分位绩效排名作为主要结果。所有模型都与当前使用的“黄金标准”进行了比较,“黄金标准”被定义为利用Lazard特征(LF)的线性或逻辑回归模型。比较的最终指标包括平均绝对误差(MAE)、校准曲线、热精度图、受者工作曲线下面积(AUROC)和F1评分。结果:通过ML管道共评估了1877例患者。(1) XGBoost (XGB)预测CNC的MAE为17.4%(95%置信区间[CI]: 17.34 ~ 17.53%), AzBio的MAE为20.39% (95% CI: 20.28 ~ 20.50%),并始终优于LF (CNC MAE 18.36% [95% CI: 18.25 ~ 18.47]的线性回归;AzBio 21.62 [95% CI: 21.49 ~ 21.74])。虽然在统计学上是显著的,但在临床上,1 - 2%的性能提升是不显著的。(2)预测CI性能的五分位数/20百分位数类别,XGB在所有指标上都优于逻辑回归(Log-LF)。与Log-LF相比,XGB显示出更好的校准,并且在极端情况下(例如,0.1或0.9)提供了更大比例的预测概率预测。XGB优于Log-LF预测CNC≤40百分位数(AUROC: 0.708 vs 0.594;精密度:0.708 vs 0.596;F1评分:0.708比0.592)和AzBio评分(AUROC: 0.709比0.572;精密度:0.710 vs 0.572;F1得分:0.709 vs 0.572)。这与ΔCNC和ΔAzBio是一致的。最后,准确度热图显示,与Log-LF相比,XGB在分层CI性能的亚表型/类别方面具有优越的性能。结论:本研究表明,与目前的黄金标准(拉扎德线性或逻辑回归)相比,机器学习模型如何在CI语音感知结果预测建模中提供卓越的性能。ML提供了能够捕获非线性复杂关系的新颖见解,并且可以仅使用术前临床变量在CI表现的极端情况下识别新的亚表型。据我们所知,这是第一项为CI语音感知结果提供任何类型的有意义的术前分层的研究,可能对患者咨询、听觉康复和未来CI临床试验有重要意义,需要仔细探讨。虽然前瞻性验证是必要的下一步,并且基于当前传统CI变量的性能仍然有限,但这些结果突出了人工智能(AI)在CI护理中的潜力,迫切需要整合更好地解释CI性能的新变量,以及改进数据协作和标准化注册表的需求。
Machine Learning Feasibility in Cochlear Implant Speech Perception Outcomes-Moving Beyond Single Biomarkers for Cochlear Implant Performance Prediction.
Objectives: Machine learning (ML) is an emerging discipline centered on complex pattern matching and prediction modeling from large datasets, and it can advance precision medicine. Cochlear implants (CI) are highly effective; however, outcomes vary widely, and accurately predicting speech perception outcomes across patients remains a challenge. This study evaluates the ability of ML to predict speech perception performance among CI recipients at 6 months post-implantation using only preoperative variables, on one of the largest CI datasets to date, with an emphasis on identifying poor performers.
Design: Patients were drawn from the national CI outcome tracking database, HERMES, and the institutional CI registry. Data were split 90/10 into training and testing sets, with hyperparameter tuning to optimize the area under the precision-recall curve (AUPRC) performed during 10-fold cross-validation over 100 iterations. Multiple models were developed to predict final scores and change (Δ) in consonant-nucleus-consonant (CNC) words and AzBio sentences at 6 months post-implantation. Two metrics served as primary outcomes: (1) final performance scores and (2) equally distributed 20th percentile performance rankings. All models were compared with the currently used "gold standard," defined as linear or logistic regression models leveraging Lazard features (LF). Final comparison metrics included mean absolute error (MAE), calibration curves, accuracy heat maps, area under the receiver operating characteristic curve (AUROC), and F1 score.
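The evaluation protocol described in the Design (a 90/10 train/test split with hyperparameter tuning scored on AUPRC inside 10-fold cross-validation) can be sketched in scikit-learn. This is an illustrative sketch only: the synthetic data, the parameter grid, and the use of `GradientBoostingClassifier` as a stand-in for XGBoost are all assumptions, not the authors' actual pipeline, and the iteration count is reduced from the study's 100 for brevity.

```python
# Hedged sketch of the cross-validated tuning protocol (assumptions noted above).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic placeholder data; the study used preoperative clinical variables.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 90/10 training/testing split, as in the Design section.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
    param_distributions={"n_estimators": [50, 100], "max_depth": [2, 3]},
    n_iter=4,                     # the study ran 100 iterations
    scoring="average_precision",  # optimizes AUPRC
    cv=10,                        # 10-fold cross-validation
    random_state=0,
)
search.fit(X_tr, y_tr)
test_score = search.score(X_te, y_te)  # held-out AUPRC of the tuned model
```

`scoring="average_precision"` is scikit-learn's standard estimator of the area under the precision-recall curve, which is why it serves here as the AUPRC tuning objective.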
Results: A total of 1877 patients were assessed through an ML pipeline. (1) XGBoost (XGB) predicted CNC with an MAE of 17.4% (95% confidence interval [CI]: 17.34 to 17.53%) and AzBio with an MAE of 20.39% (95% CI: 20.28 to 20.50%), consistently outperforming linear regression with LF (CNC MAE 18.36% [95% CI: 18.25 to 18.47%]; AzBio 21.62% [95% CI: 21.49 to 21.74%]). Although statistically significant, this 1 to 2% improvement in performance is clinically insignificant. (2) In predicting quintile/20th percentile categories of CI performance, XGB outperformed logistic regression (Log-LF) across all metrics. XGB demonstrated superior calibration compared with Log-LF and placed a larger proportion of predicted probabilities at the extremes (e.g., 0.1 or 0.9). XGB outperformed Log-LF in predicting ≤40th percentile performance for CNC (AUROC: 0.708 versus 0.594; precision: 0.708 versus 0.596; F1 score: 0.708 versus 0.592) and AzBio (AUROC: 0.709 versus 0.572; precision: 0.710 versus 0.572; F1 score: 0.709 versus 0.572). Results were consistent for ΔCNC and ΔAzBio. Lastly, accuracy heat maps demonstrated superior performance of XGB over Log-LF in stratifying sub-phenotypes/categories of CI performance.
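The Results compare models on two kinds of metrics: MAE for continuous score prediction, and AUROC/precision/F1 for the percentile-category classification task. A minimal illustration of how these are computed follows; the numbers are made-up toy values, not data from the study.

```python
# Toy computation of the comparison metrics named in the Results.
import numpy as np
from sklearn.metrics import (f1_score, mean_absolute_error,
                             precision_score, roc_auc_score)

# Regression view: predicted vs. observed speech scores (toy values, %).
score_true = np.array([20.0, 30.0, 50.0])
score_pred = np.array([18.0, 33.0, 46.0])
mae = mean_absolute_error(score_true, score_pred)  # mean of |2|, |3|, |4| = 3.0

# Classification view: flagging <=40th percentile performers (toy values).
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])   # model's predicted probabilities
y_hat = (y_prob >= 0.5).astype(int)        # thresholded class labels

auroc = roc_auc_score(y_true, y_prob)  # ranking quality of probabilities
prec = precision_score(y_true, y_hat)  # of predicted positives, fraction correct
f1 = f1_score(y_true, y_hat)           # harmonic mean of precision and recall
```

Note that AUROC is computed from the raw probabilities (it measures ranking), while precision and F1 require thresholded labels; calibration curves, also used in the study, would likewise operate on the raw probabilities.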
Conclusions: This study demonstrates that ML models can outperform the current gold standard (linear or logistic regression with Lazard features) in predicting CI speech perception outcomes. ML captures nonlinear, complex relationships and can identify novel sub-phenotypes at the extremes of CI performance using preoperative clinical variables alone. To our knowledge, this is the first study to offer meaningful preoperative stratification of CI speech perception outcomes, with implications that warrant careful exploration for patient counseling, auditory rehabilitation, and future CI clinical trials. While prospective validation is a necessary next step and performance remains limited by current traditional CI variables, these results highlight the potential of artificial intelligence (AI) in CI care, the critical need to integrate novel variables that better account for CI performance, and the need for improved data collaboration and standardized registries moving forward.
Journal introduction:
From the basic science of hearing and balance disorders to auditory electrophysiology to amplification and the psychological factors of hearing loss, Ear and Hearing covers all aspects of auditory and vestibular disorders. This multidisciplinary journal consolidates the various factors that contribute to identification, remediation, and audiologic and vestibular rehabilitation. It is the one journal that serves the diverse interests of all members of this professional community: otologists, audiologists, educators, and those involved in the design, manufacture, and distribution of amplification systems. The original articles published in the journal focus on assessment, diagnosis, and management of auditory and vestibular disorders.