Comparison of Deep Learning Models for Objective Auditory Brainstem Response Detection: A Multicenter Validation Study.

IF 3 2区 医学 Q1 AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY
Trends in Hearing Pub Date : 2025-01-01 Epub Date: 2025-06-03 DOI:10.1177/23312165251347773
Yin Liu, Lingjie Xiang, Qiang Li, Kangkang Li, Yihan Yang, Tiantian Wang, Yuting Qin, Xinxing Fu, Yu Zhao, Chenqiang Gao
{"title":"Comparison of Deep Learning Models for Objective Auditory Brainstem Response Detection: A Multicenter Validation Study.","authors":"Yin Liu, Lingjie Xiang, Qiang Li, Kangkang Li, Yihan Yang, Tiantian Wang, Yuting Qin, Xinxing Fu, Yu Zhao, Chenqiang Gao","doi":"10.1177/23312165251347773","DOIUrl":null,"url":null,"abstract":"<p><p>Auditory brainstem response (ABR) interpretation in clinical practice often relies on visual inspection by audiologists, which is prone to inter-practitioner variability. While deep learning (DL) algorithms have shown promise in objectifying ABR detection in controlled settings, their applicability to real-world clinical data is hindered by small datasets and insufficient heterogeneity. This study evaluates the generalizability of nine DL models for ABR detection using large, multicenter datasets. The primary dataset analyzed, Clinical Dataset I, comprises 128,123 labeled ABRs from 13,813 participants across a wide range of ages and hearing levels, and was divided into a training set (90%) and a held-out test set (10%). The models included convolutional neural networks (CNNs; AlexNet, VGG, ResNet), transformer-based architectures (Transformer, Patch Time Series Transformer [PatchTST], Differential Transformer, and Differential PatchTST), and hybrid CNN-transformer models (ResTransformer, ResPatchTST). Performance was assessed on the held-out test set and four external datasets (Clinical II, Southampton, PhysioNet, Mendeley) using accuracy and area under the receiver operating characteristic curve (AUC). ResPatchTST achieved the highest performance on the held-out test set (accuracy: 91.90%, AUC: 0.976). Transformer-based models, particularly PatchTST, showed superior generalization to external datasets, maintaining robust accuracy across diverse clinical settings. Additional experiments highlighted the critical role of dataset size and diversity in enhancing model robustness. We also observed that incorporating acquisition parameters and demographic features as auxiliary inputs yielded performance gains in cross-center generalization. These findings underscore the potential of DL models-especially transformer-based architectures-for accurate and generalizable ABR detection, and highlight the necessity of large, diverse datasets in developing clinically reliable systems.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"29 ","pages":"23312165251347773"},"PeriodicalIF":3.0000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12134522/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Trends in Hearing","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/23312165251347773","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/6/3 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Auditory brainstem response (ABR) interpretation in clinical practice often relies on visual inspection by audiologists, which is prone to inter-practitioner variability. While deep learning (DL) algorithms have shown promise in objectifying ABR detection in controlled settings, their applicability to real-world clinical data is hindered by small datasets and insufficient heterogeneity. This study evaluates the generalizability of nine DL models for ABR detection using large, multicenter datasets. The primary dataset analyzed, Clinical Dataset I, comprises 128,123 labeled ABRs from 13,813 participants across a wide range of ages and hearing levels, and was divided into a training set (90%) and a held-out test set (10%). The models included convolutional neural networks (CNNs; AlexNet, VGG, ResNet), transformer-based architectures (Transformer, Patch Time Series Transformer [PatchTST], Differential Transformer, and Differential PatchTST), and hybrid CNN-transformer models (ResTransformer, ResPatchTST). Performance was assessed on the held-out test set and four external datasets (Clinical II, Southampton, PhysioNet, Mendeley) using accuracy and area under the receiver operating characteristic curve (AUC). ResPatchTST achieved the highest performance on the held-out test set (accuracy: 91.90%, AUC: 0.976). Transformer-based models, particularly PatchTST, showed superior generalization to external datasets, maintaining robust accuracy across diverse clinical settings. Additional experiments highlighted the critical role of dataset size and diversity in enhancing model robustness. We also observed that incorporating acquisition parameters and demographic features as auxiliary inputs yielded performance gains in cross-center generalization. These findings underscore the potential of DL models-especially transformer-based architectures-for accurate and generalizable ABR detection, and highlight the necessity of large, diverse datasets in developing clinically reliable systems.

深度学习模型在客观听觉脑干反应检测中的比较:一项多中心验证研究。
临床实践中听觉脑干反应(ABR)的解释往往依赖于听力学家的视觉检查,这很容易引起实践者之间的差异。虽然深度学习(DL)算法在控制环境中客观化ABR检测方面表现出了希望,但由于数据集小和异质性不足,它们对现实世界临床数据的适用性受到了阻碍。本研究使用大型、多中心数据集评估了9个深度学习模型用于ABR检测的泛化性。分析的主要数据集,临床数据集I,包括来自13,813名参与者的128,123个标记的abr,涵盖了广泛的年龄和听力水平,并分为训练集(90%)和测试集(10%)。模型包括卷积神经网络(cnn);AlexNet, VGG, ResNet),基于变压器的体系结构(变压器,贴片时间序列变压器[PatchTST],差动变压器和差动PatchTST),以及混合cnn -变压器模型(restrtransformer, ResPatchTST)。使用准确性和受试者工作特征曲线下面积(AUC),在测试集和四个外部数据集(Clinical II, Southampton, PhysioNet, Mendeley)上评估性能。ResPatchTST在hold -out测试集上取得了最高的性能(准确率:91.90%,AUC: 0.976)。基于变压器的模型,特别是PatchTST,显示出对外部数据集的优越通用性,在不同的临床环境中保持了强大的准确性。其他实验强调了数据集大小和多样性在增强模型鲁棒性方面的关键作用。我们还观察到,将获取参数和人口特征作为辅助输入,在跨中心泛化中获得了性能提升。这些发现强调了深度学习模型(尤其是基于变压器的架构)在准确和通用的ABR检测方面的潜力,并强调了开发临床可靠系统时大型、多样化数据集的必要性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Trends in Hearing
Trends in Hearing AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGYOTORH-OTORHINOLARYNGOLOGY
CiteScore
4.50
自引率
11.10%
发文量
44
审稿时长
12 weeks
期刊介绍: Trends in Hearing is an open access journal completely dedicated to publishing original research and reviews focusing on human hearing, hearing loss, hearing aids, auditory implants, and aural rehabilitation. Under its former name, Trends in Amplification, the journal established itself as a forum for concise explorations of all areas of translational hearing research by leaders in the field. Trends in Hearing has now expanded its focus to include original research articles, with the goal of becoming the premier venue for research related to human hearing and hearing loss.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信