Comparison of Deep Learning Models for Objective Auditory Brainstem Response Detection: A Multicenter Validation Study.

IF 3 | Zone 2 (Medicine) | Q1 AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY

Trends in Hearing Pub Date : 2025-01-01 Epub Date: 2025-06-03 DOI:10.1177/23312165251347773

Yin Liu, Lingjie Xiang, Qiang Li, Kangkang Li, Yihan Yang, Tiantian Wang, Yuting Qin, Xinxing Fu, Yu Zhao, Chenqiang Gao


Abstract

Auditory brainstem response (ABR) interpretation in clinical practice often relies on visual inspection by audiologists, which is prone to inter-practitioner variability. While deep learning (DL) algorithms have shown promise in objectifying ABR detection in controlled settings, their applicability to real-world clinical data is hindered by small datasets and insufficient heterogeneity. This study evaluates the generalizability of nine DL models for ABR detection using large, multicenter datasets. The primary dataset analyzed, Clinical Dataset I, comprises 128,123 labeled ABRs from 13,813 participants across a wide range of ages and hearing levels, and was divided into a training set (90%) and a held-out test set (10%). The models included convolutional neural networks (CNNs; AlexNet, VGG, ResNet), transformer-based architectures (Transformer, Patch Time Series Transformer [PatchTST], Differential Transformer, and Differential PatchTST), and hybrid CNN-transformer models (ResTransformer, ResPatchTST). Performance was assessed on the held-out test set and four external datasets (Clinical II, Southampton, PhysioNet, Mendeley) using accuracy and area under the receiver operating characteristic curve (AUC). ResPatchTST achieved the highest performance on the held-out test set (accuracy: 91.90%, AUC: 0.976). Transformer-based models, particularly PatchTST, showed superior generalization to external datasets, maintaining robust accuracy across diverse clinical settings. Additional experiments highlighted the critical role of dataset size and diversity in enhancing model robustness. We also observed that incorporating acquisition parameters and demographic features as auxiliary inputs yielded performance gains in cross-center generalization. 
These findings underscore the potential of DL models, especially transformer-based architectures, for accurate and generalizable ABR detection, and highlight the necessity of large, diverse datasets in developing clinically reliable systems.
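The abstract reports two metrics, accuracy and AUC, for a binary decision (response present or absent). As a minimal sketch of how those two numbers are computed, and not the authors' evaluation code, the example below uses made-up labels and scores; the 0.5 decision threshold, the function names, and the toy data are all assumptions for illustration.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of waveforms whose thresholded score matches the label.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    hits = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return hits / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = ABR present, 0 = ABR absent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

The pairwise AUC here is quadratic in the number of examples, which keeps the definition visible but would be slow on a test set the size of the one described; a rank-based implementation gives the same value in practice.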

Source journal

Trends in Hearing (AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY; OTORHINOLARYNGOLOGY)

CiteScore: 4.50

Self-citation rate: 11.10%

Articles published: not listed

Review time: 12 weeks

About the journal: Trends in Hearing is an open access journal completely dedicated to publishing original research and reviews focusing on human hearing, hearing loss, hearing aids, auditory implants, and aural rehabilitation. Under its former name, Trends in Amplification, the journal established itself as a forum for concise explorations of all areas of translational hearing research by leaders in the field. Trends in Hearing has now expanded its focus to include original research articles, with the goal of becoming the premier venue for research related to human hearing and hearing loss.