Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models

IF 4.8 | Zone 2 (Medicine) | JCR Q1 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer Methods and Programs in Biomedicine | Pub Date: 2025-09-10 | DOI: 10.1016/j.cmpb.2025.109063

Paul Calle, Averi Bates, Justin C. Reynolds, Yunlong Liu, Haoyang Cui, Sinaro Ly, Chen Wang, Qinghao Zhang, Alberto J. de Armendi, Shashank S. Shettar, Kar-Ming Fung, Qinggong Tang, Chongle Pan

Volume 272, Article 109063 | https://www.sciencedirect.com/science/article/pii/S0169260725004808

Citations: 0

Abstract

Background and Objectives:

The variability and biases in the real-world performance benchmarking of deep learning models for medical imaging compromise their trustworthiness for real-world deployment. The common approach of holding out a single fixed test set fails to quantify the variance in the estimation of test performance metrics. This study introduces NACHOS (Nested and Automated Cross-validation and Hyperparameter Optimization using Supercomputing) to reduce and quantify the variance of test performance metrics of deep learning models.
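To make the contrast with a single held-out test set concrete, the sketch below summarizes test scores collected from several test folds. It is a minimal illustration, not code from the NACHOS repository: the function name `summarize_fold_scores` and the five accuracy values are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_fold_scores(scores):
    """Return mean, sample standard deviation, and standard error of per-fold test scores."""
    if len(scores) < 2:
        raise ValueError("need at least two test folds to estimate variance")
    spread = stdev(scores)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean(scores),
        "std": spread,
        "sem": spread / sqrt(len(scores)),
    }


# Hypothetical accuracies from five test folds (illustrative, not from the paper).
fold_accuracies = [0.90, 0.92, 0.88, 0.91, 0.89]
summary = summarize_fold_scores(fold_accuracies)
print(
    f"accuracy = {summary['mean']:.3f} +/- {summary['sem']:.3f} (SEM), "
    f"std = {summary['std']:.3f}"
)
```

With one fixed test set there is only a single score, so no spread can be computed at all. Note that the standard error shown here treats folds as independent, which is an approximation because training sets overlap across folds.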

Methods:

NACHOS integrates Nested Cross-Validation (NCV) and Automated Hyperparameter Optimization (AHPO) within a parallelized high-performance computing (HPC) framework. NACHOS was demonstrated on a chest X-ray repository and an Optical Coherence Tomography (OCT) dataset under multiple data partitioning schemes. Beyond performance estimation, DACHOS (Deployment with Automated Cross-validation and Hyperparameter Optimization using Supercomputing) is introduced to leverage AHPO and cross-validation to build the final model on the full dataset, improving expected deployment performance.
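The nested structure described above can be sketched in a few lines of standard-library Python. This is a toy under stated assumptions, not the authors' implementation: a 1-D k-nearest-neighbour classifier stands in for a deep learning model, an exhaustive grid over `k` stands in for AHPO, and all names (`make_toy_data`, `make_folds`, `nested_cv`) are hypothetical. Evaluating the selected configuration with all non-test data as training data is one common variant and is assumed here.

```python
import random
from statistics import mean, stdev


def make_toy_data(n_samples=60, seed=0, noise=0.1):
    """Synthetic 1-D binary data: label is 1 when x > 0.5, flipped with probability `noise`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


def make_folds(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, evaluation, k):
    """Fraction of evaluation points classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return correct / len(evaluation)


def nested_cv(data, n_folds=5, k_grid=(1, 3, 5), seed=0):
    """Return one test score per outer fold; k is chosen using inner folds only."""
    folds = make_folds(len(data), n_folds, seed)
    test_scores = []
    for t, test_idx in enumerate(folds):
        test = [data[i] for i in test_idx]
        inner_folds = [f for j, f in enumerate(folds) if j != t]

        def inner_score(k):
            # Each non-test fold serves once as the validation fold.
            scores = []
            for v, val_idx in enumerate(inner_folds):
                val = [data[i] for i in val_idx]
                train = [data[i] for j, f in enumerate(inner_folds) if j != v for i in f]
                scores.append(accuracy(train, val, k))
            return mean(scores)

        best_k = max(k_grid, key=inner_score)  # grid search stands in for AHPO
        outer_train = [data[i] for f in inner_folds for i in f]
        test_scores.append(accuracy(outer_train, test, best_k))
    return test_scores


scores = nested_cv(make_toy_data())
print(f"test accuracy: mean = {mean(scores):.3f}, std = {stdev(scores):.3f}")
```

The key property is that the test fold never influences hyperparameter selection, and every fold serves as the test fold exactly once, so the output is a set of test scores rather than a single number.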

Results:

The findings underscore the importance of NCV in quantifying and reducing estimation variance, AHPO in optimizing hyperparameters consistently across test folds, and HPC in ensuring computational feasibility.
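A rough count of training runs shows why computational feasibility becomes the bottleneck. Assuming k folds, every non-test fold serving once as validation fold, and h hyperparameter configurations each trained in every split, the workload is k(k-1)h independent runs. The sketch below enumerates those runs and spreads them over workers; the function names, the configuration grid, and the round-robin assignment are illustrative assumptions, and an actual AHPO strategy may train fewer configurations.

```python
from itertools import permutations


def enumerate_training_jobs(n_folds, configs):
    """List every independent training job as (test_fold, validation_fold, config)."""
    jobs = []
    for test_fold, val_fold in permutations(range(n_folds), 2):
        for config in configs:
            jobs.append((test_fold, val_fold, config))
    return jobs


def assign_round_robin(jobs, n_workers):
    """Deal jobs to workers in turn so batch sizes differ by at most one."""
    return [jobs[w::n_workers] for w in range(n_workers)]


# Hypothetical search space: 3 learning rates x 2 batch sizes = 6 configurations.
configs = [
    {"learning_rate": lr, "batch_size": bs}
    for lr in (1e-2, 1e-3, 1e-4)
    for bs in (16, 32)
]
jobs = enumerate_training_jobs(n_folds=10, configs=configs)
batches = assign_round_robin(jobs, n_workers=8)
print(len(jobs), [len(batch) for batch in batches])
```

Because no job depends on the result of another, each batch can run on a separate node or GPU, which is what makes the nested procedure tractable on a cluster.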

Conclusions:

By integrating these methodologies, NACHOS and DACHOS provide a scalable, reproducible, and trustworthy framework for DL model evaluation and deployment in medical imaging. To maximize public availability, the full open-source codebase is provided at https://github.com/thepanlab/NACHOS.


Source journal

Computer Methods and Programs in Biomedicine (Engineering and Technology: Biomedical Engineering)

CiteScore: 12.30

Self-citation rate: 6.60%

Publications: 601

Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to support the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.