Technical and clinical validation of a novel deep learning-based white matter hyperintensity segmentation tool
Benno Gesierich, Lukas Pirpamer, Dominik S Meier, Michael Amann, Minne N Cerfontaine, Frank-Erik de Leeuw, Pauline Maillard, Sue Moy, Karl G. Helmer, Michael Kühne, Leo H Bonati, Julie W Rutten, Saskia A.J. Lesnik Oberstein, Marco Duering, Alzheimer’s Disease Neuroimaging Initiative
Cerebral circulation - cognition and behavior, volume 9, Article 100393 (2025). DOI: 10.1016/j.cccb.2025.100393
https://www.sciencedirect.com/science/article/pii/S2666245025000170
Abstract
Introduction
White matter hyperintensities (WMH) on MRI are a hallmark of cerebral small vessel disease. Although numerous WMH segmentation tools exist, each has limitations that restrict its usability. This research aimed to develop, validate, and disseminate a novel WMH segmentation algorithm to address these limitations.
Methods
Using an intentionally heterogeneous dataset, we trained models based on the MD-GRU and nnU-Net deep learning algorithms. The new models were benchmarked in both technical and clinical validation against current state-of-the-art algorithms, using datasets that were not part of the training data. For technical validation in patients, we assessed bias and precision against reference masks, scan-rescan repeatability, and inter-scanner reproducibility in data from the MarkVCID consortium. Segmentation performance on 2D data was evaluated using the SWISS-AF dataset. For clinical validation, we determined percent volume change over a two-year follow-up in the DiViNAS study and calculated statistical power to detect treatment effects.
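The agreement measures named above can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the study, so the following assumes that bias is the mean and precision the standard deviation of percent volume differences against the reference, and adds the Dice coefficient as a common overlap measure. All function names are hypothetical and the masks are toy data.

```python
from statistics import mean, stdev


def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def percent_volume_differences(volumes, reference_volumes):
    """Signed difference of each volume from its reference, in percent."""
    return [100.0 * (v - r) / r for v, r in zip(volumes, reference_volumes)]


def bias_and_precision(differences):
    """Assumed definitions: bias = mean difference, precision = SD of differences."""
    return mean(differences), stdev(differences)


# Toy example: segmented vs. reference WMH volumes (arbitrary units).
diffs = percent_volume_differences([11.0, 9.0, 10.5], [10.0, 10.0, 10.0])
bias, precision = bias_and_precision(diffs)
print(f"bias = {bias:.2f} %, precision (SD) = {precision:.2f} %")
```

The same two helpers could be applied to scan-rescan or inter-scanner pairs by treating one scan of each pair as the reference, which is again an assumption about the analysis rather than a description of it.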
Results
The newly trained algorithms outperformed the benchmarking algorithms, demonstrating better agreement with reference volumes, as well as less bias and higher precision in the repeatability and reproducibility experiments. The nnU-Net algorithm exhibited the highest statistical power for detecting treatment effects, requiring a 41% smaller sample size than the best-performing benchmarking algorithm.
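To see why a more precise segmentation can translate into a smaller trial, consider the textbook sample-size formula for comparing two group means. This is a hedged sketch, not the power calculation used in the study, which the abstract does not specify. The function name, the 30 % treatment effect, and the example means and standard deviations are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(mean_change, sd_change, effect=0.3, alpha=0.05, power=0.8):
    """Participants per arm for a two-sided, two-sample comparison of means.

    mean_change, sd_change: mean and SD of percent WMH volume change in
    untreated patients (hypothetical inputs).
    effect: assumed fractional reduction of the mean change under treatment.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = effect * mean_change  # absolute difference between arms
    n = 2 * ((z_alpha + z_beta) * sd_change / delta) ** 2
    return math.ceil(n)


# Same mean change, but one algorithm measures it with less variability.
n_noisy = sample_size_per_arm(mean_change=10.0, sd_change=10.0)
n_precise = sample_size_per_arm(mean_change=10.0, sd_change=5.0)
print(n_noisy, n_precise)
```

Under this formula the required sample size grows with the square of the ratio of standard deviation to mean change, so lowering measurement variability for the same underlying change reduces the number of participants needed.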
Conclusion
We developed and systematically validated two novel WMH segmentation algorithms, which demonstrated excellent generalization capabilities. The comprehensive, user-friendly processing pipelines are publicly available as prebuilt software containers and can be applied to a wide range of datasets without re-training or modifications.