Freda Werdiger, Vignan Yogendrakumar, Milanka Visser, James Kolacz, Christina Lam, Mitchell Hill, Chushuang Chen, Mark W. Parsons, Andrew Bivard
Clinical performance review for 3-D Deep Learning segmentation of stroke infarct from diffusion-weighted images
Neuroimage. Reports, vol. 4, no. 1, Article 100196. Published 2024-01-29.
DOI: 10.1016/j.ynirp.2024.100196
https://www.sciencedirect.com/science/article/pii/S2666956024000023
Citations: 0
Abstract
Introduction
During the subacute phase of ischemic stroke, MR diffusion-weighted imaging (DWI) is used to assess the extent of tissue injury. Segmentation of DWI infarct is challenging due to disease variability, but Deep Learning (DL) provides a solution, outperforming existing methods on small datasets. However, a lack of clinically meaningful performance evaluation hinders clinical translation. Here we develop a DL DWI segmentation tool and provide a clinical performance review.
Methods
Subjects in this retrospective study presented with stroke symptoms and later underwent DWI. The DL architectures U-Net and DenseNet were used to develop a DWI segmentation tool. The Dice Similarity Coefficient (DSC) was used to select the best- and worst-performing models. Clinical experts reviewed these models on the clinical test set, agreeing with the model if no 'significant' error was present. The average agreement with the model and the interrater agreement were also derived.
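As an illustration of the selection metric named above, the DSC between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch, not the authors' implementation: the function names `flatten` and `dice_coefficient` are hypothetical, masks are assumed to be nested lists of 0/1 voxels, and returning 1.0 when both masks are empty is an assumed convention.

```python
from typing import Iterator


def flatten(volume) -> Iterator[bool]:
    """Yield every voxel of an arbitrarily nested list (e.g. a 3-D volume) as a bool."""
    for item in volume:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield bool(item)


def dice_coefficient(pred, truth) -> float:
    """Dice Similarity Coefficient between two binary masks of identical shape."""
    p = list(flatten(pred))
    t = list(flatten(truth))
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled as infarct in both masks.
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(p) + sum(t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x2x2 volumes: each mask has two infarct voxels, one of them shared.
predicted = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]
reference = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
example_dsc = dice_coefficient(predicted, reference)  # 2 * 1 / (2 + 2)
```

In practice the masks would usually be array types from an imaging library; nested lists are used here only to keep the sketch dependency-free.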
Results
In total, 573 participants with an ischemic stroke were included. The DenseNet delivered the best model (DSC = 0.831 ± 0.064) with a mean inference time of 0.07 s. Clinicians compared this with the worst model (U-Net, DSC = 0.759 ± 0.122), agreeing with the DenseNet predictions more often than with the U-Net predictions (83.8 % vs. 79.3 %). Clinicians also agreed with each other more often on performance interpretation when evaluating the DenseNet than when evaluating the U-Net (87.9 % vs. 72.7 %).
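The abstract does not state exactly how the two agreement percentages were computed. One plausible reading, sketched below under that assumption, treats each clinician's review as a per-case boolean (True when no significant error was seen), averages the per-reviewer agreement rates, and measures interrater agreement as the mean pairwise percentage of cases with matching ratings. The function names, reviewer labels, and ratings are hypothetical and purely illustrative.

```python
from itertools import combinations
from typing import Dict, List


def agreement_with_model(ratings: Dict[str, List[bool]]) -> float:
    """Mean percentage of cases in which reviewers accepted the model output."""
    per_reviewer = [sum(r) / len(r) for r in ratings.values()]
    return 100 * sum(per_reviewer) / len(per_reviewer)


def interrater_agreement(ratings: Dict[str, List[bool]]) -> float:
    """Mean pairwise percentage of cases on which two reviewers gave the same rating."""
    pairs = list(combinations(ratings.values(), 2))
    per_pair = [sum(a == b for a, b in zip(x, y)) / len(x) for x, y in pairs]
    return 100 * sum(per_pair) / len(per_pair)


# Made-up ratings for four cases; these are not data from the study.
toy_ratings = {
    "reviewer_a": [True, True, False, True],
    "reviewer_b": [True, False, False, True],
}
model_agreement = agreement_with_model(toy_ratings)
reviewer_agreement = interrater_agreement(toy_ratings)
```

Raw percent agreement is used here because the abstract reports percentages; a chance-corrected statistic such as Cohen's kappa would be a different measure.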
Conclusion
Our DWI segmentation tool achieved high performance, with clinical review providing meaningful performance evaluation. Model development will continue towards prospective deployment, before which clinical review will be repeated. This work will benefit physicians in assessing patient prognosis.