Aaron Kujawa, Reuben Dorent, Steve Connor, Suki Thomson, Marina Ivory, Ali Vahedi, Emily Guilhem, Navodini Wijethilake, Robert Bradford, Neil Kitchen, Sotirios Bisdas, Sebastien Ourselin, Tom Vercauteren, Jonathan Shapey
{"title":"用于自动分割前庭分裂瘤的深度学习:一项来自多中心常规磁共振成像的回顾性研究","authors":"Aaron Kujawa, Reuben Dorent, Steve Connor, Suki Thomson, Marina Ivory, Ali Vahedi, Emily Guilhem, Navodini Wijethilake, Robert Bradford, Neil Kitchen, Sotirios Bisdas, Sebastien Ourselin, Tom Vercauteren, Jonathan Shapey","doi":"10.3389/fncom.2024.1365727","DOIUrl":null,"url":null,"abstract":"Automatic segmentation of vestibular schwannoma (VS) from routine clinical MRI has potential to improve clinical workflow, facilitate treatment decisions, and assist patient management. Previous work demonstrated reliable automatic segmentation performance on datasets of standardized MRI images acquired for stereotactic surgery planning. However, diagnostic clinical datasets are generally more diverse and pose a larger challenge to automatic segmentation algorithms, especially when post-operative images are included. In this work, we show for the first time that automatic segmentation of VS on routine MRI datasets is also possible with high accuracy. We acquired and publicly release a curated multi-center routine clinical (MC-RC) dataset of 160 patients with a single sporadic VS. For each patient up to three longitudinal MRI exams with contrast-enhanced T1-weighted (ceT1w) (<jats:italic>n</jats:italic> = 124) and T2-weighted (T2w) (<jats:italic>n</jats:italic> = 363) images were included and the VS manually annotated. Segmentations were produced and verified in an iterative process: (1) initial segmentations by a specialized company; (2) review by one of three trained radiologists; and (3) validation by an expert team. Inter- and intra-observer reliability experiments were performed on a subset of the dataset. A state-of-the-art deep learning framework was used to train segmentation models for VS. Model performance was evaluated on a MC-RC hold-out testing set, another public VS datasets, and a partially public dataset. The generalizability and robustness of the VS deep learning segmentation models increased significantly when trained on the MC-RC dataset. Dice similarity coefficients (DSC) achieved by our model are comparable to those achieved by trained radiologists in the inter-observer experiment. On the MC-RC testing set, median DSCs were 86.2(9.5) for ceT1w, 89.4(7.0) for T2w, and 86.4(8.6) for combined ceT1w+T2w input images. On another public dataset acquired for Gamma Knife stereotactic radiosurgery our model achieved median DSCs of 95.3(2.9), 92.8(3.8), and 95.5(3.3), respectively. In contrast, models trained on the Gamma Knife dataset did not generalize well as illustrated by significant underperformance on the MC-RC routine MRI dataset, highlighting the importance of data variability in the development of robust VS segmentation models. The MC-RC dataset and all trained deep learning models were made available online.","PeriodicalId":12363,"journal":{"name":"Frontiers in Computational Neuroscience","volume":"21 1","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep learning for automatic segmentation of vestibular schwannoma: a retrospective study from multi-center routine MRI\",\"authors\":\"Aaron Kujawa, Reuben Dorent, Steve Connor, Suki Thomson, Marina Ivory, Ali Vahedi, Emily Guilhem, Navodini Wijethilake, Robert Bradford, Neil Kitchen, Sotirios Bisdas, Sebastien Ourselin, Tom Vercauteren, Jonathan Shapey\",\"doi\":\"10.3389/fncom.2024.1365727\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic segmentation of vestibular schwannoma (VS) from routine clinical MRI has potential to improve clinical workflow, facilitate treatment decisions, and assist patient management. Previous work demonstrated reliable automatic segmentation performance on datasets of standardized MRI images acquired for stereotactic surgery planning. However, diagnostic clinical datasets are generally more diverse and pose a larger challenge to automatic segmentation algorithms, especially when post-operative images are included. In this work, we show for the first time that automatic segmentation of VS on routine MRI datasets is also possible with high accuracy. We acquired and publicly release a curated multi-center routine clinical (MC-RC) dataset of 160 patients with a single sporadic VS. For each patient up to three longitudinal MRI exams with contrast-enhanced T1-weighted (ceT1w) (<jats:italic>n</jats:italic> = 124) and T2-weighted (T2w) (<jats:italic>n</jats:italic> = 363) images were included and the VS manually annotated. Segmentations were produced and verified in an iterative process: (1) initial segmentations by a specialized company; (2) review by one of three trained radiologists; and (3) validation by an expert team. Inter- and intra-observer reliability experiments were performed on a subset of the dataset. A state-of-the-art deep learning framework was used to train segmentation models for VS. Model performance was evaluated on a MC-RC hold-out testing set, another public VS datasets, and a partially public dataset. The generalizability and robustness of the VS deep learning segmentation models increased significantly when trained on the MC-RC dataset. Dice similarity coefficients (DSC) achieved by our model are comparable to those achieved by trained radiologists in the inter-observer experiment. On the MC-RC testing set, median DSCs were 86.2(9.5) for ceT1w, 89.4(7.0) for T2w, and 86.4(8.6) for combined ceT1w+T2w input images. On another public dataset acquired for Gamma Knife stereotactic radiosurgery our model achieved median DSCs of 95.3(2.9), 92.8(3.8), and 95.5(3.3), respectively. In contrast, models trained on the Gamma Knife dataset did not generalize well as illustrated by significant underperformance on the MC-RC routine MRI dataset, highlighting the importance of data variability in the development of robust VS segmentation models. The MC-RC dataset and all trained deep learning models were made available online.\",\"PeriodicalId\":12363,\"journal\":{\"name\":\"Frontiers in Computational Neuroscience\",\"volume\":\"21 1\",\"pages\":\"\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2024-05-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Computational Neuroscience\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.3389/fncom.2024.1365727\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Computational Neuroscience","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fncom.2024.1365727","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0
摘要
从常规临床磁共振成像中自动分割前庭分裂瘤(VS)有望改善临床工作流程、促进治疗决策并协助患者管理。之前的工作表明,在为立体定向手术规划而获取的标准化磁共振成像数据集上,自动分割性能可靠。然而,临床诊断数据集通常更加多样化,对自动分割算法提出了更大的挑战,尤其是在包含术后图像的情况下。在这项工作中,我们首次展示了在常规磁共振成像数据集上自动分割 VS 的高准确性。我们获得并公开发布了一个由 160 名单个散发性 VS 患者组成的多中心常规临床(MC-RC)数据集。每位患者最多可接受三次纵向 MRI 检查,包括对比增强 T1 加权(ceT1w)(124 人)和 T2 加权(T2w)(363 人)图像,并对 VS 进行人工标注。分段的制作和验证是一个反复的过程:(1) 由一家专业公司进行初步分段;(2) 由三位训练有素的放射科医生之一进行审查;(3) 由一个专家组进行验证。在数据集的一个子集上进行了观察者之间和观察者内部的可靠性实验。最先进的深度学习框架用于训练 VS 的分割模型。在 MC-RC 暂缓测试集、另一个公开 VS 数据集和一个部分公开数据集上对模型性能进行了评估。在 MC-RC 数据集上训练的 VS 深度学习分割模型的泛化能力和鲁棒性显著提高。在观察者间实验中,我们的模型获得的骰子相似系数(DSC)与经过培训的放射科医生获得的相似系数相当。在 MC-RC 测试集中,ceT1w 的 DSC 中位数为 86.2(9.5),T2w 为 89.4(7.0),ceT1w+T2w 组合输入图像的 DSC 中位数为 86.4(8.6)。在为伽马刀立体定向放射外科手术获取的另一个公共数据集上,我们的模型分别获得了 95.3(2.9)、92.8(3.8) 和 95.5(3.3) 的中位 DSCs。相比之下,在伽马刀数据集上训练的模型并不能很好地泛化,在 MC-RC 常规 MRI 数据集上的表现就说明了这一点,这突出了数据可变性在开发稳健的 VS 分割模型中的重要性。MC-RC 数据集和所有经过训练的深度学习模型均可在线获取。
Deep learning for automatic segmentation of vestibular schwannoma: a retrospective study from multi-center routine MRI
Automatic segmentation of vestibular schwannoma (VS) from routine clinical MRI has potential to improve clinical workflow, facilitate treatment decisions, and assist patient management. Previous work demonstrated reliable automatic segmentation performance on datasets of standardized MRI images acquired for stereotactic surgery planning. However, diagnostic clinical datasets are generally more diverse and pose a larger challenge to automatic segmentation algorithms, especially when post-operative images are included. In this work, we show for the first time that automatic segmentation of VS on routine MRI datasets is also possible with high accuracy. We acquired and publicly release a curated multi-center routine clinical (MC-RC) dataset of 160 patients with a single sporadic VS. For each patient up to three longitudinal MRI exams with contrast-enhanced T1-weighted (ceT1w) (n = 124) and T2-weighted (T2w) (n = 363) images were included and the VS manually annotated. Segmentations were produced and verified in an iterative process: (1) initial segmentations by a specialized company; (2) review by one of three trained radiologists; and (3) validation by an expert team. Inter- and intra-observer reliability experiments were performed on a subset of the dataset. A state-of-the-art deep learning framework was used to train segmentation models for VS. Model performance was evaluated on a MC-RC hold-out testing set, another public VS datasets, and a partially public dataset. The generalizability and robustness of the VS deep learning segmentation models increased significantly when trained on the MC-RC dataset. Dice similarity coefficients (DSC) achieved by our model are comparable to those achieved by trained radiologists in the inter-observer experiment. On the MC-RC testing set, median DSCs were 86.2(9.5) for ceT1w, 89.4(7.0) for T2w, and 86.4(8.6) for combined ceT1w+T2w input images. On another public dataset acquired for Gamma Knife stereotactic radiosurgery our model achieved median DSCs of 95.3(2.9), 92.8(3.8), and 95.5(3.3), respectively. In contrast, models trained on the Gamma Knife dataset did not generalize well as illustrated by significant underperformance on the MC-RC routine MRI dataset, highlighting the importance of data variability in the development of robust VS segmentation models. The MC-RC dataset and all trained deep learning models were made available online.
期刊介绍:
Frontiers in Computational Neuroscience is a first-tier electronic journal devoted to promoting theoretical modeling of brain function and fostering interdisciplinary interactions between theoretical and experimental neuroscience. Progress in understanding the amazing capabilities of the brain is still limited, and we believe that it will only come with deep theoretical thinking and mutually stimulating cooperation between different disciplines and approaches. We therefore invite original contributions on a wide range of topics that present the fruits of such cooperation, or provide stimuli for future alliances. We aim to provide an interactive forum for cutting-edge theoretical studies of the nervous system, and for promulgating the best theoretical research to the broader neuroscience community. Models of all styles and at all levels are welcome, from biophysically motivated realistic simulations of neurons and synapses to high-level abstract models of inference and decision making. While the journal is primarily focused on theoretically based and driven research, we welcome experimental studies that validate and test theoretical conclusions.
Also: comp neuro