Influence of early through late fusion on pancreas segmentation from imperfectly registered multimodal magnetic resonance imaging.

Lucas W Remedios, Han Liu, Samuel W Remedios, Lianrui Zuo, Adam M Saunders, Shunxing Bao, Yuankai Huo, Alvin C Powers, John Virostko, Bennett A Landman

Journal of Medical Imaging, 12(2), 024008 (2025). DOI: 10.1117/1.JMI.12.2.024008. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12032765/pdf/
Purpose: Combining different types of medical imaging data, through multimodal fusion, promises better segmentation of anatomical structures, such as the pancreas. Strategic implementation of multimodal fusion could improve our ability to study diseases such as diabetes. However, where to perform fusion in deep learning models is still an open question. It is unclear if there is a single best location to fuse information when analyzing pairs of imperfectly aligned images or if the optimal fusion location depends on the specific model being used. Two main challenges when using multiple imaging modalities to study the pancreas are (1) the pancreas and surrounding abdominal anatomy have a deformable structure, making it difficult to consistently align the images and (2) breathing by the individual during image collection further complicates the alignment between multimodal images. Even after using state-of-the-art deformable image registration techniques, specifically designed to align abdominal images, multimodal images of the abdomen are often not perfectly aligned. We examine how the choice of different fusion points, ranging from early in the image processing pipeline to later stages, impacts the segmentation of the pancreas on imperfectly registered multimodal magnetic resonance (MR) images.
Approach: Our dataset consists of 353 pairs of T2-weighted (T2w) and T1-weighted (T1w) abdominal MR images from 163 subjects, with accompanying pancreas segmentation labels drawn mainly on the T2w images. Because the T2w images were acquired in an interleaved manner across two breath holds and the T1w images in a single breath hold, three different breath holds affected the alignment of each image pair. We used deeds, a state-of-the-art deformable abdominal image registration method, to align the image pairs. Then, we trained a collection of basic UNets with different fusion points, spanning from early to late layers in the model, to assess how early through late fusion influenced segmentation performance on imperfectly aligned images. To investigate whether performance differences at key fusion points generalize to other architectures, we expanded our experiments to nnUNet.
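To make the notion of a fusion point concrete, here is a minimal hypothetical PyTorch sketch (2D and heavily simplified; it is not the authors' released implementation, which is linked below). A UNet carries two single-modality encoder branches that are merged by channel concatenation at a configurable depth: fusion_level=0 corresponds to early fusion by stacking the raw images, and larger values push the fusion point deeper into the encoder.

```python
# Hypothetical sketch of a UNet with a configurable fusion point.
# Not the paper's architecture; layer widths and depth are assumptions.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class FusionUNet(nn.Module):
    """Two modality branches merged by concatenation at `fusion_level`.

    fusion_level=0 stacks the raw T1w/T2w images (early fusion); larger
    values fuse encoder feature maps (mid fusion); fusion_level == depth
    fuses at the bottleneck (late fusion within the encoder).
    """
    def __init__(self, fusion_level=2, depth=3, base=16, n_classes=2):
        super().__init__()
        assert 0 <= fusion_level <= depth
        self.fl, self.depth = fusion_level, depth
        ch = [base * 2 ** i for i in range(depth + 1)]  # per-level widths
        self.b1, self.b2, self.shared = nn.ModuleList(), nn.ModuleList(), nn.ModuleList()
        cin = 1
        for i in range(depth + 1):
            if i < fusion_level:                  # separate modality branches
                self.b1.append(block(cin, ch[i]))
                self.b2.append(block(cin, ch[i]))
            elif i == fusion_level:               # fusion point: concat inputs
                self.shared.append(block(2 * cin, ch[i]))
            else:                                 # single stream after fusion
                self.shared.append(block(cin, ch[i]))
            cin = ch[i]
        self.pool = nn.MaxPool2d(2)
        self.ups = nn.ModuleList(
            nn.ConvTranspose2d(ch[i + 1], ch[i], 2, stride=2) for i in range(depth))
        # Pre-fusion skip connections carry both branches (2x channels).
        self.dec = nn.ModuleList(
            block(ch[i] + (2 * ch[i] if i < fusion_level else ch[i]), ch[i])
            for i in range(depth))
        self.head = nn.Conv2d(ch[0], n_classes, 1)

    def forward(self, t1, t2):
        skips, x1, x2, x = [], t1, t2, None
        for i in range(self.depth + 1):
            if i < self.fl:
                x1, x2 = self.b1[i](x1), self.b2[i](x2)
                feat = torch.cat([x1, x2], dim=1)  # skip carries both branches
            else:
                if i == self.fl:
                    x = torch.cat([x1, x2], dim=1)  # the fusion point
                x = self.shared[i - self.fl](x)
                feat = x
            if i < self.depth:
                skips.append(feat)
                if i < self.fl:
                    x1, x2 = self.pool(x1), self.pool(x2)
                else:
                    x = self.pool(x)
        for i in reversed(range(self.depth)):
            x = self.dec[i](torch.cat([self.ups[i](x), skips[i]], dim=1))
        return self.head(x)

if __name__ == "__main__":
    t1 = torch.randn(1, 1, 64, 64)  # toy T1w slice
    t2 = torch.randn(1, 1, 64, 64)  # toy T2w slice
    for level in range(4):          # sweep early through late fusion points
        print(level, tuple(FusionUNet(fusion_level=level)(t1, t2).shape))
```

Sweeping fusion_level in this way mirrors the experimental design: each value yields a separately trained model whose only architectural difference is where the two modality streams merge.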
Results: The single-modality T2w baseline using a basic UNet model had a median Dice score of 0.766, whereas the same baseline on the nnUNet model achieved 0.824. For each fusion approach, we analyzed the differences in performance with Dice residuals, computed by subtracting the baseline Dice score from the fusion Dice score for each datapoint. For the basic UNet, the best fusion approach was early/mid fusion in the middle of the encoder, with a median Dice residual of +0.012 compared with the baseline. For the nnUNet, the best fusion approach was early fusion through naïve image concatenation before the model, with a median Dice residual of +0.004 compared with the baseline. After Bonferroni correction, the Dice score distributions for these best fusion approaches differed significantly from the baseline (p < 0.05) under the paired Wilcoxon signed-rank test.
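The residual-and-significance analysis is straightforward to reproduce. Below is an illustrative sketch; the per-subject Dice scores and the number of compared fusion points are made-up assumptions for demonstration, not values from the paper.

```python
# Illustrative sketch of the Dice-residual analysis (assumed toy data).
import numpy as np
from scipy.stats import wilcoxon

def dice(pred, truth):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0

# Per-subject Dice scores on the same test scans (toy values, not the paper's).
baseline_dice = np.array([0.74, 0.80, 0.77, 0.69, 0.82, 0.76, 0.79, 0.71])
fusion_dice   = np.array([0.76, 0.81, 0.78, 0.70, 0.83, 0.77, 0.80, 0.73])

# Dice residual = fusion score minus baseline score, per datapoint.
residuals = fusion_dice - baseline_dice
print("median Dice residual:", np.median(residuals))

# Paired Wilcoxon signed-rank test, Bonferroni-corrected across the
# fusion points tested (the count here is an assumption).
n_fusion_points = 7
stat, p = wilcoxon(fusion_dice, baseline_dice)
p_corrected = min(1.0, p * n_fusion_points)
print(f"p = {p:.4f}, Bonferroni-corrected p = {p_corrected:.4f}")
```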
Conclusions: Fusion in specific blocks can improve performance, but the best blocks for fusion are model-specific, and the gains are small. In imperfectly registered datasets, fusion is a nuanced problem, and the art of design remains vital for uncovering potential insights. Future innovation is needed to better address fusion in cases of imperfect alignment of abdominal image pairs. The code associated with this project is available at https://github.com/MASILab/influence_of_fusion_on_pancreas_segmentation.
Journal introduction:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease, as well as in the understanding of normal anatomy and physiology. The scope of JMI includes: imaging physics; tomographic reconstruction algorithms (such as those in CT and MRI); image processing and deep learning; computer-aided diagnosis and quantitative image analysis; visualization and modeling; picture archiving and communication systems (PACS); image perception and observer performance; technology assessment; ultrasonic imaging; image-guided procedures; digital pathology; and biomedical applications of imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.