Analysis of Swin-UNet vision transformer for Inferior Vena Cava filter segmentation from CT scans

Artificial intelligence in the life sciences. Pub Date: 2023-08-18. DOI: 10.1016/j.ailsci.2023.100084

Rahul Gomes, Tyler Pham, Nichol He, Connor Kamrowski, Joseph Wildenberg

Volume 4, Article 100084. Publication type: Journal Article. URL: https://www.sciencedirect.com/science/article/pii/S2667318523000284

Citations: 0

Abstract

Purpose

The purpose of this study is to develop an accurate deep learning model for segmenting Inferior Vena Cava (IVC) filters in CT scans. The study compares the effect of Residual Networks (ResNets) combined with reduced convolutional layer depth, and analyzes whether vision transformer architectures can be used without degrading performance.

Materials and Methods

This experimental retrospective study of 84 CT scans comprising 54,618 slices involves the design, implementation, and evaluation of a segmentation algorithm that can be used to generate a clinical report on the presence of IVC filters in abdominal CT scans performed for any reason. Several variants of a patch-based 3D Convolutional Neural Network (CNN) and the Swin UNet Transformer (Swin-UNETR) are used to retrieve the signature of IVC filters. The Dice score is used as the metric for comparing the performance of the segmentation models.
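For readers unfamiliar with the metric, the Dice score between a predicted binary mask and a ground-truth mask is twice the size of their overlap divided by the sum of their sizes. The following is a minimal sketch, not the authors' implementation: the function name `dice_score`, the flattened-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient for two flattened binary masks (hypothetical helper).

    Dice = 2 * |pred AND truth| / (|pred| + |truth|)
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Voxels labelled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)

    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example with made-up masks (not data from the study).
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 0, 1]
print(dice_score(predicted, reference))
```

In the toy example the masks share two foreground voxels and each contains three, so the score is 2 * 2 / 6, or about 0.67. A real 3D segmentation mask would be flattened to this form, or the same formula applied with an array library.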

Results

The model trained on the UNet variant with four ResNet layers showed higher segmentation performance, achieving a median Dice = 0.92 [interquartile range (IQR): 0.85, 0.93], compared with the plain four-layer UNet model at median Dice = 0.89 [IQR: 0.83, 0.92]. The two-layer ResNet variant achieved a median Dice = 0.93 [IQR: 0.87, 0.94], which was higher than the plain two-layer UNet model at median Dice = 0.87 [IQR: 0.77, 0.90]. Models trained using Swin-based transformers performed significantly better on both the training and validation datasets than the four CNN variants. The validation median Dice was highest for the 4-layer Swin UNETR at 0.88, followed by the 2-layer Swin UNETR at 0.85.
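The results above are reported as a median with an interquartile range, written as [Q1, Q3]. As a rough sketch of how such a summary can be computed from per-scan Dice scores, the snippet below uses Python's standard `statistics` module. The helper name `summarize_dice`, the sample scores, and the choice of the "inclusive" quantile method are assumptions for illustration; the abstract does not state which quantile definition the authors used.

```python
import statistics
from typing import Sequence, Tuple


def summarize_dice(scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, Q1, Q3) for a list of Dice scores (hypothetical helper)."""
    median = statistics.median(scores)
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3].
    # The "inclusive" method is an assumption, not taken from the paper.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return median, q1, q3


# Made-up per-scan scores, not results from the study.
example_scores = [0.80, 0.85, 0.90, 0.92, 0.95]
median, q1, q3 = summarize_dice(example_scores)
print(f"median Dice = {median:.2f} [IQR: {q1:.2f}, {q3:.2f}]")
```

Different quantile definitions can shift Q1 and Q3 slightly on small samples, so the method matters when comparing reported ranges across studies.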

Conclusion

Use of the vision transformer Swin-UNETR yields segmentation output with both low bias and low variance, thereby addressing a real-world healthcare problem with advanced Artificial Intelligence (AI) image processing and recognition. The Swin UNETR will reduce the time spent manually tracking IVC filters by centralizing that tracking within the electronic health record. Link to GitHub repository.




Source journal

Artificial intelligence in the life sciences: Pharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)

CiteScore

5.00

Self-citation rate

0.00%

Articles published

Review time

15 days