Automatic segmentation of echocardiographic images using a shifted windows vision transformer architecture.

Impact factor 1.3, JCR Q3 (Radiology, Nuclear Medicine & Medical Imaging)

Biomedical Physics & Engineering Express. Publication date: 2024-09-13. DOI: 10.1088/2057-1976/ad7594

Souha Nemri, Luc Duong


Citations: 0

Abstract

Echocardiography is one of the most commonly used imaging modalities for the diagnosis of congenital heart disease. Echocardiographic image analysis is crucial to obtaining accurate cardiac anatomy information. Semantic segmentation models can precisely delimit the borders of the left ventricle and allow accurate, automatic identification of the region of interest, which can be extremely useful for cardiologists. In the field of computer vision, convolutional neural network (CNN) architectures remain dominant. Existing CNN approaches have proved highly efficient for the segmentation of various medical images over the past decade. However, these solutions usually struggle to capture long-range dependencies, especially in images containing objects of different scales and complex structures. In this study, we present an efficient method for semantic segmentation of echocardiographic images that overcomes these challenges by leveraging the self-attention mechanism of the Transformer architecture. The proposed solution extracts long-range dependencies and efficiently processes objects at different scales, improving performance in a variety of tasks. We introduce Shifted Windows Transformer models (Swin Transformers), which encode both the content of anatomical structures and the relationships between them. Our solution combines the Swin Transformer and U-Net architectures, producing a U-shaped variant. The proposed method is trained and validated on the EchoNet-Dynamic dataset. The results show an accuracy of 0.97, a Dice coefficient of 0.87, and an Intersection over Union (IoU) of 0.78. Swin Transformer models are promising for the semantic segmentation of echocardiographic images and may assist cardiologists in automatically analyzing and measuring complex echocardiographic images.
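
The three figures reported in the abstract (accuracy, Dice coefficient, IoU) are standard overlap measures for binary segmentation masks. The sketch below shows one common way to compute them with NumPy. It is illustrative only and is not the authors' evaluation code: the function name `segmentation_metrics`, the toy masks, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
import numpy as np


def segmentation_metrics(pred, target):
    """Pixel accuracy, Dice, and IoU for two binary masks of equal shape.

    Hypothetical helper written for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    # Confusion-matrix counts over all pixels.
    tp = int(np.logical_and(pred, target).sum())
    fp = int(np.logical_and(pred, ~target).sum())
    fn = int(np.logical_and(~pred, target).sum())
    tn = int(np.logical_and(~pred, ~target).sum())

    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Assumed convention: two empty masks count as perfect overlap.
    dice = 1.0 if (2 * tp + fp + fn) == 0 else 2 * tp / (2 * tp + fp + fn)
    iou = 1.0 if (tp + fp + fn) == 0 else tp / (tp + fp + fn)
    return {"accuracy": accuracy, "dice": dice, "iou": iou}


# Toy 4x4 example: two 2x3 rectangles offset by one column.
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True
target = np.zeros((4, 4), dtype=bool)
target[1:3, 0:3] = True

metrics = segmentation_metrics(pred, target)
print(metrics)
```

For a single pair of masks the two overlap scores are tied by Dice = 2·IoU / (1 + IoU). Plugging in the reported IoU of 0.78 gives roughly 0.876, which is close to the reported Dice of 0.87. Scores averaged over a dataset need not satisfy this identity exactly.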


Source journal

Biomedical Physics & Engineering Express (Radiology, Nuclear Medicine & Medical Imaging)

CiteScore: 2.80

Self-citation rate: 0.00%

Articles published: 153

Journal description: BPEX is an inclusive, international, multidisciplinary journal devoted to publishing new research on any application of physics and/or engineering in medicine and/or biology. Characterized by a broad geographical coverage and a fast-track peer-review process, relevant topics include all aspects of biophysics, medical physics and biomedical engineering. Papers that are almost entirely clinical or biological in their focus are not suitable. The journal has an emphasis on publishing interdisciplinary work and bringing research fields together, encompassing experimental, theoretical and computational work.