Kuan-Chih Huang, Chang-En Lin, Donna Shu-Han Lin, Ting-Tse Lin, Cho-Kai Wu, Geng-Shi Jeng, Lian-Yu Lin, Lung-Chun Lin
{"title":"Video Transformer for Segmentation of Echocardiography Images in Myocardial Strain Measurement.","authors":"Kuan-Chih Huang, Chang-En Lin, Donna Shu-Han Lin, Ting-Tse Lin, Cho-Kai Wu, Geng-Shi Jeng, Lian-Yu Lin, Lung-Chun Lin","doi":"10.1007/s10278-025-01682-5","DOIUrl":null,"url":null,"abstract":"<p><p>The adoption of left ventricular global longitudinal strain (LVGLS) is still restricted by variability among various vendors and observers, despite advancements from tissue Doppler to speckle tracking imaging, machine learning, and, more recently, convolutional neural network (CNN)-based segmentation strain analysis. While CNNs have enabled fully automated strain measurement, they are inherently constrained by restricted receptive fields and a lack of temporal consistency. Transformer-based networks have emerged as a powerful alternative in medical imaging, offering enhanced global attention. Among these, the Video Swin Transformer (V-SwinT) architecture, with its 3D-shifted windows and locality inductive bias, is particularly well suited for ultrasound imaging, providing temporal consistency while optimizing computational efficiency. In this study, we propose the DTHR-SegStrain model based on a V-SwinT backbone. This model incorporates contour regression and utilizes an FCN-style multiscale feature fusion. As a result, it can generate accurate and temporally consistent left ventricle (LV) contours, allowing for direct calculation of myocardial strain without the need for conversion from segmentation to contours or any additional postprocessing. Compared to EchoNet-dynamic and Unity-GLS, DTHR-SegStrain showed greater efficiency, reliability, and validity in LVGLS measurements. Furthermore, the hybridization experiments assessed the interaction between segmentation models and strain algorithms, reinforcing that consistent segmentation contours over time can simplify strain calculations and decrease measurement variability. These findings emphasize the potential of V-SwinT-based frameworks to enhance the standardization and clinical applicability of LVGLS assessments.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of imaging informatics in medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10278-025-01682-5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The adoption of left ventricular global longitudinal strain (LVGLS) is still restricted by variability among various vendors and observers, despite advancements from tissue Doppler to speckle tracking imaging, machine learning, and, more recently, convolutional neural network (CNN)-based segmentation strain analysis. While CNNs have enabled fully automated strain measurement, they are inherently constrained by restricted receptive fields and a lack of temporal consistency. Transformer-based networks have emerged as a powerful alternative in medical imaging, offering enhanced global attention. Among these, the Video Swin Transformer (V-SwinT) architecture, with its 3D-shifted windows and locality inductive bias, is particularly well suited for ultrasound imaging, providing temporal consistency while optimizing computational efficiency. In this study, we propose the DTHR-SegStrain model based on a V-SwinT backbone. This model incorporates contour regression and utilizes an FCN-style multiscale feature fusion. As a result, it can generate accurate and temporally consistent left ventricle (LV) contours, allowing for direct calculation of myocardial strain without the need for conversion from segmentation to contours or any additional postprocessing. Compared to EchoNet-dynamic and Unity-GLS, DTHR-SegStrain showed greater efficiency, reliability, and validity in LVGLS measurements. Furthermore, the hybridization experiments assessed the interaction between segmentation models and strain algorithms, reinforcing that consistent segmentation contours over time can simplify strain calculations and decrease measurement variability. These findings emphasize the potential of V-SwinT-based frameworks to enhance the standardization and clinical applicability of LVGLS assessments.