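DSTAN: A Deformable Spatial-temporal Attention Network with Bidirectional Sequence Feature Refinement for Speckle Noise Removal in Thyroid Ultrasound Video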

Jianning Chi, Jian Miao, Jia-Hui Chen, Huan Wang, Xiaosheng Yu, Ying Huang
Journal of Imaging Informatics in Medicine, pp. 3264-3281. Journal article, published 2024-12-01 (Epub 2024-06-05). DOI: 10.1007/s10278-023-00935-5 (https://doi.org/10.1007/s10278-023-00935-5). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11612081/pdf/
Citations: 0

Abstract

Thyroid ultrasound video is valuable for diagnosing thyroid disease, but speckle noise introduced during ultrasound imaging often degrades video quality. Numerous video denoising methods have been proposed to remove noise while preserving texture details. However, existing methods still suffer from two problems: (1) simple optical flow or motion estimation cannot accurately align and effectively aggregate relevant temporal features in low-contrast ultrasound video, which leads to artifacts and motion blur; (2) a fixed receptive field in spatial feature integration lacks the flexibility to aggregate features across the global region of interest and is susceptible to interference from irrelevant noisy regions. In this work, we propose a deformable spatial-temporal attention denoising network (DSTAN) to remove speckle noise in thyroid ultrasound video. The entire network follows a bidirectional feature propagation mechanism to efficiently exploit the spatial-temporal information of the whole video sequence. Within this framework, two modules address the problems above: (1) a deformable temporal attention module (DTAM), applied after optical flow pre-alignment, further captures and aggregates relevant temporal features according to learned inter-frame offsets, so that inter-frame information can be better exploited even when flow estimation is imprecise under the low contrast of ultrasound video; (2) a deformable spatial attention module (DSAM) flexibly integrates spatial features in the global region of interest through learned intra-frame offsets, so that irrelevant noisy information is ignored and essential information is precisely exploited. Finally, the refined features are rectified and merged through residual convolution blocks to recover the clean video frames.
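
The role of learned offsets in DTAM and DSAM can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes a single-channel feature map, takes the offsets and attention logits as given inputs rather than predicting them with a network, and the names `bilinear_sample` and `deformable_attention` are hypothetical. It shows only the core idea of sampling a feature map at offset locations and merging the samples with attention weights.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D feature map (list of rows) at a fractional (y, x) location.

    Coordinates are clamped to the map so out-of-range offsets stay valid.
    """
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(feat, ref_y, ref_x, offsets, logits):
    """Aggregate samples taken at (ref + offset) with softmax attention weights.

    In a real network the offsets and logits would be predicted from the
    features; here they are plain inputs (an assumption of this sketch).
    """
    weights = softmax(logits)
    samples = [bilinear_sample(feat, ref_y + dy, ref_x + dx) for dy, dx in offsets]
    return sum(w * s for w, s in zip(weights, samples))


# Toy neighbor-frame feature map and three hypothetical offsets for the
# query pixel at (1, 1); equal logits give equal attention weights.
neighbor = [[0.0, 1.0, 2.0],
            [3.0, 4.0, 5.0],
            [6.0, 7.0, 8.0]]
offsets = [(0.0, 0.0), (0.5, 0.5), (-1.0, 1.0)]
logits = [0.0, 0.0, 0.0]
value = deformable_attention(neighbor, 1.0, 1.0, offsets, logits)
```

Because the sampling positions are fractional and data-dependent, the aggregation is not tied to a fixed grid around the query pixel, which is the flexibility the abstract contrasts with a fixed receptive field.
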
Experimental results on our thyroid ultrasound video (US-V) dataset and the DDTI dataset demonstrate that the proposed method exceeds other state-of-the-art methods by 1.2 to 1.3 dB in PSNR and recovers clearer texture detail. The proposed model can also help thyroid nodule segmentation methods achieve more accurate segmentation, which provides an important basis for thyroid diagnosis. In the future, the model can be improved and extended to other medical image sequence datasets, including CT and MRI slice denoising. The code and datasets are provided at https://github.com/Meta-MJ/DSTAN.
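
For readers unfamiliar with the metric, PSNR can be computed as in the following minimal sketch. It assumes 8-bit frames stored as nested lists with a peak value of 255; the function name `psnr` and the toy frames are illustrative, and the paper's own evaluation code may differ in detail.

```python
import math


def psnr(reference, denoised, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D frames."""
    if len(reference) != len(denoised) or any(
        len(r) != len(d) for r, d in zip(reference, denoised)
    ):
        raise ValueError("frames must have the same shape")
    squared_error = 0.0
    count = 0
    for ref_row, den_row in zip(reference, denoised):
        for r, d in zip(ref_row, den_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 frames: two pixels differ by 10 gray levels, so the MSE is 50.
ref = [[100, 100], [100, 100]]
noisy = [[110, 90], [100, 100]]
score = psnr(ref, noisy)
```

Since PSNR is logarithmic in the mean squared error, a gain of 1.2 to 1.3 dB at the same peak value corresponds to an MSE roughly a quarter lower (a factor of 10^(-0.12) to 10^(-0.13)).
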
