Volume Fusion-Based Self-Supervised Pretraining for 3D Medical Image Segmentation

IF 13.7

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2025-09-22. DOI: 10.1109/TIP.2025.3610249

Guotai Wang;Jia Fu;Jianghao Wu;Xiangde Luo;Yubo Zhou;Xinglong Liu;Kang Li;Jingsheng Lin;Baiyong Shen;Shaoting Zhang

Volume 34, pages 6041-6052. Journal Article. Full text: https://ieeexplore.ieee.org/document/11175343/

Citations: 0

Abstract



Deep learning models for medical image segmentation often perform poorly when training data or annotations are scarce. Self-Supervised Learning (SSL) is an appealing solution because it can learn features from large amounts of unannotated images. Existing SSL methods pretrain either an encoder for global feature representation or an encoder-decoder structure for image restoration; in both cases, the gap between pretext and downstream tasks limits the usefulness of pretrained decoders in downstream segmentation. In this work, we propose a novel SSL strategy named Volume Fusion (VolF) for pretraining 3D segmentation models. It minimizes the gap between pretext and downstream tasks by introducing a pseudo-segmentation pretext task, in which two sub-volumes are fused using a discretized block-wise fusion coefficient map. The model takes the fused result as input and predicts the fusion coefficient category of each voxel, so it can be trained with standard supervised segmentation loss functions without manual annotations. Experiments with an abdominal CT dataset for pretraining and both in-domain and out-of-domain downstream datasets showed that VolF yielded a large performance gain over training from scratch, with faster convergence, and outperformed several state-of-the-art SSL methods. In addition, it applies to different network structures, and the learned features generalize well to different body parts and modalities.
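The pseudo-segmentation pretext task described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The linear blend, the number of coefficient levels, the block count and size range, and all names (`volume_fusion`, `num_levels`, `num_blocks`, `block_range`) are assumptions made for illustration; the abstract only states that two sub-volumes are fused by a discretized block-wise coefficient map and that the model predicts the coefficient category of each voxel.

```python
import numpy as np

def volume_fusion(fg, bg, num_levels=4, num_blocks=8, block_range=(4, 16), rng=None):
    """Fuse two sub-volumes with a discretized block-wise coefficient map.

    Returns the fused volume and a per-voxel class map (integers 0..num_levels).
    Parameter names and defaults are hypothetical, chosen for illustration only.
    """
    assert fg.shape == bg.shape and fg.ndim == 3
    rng = np.random.default_rng() if rng is None else rng
    # Class 0 means the voxel comes entirely from the background sub-volume.
    labels = np.zeros(fg.shape, dtype=np.int64)
    for _ in range(num_blocks):
        # Random block size per axis, clipped to the volume size.
        size = [min(int(rng.integers(block_range[0], block_range[1] + 1)), s)
                for s in fg.shape]
        # Random block origin such that the block fits inside the volume.
        start = [int(rng.integers(0, s - b + 1)) for s, b in zip(fg.shape, size)]
        region = tuple(slice(o, o + b) for o, b in zip(start, size))
        # Assign one discrete coefficient category to the whole block.
        labels[region] = int(rng.integers(1, num_levels + 1))
    # Map categories to discrete fusion coefficients in [0, 1].
    alpha = labels.astype(np.float32) / num_levels
    fused = alpha * fg + (1.0 - alpha) * bg
    return fused.astype(np.float32), labels

# Usage with random stand-ins for two unannotated sub-volumes.
rng = np.random.default_rng(0)
fg = rng.random((32, 32, 32), dtype=np.float32)
bg = rng.random((32, 32, 32), dtype=np.float32)
fused, labels = volume_fusion(fg, bg, rng=rng)
```

Under these assumptions, `fused` would serve as the network input and `labels` as the target for an ordinary segmentation loss with `num_levels + 1` classes, so no manual annotation is involved.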


Source journal

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Self-citation rate

0.00%

Publication count