{"title":"Hierarchical Contrastive Learning for Precise Whole-body Anatomical Localization in PET/CT Imaging.","authors":"Yaozong Gao, Yiran Shu, Mingyang Yu, Yanbo Chen, Jingyu Liu, Shaonan Zhong, Weifang Zhang, Yiqiang Zhan, Xiang Sean Zhou, Xinlu Wang, Meixin Zhao, Dinggang Shen","doi":"10.1109/TMI.2025.3599197","DOIUrl":null,"url":null,"abstract":"<p><p>Automatic anatomical localization is critical for radiology report generation. While many studies focus on lesion detection and segmentation, anatomical localization-accurately describing lesion positions in radiology reports-has received less attention. Conventional segmentation-based methods are limited to organ-level localization and often fail in severe disease cases due to low segmentation accuracy. To address these limitations, we reformulate anatomical localization as an image-to-text retrieval task. Specifically, we propose a CLIP-based framework that aligns lesion image patches with anatomically descriptive text embeddings in a shared multimodal space. By projecting lesion features into the semantic space and retrieving the most relevant anatomical descriptions in a coarse-to-fine manner, our method achieves fine-grained lesion localization with high accuracy across the entire body. Our main contributions are as follows: (1) hierarchical anatomical retrieval, which organizes 387 locations into a two-level hierarchy, by retrieving from the first level of 124 coarse categories to narrow down the search space and reduce localization complexity; (2) augmented location descriptions, which integrate domain-specific anatomical knowledge for enhancing semantic representation and improving visual-text alignment; and (3) semi-hard negative sample mining, which improves training stability and discriminative learning by avoiding selecting the overly similar negative samples that may introduce label noise or semantic ambiguity. We validate our method on two whole-body PET/CT datasets, achieving an 84.13% localization accuracy on the internal test set and 80.42% on the external test set, with a per-lesion inference time of 34 ms. The proposed framework also demonstrated superior robustness in complex clinical cases compared to segmentation-based approaches.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on medical imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TMI.2025.3599197","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Automatic anatomical localization is critical for radiology report generation. While many studies focus on lesion detection and segmentation, anatomical localization, i.e., accurately describing lesion positions in radiology reports, has received less attention. Conventional segmentation-based methods are limited to organ-level localization and often fail in severe disease cases due to low segmentation accuracy. To address these limitations, we reformulate anatomical localization as an image-to-text retrieval task. Specifically, we propose a CLIP-based framework that aligns lesion image patches with anatomically descriptive text embeddings in a shared multimodal space. By projecting lesion features into the semantic space and retrieving the most relevant anatomical descriptions in a coarse-to-fine manner, our method achieves fine-grained lesion localization with high accuracy across the entire body. Our main contributions are as follows: (1) hierarchical anatomical retrieval, which organizes 387 locations into a two-level hierarchy and retrieves first among 124 coarse categories to narrow the search space and reduce localization complexity; (2) augmented location descriptions, which integrate domain-specific anatomical knowledge to enrich semantic representation and improve visual-text alignment; and (3) semi-hard negative sample mining, which improves training stability and discriminative learning by avoiding overly similar negative samples that may introduce label noise or semantic ambiguity. We validate our method on two whole-body PET/CT datasets, achieving 84.13% localization accuracy on the internal test set and 80.42% on the external test set, with a per-lesion inference time of 34 ms. The proposed framework also demonstrates superior robustness in complex clinical cases compared to segmentation-based approaches.
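The coarse-to-fine retrieval and semi-hard negative mining described in the abstract can be illustrated with a short sketch. The following PyTorch snippet is a minimal, hypothetical version under assumed conventions (unit-normalized CLIP-style embeddings, a precomputed fine-to-coarse parent index); it is not the authors' implementation, and all function and variable names are illustrative placeholders rather than the paper's actual vocabulary of 124 coarse and 387 fine locations.

```python
# Illustrative sketch only; assumes all embeddings are L2-normalized so that
# dot products equal cosine similarities, as in CLIP-style retrieval.
import torch


def coarse_to_fine_retrieval(
    lesion_emb: torch.Tensor,       # (D,) embedding of one lesion image patch
    coarse_text_emb: torch.Tensor,  # (C, D) embeddings of coarse location descriptions
    fine_text_emb: torch.Tensor,    # (F, D) embeddings of fine location descriptions
    fine_to_coarse: torch.Tensor,   # (F,) index of each fine label's coarse parent
) -> int:
    """Return the index of the best-matching fine-grained anatomical description."""
    # Stage 1: pick the most similar coarse category to shrink the search space.
    coarse_sim = coarse_text_emb @ lesion_emb
    best_coarse = int(coarse_sim.argmax())

    # Stage 2: rank only the fine descriptions under that coarse category.
    fine_idx = (fine_to_coarse == best_coarse).nonzero(as_tuple=True)[0]
    fine_sim = fine_text_emb[fine_idx] @ lesion_emb
    return int(fine_idx[fine_sim.argmax()])


def mine_semi_hard_negatives(
    sim: torch.Tensor,       # (B, N) lesion-to-description similarity matrix
    pos_idx: torch.Tensor,   # (B,) index of each lesion's positive description
    margin: float = 0.1,
) -> torch.Tensor:
    """For each lesion, pick the hardest negative that is still at least `margin`
    below the positive similarity, skipping near-duplicate negatives that could be
    semantically ambiguous. Returns (B,) negative indices."""
    pos_sim = sim.gather(1, pos_idx.unsqueeze(1))              # (B, 1)
    masked = sim.clone()
    masked.scatter_(1, pos_idx.unsqueeze(1), float("-inf"))    # never pick the positive
    masked[masked >= pos_sim - margin] = float("-inf")         # drop overly similar negatives
    # Note: if every negative is excluded, argmax falls back to index 0;
    # a real training loop would handle that case explicitly.
    return masked.argmax(dim=1)
```

In this sketch the two-stage lookup is the source of the speed-up claimed in the abstract: instead of scoring all fine-grained descriptions, only those under the selected coarse category are compared, while the mining function reflects the stated idea of discarding negatives whose similarity is nearly indistinguishable from the positive.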