Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision.

Mohammad Reza Hosseinzadeh Taher, Michael B Gotway, Jianming Liang
{"title":"Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision.","authors":"Mohammad Reza Hosseinzadeh Taher, Michael B Gotway, Jianming Liang","doi":"10.1109/cvpr52733.2024.01071","DOIUrl":null,"url":null,"abstract":"<p><p>Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning excels in learning multi-level feature spaces, but they often lack explicit coding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework extending Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) Localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) Composability, learning each anatomical structure in a parts-to-whole manner; and (3) Decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results across 10 tasks, compared to 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance over large-scale medical models and existing SSL methods across diverse downstream tasks. The higher generality and robustness of Adam-v2's representations originate from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embedding, yielding representations that are both generic and semantically meaningful, yet overlooked in existing SSL methods. All code and pretrained models are available at GitHub.com/JLiangLab/Eden.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"abs/210504906 2024","pages":"11269-11281"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11636527/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/cvpr52733.2024.01071","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/9/16 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning models excel at learning multi-level feature spaces but often lack explicit encoding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework that extends Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) Localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) Composability, learning each anatomical structure in a parts-to-whole manner; and (3) Decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results on 10 tasks, against 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, demonstrate Adam-v2's superior performance over large-scale medical models and existing SSL methods across diverse downstream tasks. The greater generality and robustness of Adam-v2's representations stem from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embeddings, yielding representations that are both generic and semantically meaningful, a combination overlooked in existing SSL methods. All code and pretrained models are available at GitHub.com/JLiangLab/Eden.
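The abstract names the three objectives but does not spell out their losses, so the sketch below is an illustration rather than the paper's released implementation: it pairs a stand-in encoder with three projection heads and expresses each branch as an InfoNCE-style contrastive loss. All class, function, and parameter names (AdamV2Sketch, info_nce, adamv2_loss, feat_dim, proj_dim, tau) are hypothetical; for the actual training code see GitHub.com/JLiangLab/Eden.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdamV2Sketch(nn.Module):
    """Hypothetical three-branch model: a shared encoder with one
    projection head per objective (names are illustrative only)."""

    def __init__(self, feat_dim: int = 512, proj_dim: int = 128):
        super().__init__()
        # Stand-in backbone; the paper uses a real vision backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.loc_head = nn.Linear(feat_dim, proj_dim)     # localizability
        self.comp_head = nn.Linear(feat_dim, proj_dim)    # composability
        self.decomp_head = nn.Linear(feat_dim, proj_dim)  # decomposability

    def embed(self, x: torch.Tensor, head: nn.Module) -> torch.Tensor:
        # L2-normalized embedding for a given branch.
        return F.normalize(head(self.encoder(x)), dim=-1)


def info_nce(q: torch.Tensor, k: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    # Standard InfoNCE: row i of q should match row i of k;
    # all other rows in the batch serve as negatives.
    logits = q @ k.t() / tau
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)


def adamv2_loss(model, anchor, anchor_aug, whole, parts):
    # (1) Localizability: two augmented views of the same anatomical
    #     patch agree; other patches in the batch act as negatives.
    loss_loc = info_nce(model.embed(anchor, model.loc_head),
                        model.embed(anchor_aug, model.loc_head))

    # (2) Composability (parts-to-whole): the aggregate of part
    #     embeddings should match the whole-structure embedding.
    part_embs = torch.stack([model.embed(p, model.comp_head) for p in parts])
    composed = F.normalize(part_embs.mean(dim=0), dim=-1)
    loss_comp = info_nce(composed, model.embed(whole, model.comp_head))

    # (3) Decomposability (whole-to-parts): the whole-structure
    #     embedding should be predictive of each of its parts.
    whole_emb = model.embed(whole, model.decomp_head)
    loss_decomp = sum(info_nce(whole_emb, model.embed(p, model.decomp_head))
                      for p in parts) / len(parts)

    return loss_loc + loss_comp + loss_decomp


if __name__ == "__main__":
    model = AdamV2Sketch()
    b = 8  # batch of single-channel anatomical crops (e.g., chest X-rays)
    anchor = torch.randn(b, 1, 64, 64)
    anchor_aug = torch.randn(b, 1, 64, 64)
    whole = torch.randn(b, 1, 64, 64)
    parts = [torch.randn(b, 1, 64, 64) for _ in range(4)]
    print(adamv2_loss(model, anchor, anchor_aug, whole, parts).item())
```

In this reading, composability pulls the mean of the part embeddings toward the whole-structure embedding, while decomposability pulls the whole embedding toward each individual part, so the encoder receives pressure from both directions of the part-whole hierarchy.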
