PLZero: placeholder based approach to generalized zero-shot learning for multi-label recognition in chest radiographs

IF 5 | Zone 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Complex & Intelligent Systems | Pub Date: 2025-01-02 | DOI: 10.1007/s40747-024-01717-4

Chengrong Yang, Qiwen Jin, Fei Du, Jing Guo, Yujue Zhou


Citations: 0

Abstract

By pre-training on large-scale paired image-text data, models can efficiently learn the alignment between images and text, which has substantially advanced zero-shot learning (ZSL) in intelligent medical image analysis. However, cross-modal heterogeneity, false negatives in image-text pairs, and domain shift make it difficult for existing methods to learn the deep semantic relationships between images and text. To address these challenges, we propose PLZero, a generalized ZSL framework for multi-label chest X-ray recognition based on placeholder learning. First, we introduce a jointed embedding space learning module (JESL) that encourages the model to better capture the diversity among different labels. Second, we propose a hallucinated class generation module (HCG), which generates hallucinated classes through feature diffusion and feature fusion of the visual and semantic features of seen classes, and uses these hallucinated classes as placeholders for unseen classes. Finally, we propose a hallucinated class-based prototype learning module (HCPL), which uses contrastive learning to keep the hallucinated classes distributed around the seen classes without deviating significantly from the original data, while encouraging the seen-class prototypes to be highly dispersed so that there is sufficient space to insert unseen-class samples. Extensive experiments show that our method generalizes well and achieves the best performance on three classic and challenging chest X-ray datasets: NIH Chest X-ray 14, CheXpert, and ChestX-Det10. Notably, our method still outperforms the others even when the number of unseen classes exceeds the number used in the experimental settings of other methods. The code is available at: https://github.com/jinqiwen/PLZero.
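The abstract does not give the equations behind HCG or HCPL. As a rough illustration only, the Python sketch below assumes that "feature fusion" is a mixup-style convex combination of two seen-class embeddings and that "feature diffusion" is additive Gaussian noise, and it uses a generic InfoNCE loss as a stand-in for the HCPL contrastive objective. All function names, the hyperparameters `sigma`, `lam_range` and `tau`, and the toy 3-dimensional label embeddings are hypothetical; the authors' repository is the reference for the actual implementation.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def fuse(a, b, lam):
    """Assumed 'feature fusion': mixup-style convex combination of two features."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def diffuse(v, sigma, rng):
    """Assumed 'feature diffusion': perturb a feature with isotropic Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in v]


def hallucinate(seen, n, sigma=0.05, lam_range=(0.3, 0.7), seed=0):
    """Generate n unit-norm placeholder features from a {class_name: feature} dict.

    Returns a list of (feature, (parent_a, parent_b)) tuples.
    """
    rng = random.Random(seed)
    names = sorted(seen)
    placeholders = []
    for _ in range(n):
        a, b = rng.sample(names, 2)  # two distinct seen classes
        lam = rng.uniform(*lam_range)
        feat = diffuse(fuse(seen[a], seen[b], lam), sigma, rng)
        placeholders.append((l2_normalize(feat), (a, b)))
    return placeholders


def info_nce(anchor, positive, negatives, tau=0.1):
    """Generic InfoNCE on cosine similarities: -log(exp(s_pos/tau) / sum_j exp(s_j/tau))."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, neg) / tau for neg in negatives]
    m = max(logits)  # subtract the max before exponentiating (log-sum-exp trick)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


# Toy usage with hypothetical 3-d embeddings for three NIH ChestX-ray14 labels.
seen = {
    "Atelectasis": [1.0, 0.0, 0.0],
    "Effusion": [0.0, 1.0, 0.0],
    "Nodule": [0.0, 0.0, 1.0],
}
placeholders = hallucinate(seen, n=4)
feat, (a, b) = placeholders[0]
# Pull the placeholder toward one parent and push it away from the non-parent seen classes.
negatives = [seen[c] for c in seen if c not in (a, b)]
loss = info_nce(feat, seen[a], negatives)
print(f"placeholder from {a} + {b}: loss = {loss:.4f}")
```

Pulling each placeholder toward a parent class while pushing it away from the remaining seen classes keeps it near the seen-class distribution, which is the behavior the abstract describes for HCPL; how the authors actually choose positives and negatives, and how they enforce prototype dispersion, may differ from this sketch.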




Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60
Self-citation rate: 10.30%
Articles published: 297

Journal description: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques that foster cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.