Distilling knowledge from multiple foundation models for zero-shot image classification.

IF 2.9, Zone 3 (multidisciplinary journals), JCR Q1 (MULTIDISCIPLINARY SCIENCES)

PLoS ONE Pub Date: 2024-09-20 eCollection Date: 2024-01-01 DOI: 10.1371/journal.pone.0310730

Siqi Yin, Lifan Jiang

PLoS ONE 19(9): e0310730. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11414985/pdf/

Citations: 0

Abstract

Zero-shot image classification enables the recognition of new categories without requiring additional training data, thereby enhancing the model's generalization capability when specific training data are unavailable. This paper introduces a zero-shot image classification framework that recognizes categories unseen during training by distilling knowledge from foundation models. Specifically, we first employ ChatGPT and DALL-E to synthesize reference images of unseen categories from text prompts. Then, the test image is aligned with the text and the reference images using CLIP and DINO to calculate the logits. Finally, the predicted logits are aggregated according to their confidence to produce the final prediction. Experiments are conducted on multiple datasets, including MNIST, SVHN, CIFAR-10, CIFAR-100, and TinyImageNet. The results demonstrate that our method can significantly improve classification accuracy compared to previous approaches, achieving AUROC scores of over 96% across all test datasets. Our code is available at https://github.com/1134112149/MICW-ZIC.
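The final step of the abstract, aggregating logits according to their confidence, can be illustrated with a small sketch. The abstract does not say how confidence is measured or how the logits are combined, so the choices here are assumptions made only for illustration: confidence is taken to be the maximum softmax probability of each branch, and the branches are fused as a confidence-weighted average of their probabilities. The function names and the example logits are hypothetical and are not taken from the authors' repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert a list of logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_by_confidence(logit_sets):
    """Fuse several per-class logit vectors into one probability vector.

    Assumption (not from the paper): the confidence of a branch is its
    maximum softmax probability, and branches are combined as a
    confidence-weighted average of their probability vectors.
    """
    if not logit_sets:
        raise ValueError("need at least one logit vector")
    num_classes = len(logit_sets[0])
    if any(len(logits) != num_classes for logits in logit_sets):
        raise ValueError("all logit vectors must cover the same classes")

    probs = [softmax(logits) for logits in logit_sets]
    confidences = [max(p) for p in probs]
    total_conf = sum(confidences)
    weights = [c / total_conf for c in confidences]

    fused = [
        sum(w * p[k] for w, p in zip(weights, probs))
        for k in range(num_classes)
    ]
    return fused, weights


# Hypothetical logits over three candidate classes from three branches:
# image-to-text similarity, and image-to-reference-image similarity
# from two different encoders.
text_branch = [2.0, 0.5, 0.1]
reference_branch_a = [0.4, 0.5, 0.3]   # nearly uniform, so low confidence
reference_branch_b = [1.5, 0.2, 0.1]

fused, weights = aggregate_by_confidence(
    [text_branch, reference_branch_a, reference_branch_b]
)
predicted = max(range(len(fused)), key=fused.__getitem__)
print("weights:", [round(w, 3) for w in weights])
print("predicted class index:", predicted)
```

In this sketch a branch whose scores are nearly uniform receives a smaller weight, so an uncertain branch influences the fused prediction less than a confident one. Other confidence measures, such as entropy or the margin between the top two classes, would fit the same structure.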


Source journal

PLoS ONE (Biology)

CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months

Journal description: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage