适应任何：使用文本到图像扩散模型跨域和类别定制任何图像分类器

IF 5.7 3区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data Pub Date : 2025-01-30 DOI:10.1109/TBDATA.2025.3536933

Weijie Chen;Haoyu Wang;Shicai Yang;Lei Zhang;Wei Wei;Yanning Zhang;Luojun Lin;Di Xie;Yueting Zhuang

{"title":"适应任何：使用文本到图像扩散模型跨域和类别定制任何图像分类器","authors":"Weijie Chen;Haoyu Wang;Shicai Yang;Lei Zhang;Wei Wei;Yanning Zhang;Luojun Lin;Di Xie;Yueting Zhuang","doi":"10.1109/TBDATA.2025.3536933","DOIUrl":null,"url":null,"abstract":"We study a novel problem in this paper, that is, if a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaption works exploit both source and target data for domain alignment so as to transfer the knowledge from the labeled source data to the unlabeled target data. However, as the development of text-to-image diffusion models, we wonder if the high-fidelity synthetic data can serve as a surrogate of the source data in real world. In this way, we do not need to collect and annotate the source data for each image classification task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to dig out the knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as any unlabeled target data. Extensive experiments validate the feasibility of this idea, which even surprisingly surpasses the state-of-the-art domain adaptation works using the source data collected and annotated in real world.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 3","pages":"1013-1026"},"PeriodicalIF":5.7000,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adapt Anything: Tailor Any Image Classifier Across Domains and Categories Using Text-to-Image Diffusion Models\",\"authors\":\"Weijie Chen;Haoyu Wang;Shicai Yang;Lei Zhang;Wei Wei;Yanning Zhang;Luojun Lin;Di Xie;Yueting Zhuang\",\"doi\":\"10.1109/TBDATA.2025.3536933\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study a novel problem in this paper, that is, if a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaption works exploit both source and target data for domain alignment so as to transfer the knowledge from the labeled source data to the unlabeled target data. However, as the development of text-to-image diffusion models, we wonder if the high-fidelity synthetic data can serve as a surrogate of the source data in real world. In this way, we do not need to collect and annotate the source data for each image classification task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to dig out the knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as any unlabeled target data. Extensive experiments validate the feasibility of this idea, which even surprisingly surpasses the state-of-the-art domain adaptation works using the source data collected and annotated in real world.\",\"PeriodicalId\":13106,\"journal\":{\"name\":\"IEEE Transactions on Big Data\",\"volume\":\"11 3\",\"pages\":\"1013-1026\"},\"PeriodicalIF\":5.7000,\"publicationDate\":\"2025-01-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Big Data\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10858456/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Big Data","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10858456/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

本文研究了一个新的问题，即现代文本到图像扩散模型是否可以跨领域和类别定制任何图像分类器。现有的领域自适应工作利用源数据和目标数据进行领域对齐，从而将知识从标记的源数据传递到未标记的目标数据。然而，随着文本到图像扩散模型的发展，我们想知道高保真合成数据是否可以在现实世界中作为源数据的替代品。这样，我们就不需要对每个图像分类任务的源数据进行一对一的收集和标注。相反，我们只使用一个现成的文本到图像模型来合成带有文本提示的标签的图像，然后利用它们作为桥梁，通过域适应从任务不可知的文本到图像生成器到面向任务的图像分类器中挖掘知识。这种“一刀切”的适应范式允许我们仅使用一个文本到图像生成器以及任何未标记的目标数据来适应世界上的任何东西。大量的实验验证了这一想法的可行性，甚至令人惊讶地超过了使用在现实世界中收集和注释的源数据进行的最先进的领域自适应工作。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Adapt Anything: Tailor Any Image Classifier Across Domains and Categories Using Text-to-Image Diffusion Models

We study a novel problem in this paper, that is, if a modern text-to-image diffusion model can tailor any image classifier across domains and categories. Existing domain adaption works exploit both source and target data for domain alignment so as to transfer the knowledge from the labeled source data to the unlabeled target data. However, as the development of text-to-image diffusion models, we wonder if the high-fidelity synthetic data can serve as a surrogate of the source data in real world. In this way, we do not need to collect and annotate the source data for each image classification task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with labels derived from text prompts, and then leverage them as a bridge to dig out the knowledge from the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as any unlabeled target data. Extensive experiments validate the feasibility of this idea, which even surprisingly surpasses the state-of-the-art domain adaptation works using the source data collected and annotated in real world.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Big Data Multiple-

CiteScore

11.80

自引率

2.80%

发文量

114

期刊介绍： The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.