Few-shot image generation based on meta-learning and generative adversarial network

IF 3.4, Zone 3 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication Pub Date : 2025-03-28 DOI:10.1016/j.image.2025.117307

Bowen Gu, Junhai Zhai

Volume 137, Article 117307. Full text: https://www.sciencedirect.com/science/article/pii/S0923596525000542

Citations: 0

Abstract

A generative adversarial network (GAN) learns the latent distribution of samples through adversarial training between a discriminator and a generator, then uses the learned probability distribution to generate realistic samples. Training a vanilla GAN requires a large number of samples and a significant amount of time. In practical applications, however, obtaining a large dataset and dedicating extensive time to model training can be very costly. Training a GAN with a small number of samples to generate high-quality images is therefore a pressing research problem. Although this area has seen limited exploration, FAML (Fast Adaptive Meta-Learning) stands out as a notable approach. However, FAML has the following shortcomings: (1) its training time on complex datasets, such as VGGFaces and MiniImageNet, is excessively long; (2) it generalizes poorly and produces low-quality images across different datasets; (3) its generated samples lack diversity. To address these three shortcomings, we improved FAML in two key areas: model structure and loss function. The improved model effectively overcomes all three limitations of FAML. We conducted extensive experiments on four datasets to compare our model with the baseline FAML across seven evaluation metrics. The results demonstrate that our model is both more efficient and more effective, particularly on the two complex datasets, VGGFaces and MiniImageNet. Our model outperforms FAML on six of the seven evaluation metrics, with only a slight underperformance on one metric. Our code is available at https://github.com/BTGWS/FSML-GAN.
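
For readers unfamiliar with the adversarial training mentioned in the abstract, the standard vanilla GAN objective can be written as the minimax game below. This is the textbook formulation, given only as background; the abstract does not state the loss function actually used by FAML or by the improved model, so none of the symbols here should be read as the authors' notation. Here G is the generator, D the discriminator, p_data the data distribution, and p_z the noise prior.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

The discriminator maximizes V by assigning high probability to real samples and low probability to generated ones, while the generator minimizes V by producing samples the discriminator cannot tell apart from real data.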


Source journal

Signal Processing-Image Communication (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

8.40

Self-citation rate

2.90%

Articles published

138

Review time

5.2 months

Journal description: Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are the following: To present a forum for the advancement of theory and practice of image communication. To stimulate cross-fertilization between areas similar in nature which have traditionally been separated, for example, various aspects of visual communications and information systems. To contribute to a rapid information exchange between the industrial and academic environments.

The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The Journal is self-supporting from subscription income and contains a minimum amount of advertisements. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.

Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments. Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, architectures for image/video processing and communication.