Semantic Mask Reconstruction and Category Semantic Learning for few-shot image generation.
Authors: Ting Xiao, Yunjie Cai, Jiaoyan Guan, Zhe Wang
Journal: Neural Networks, volume 183, article 106946 (impact factor 6.0; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-03-01 (Epub 2024-12-03)
DOI: https://doi.org/10.1016/j.neunet.2024.106946
Few-shot image generation aims to generate novel images of an unseen category given only K images from that category. Despite significant advances in existing few-shot image generation methods, the quality and diversity of the generated images remain major challenges. These limitations stem from the model's difficulty in fully comprehending the semantic content of images and in extracting sufficiently rich semantic representations. To address these issues, we propose a semantic mask reconstruction (SMR) and category semantic learning (CSL) method for few-shot image generation. Specifically, SMR performs mask reconstruction in a high-level semantic space and uses a strategy that dynamically adjusts the mask ratio: gradually increasing the ratio makes the generation task harder, which strengthens the learning ability of the discriminator and thereby prompts the generator to learn the features most critical to the generation task. In addition, CSL introduces a triplet loss to optimize the distances among the generated image, its corresponding input image, and input images of other categories. This encourages the generative model to discern subtle differences between categories, yielding finer-grained generation and higher fidelity. Both SMR and CSL can function as plug-and-play modules. Extensive experiments on three standard datasets demonstrate that SMR-CSL outperforms other methods in both the quality and the diversity of the generated images. Furthermore, downstream classification experiments verify that the images generated by the proposed method can effectively assist downstream classification tasks.
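The two mechanisms named in the abstract, a mask ratio that grows during training and a triplet loss over generated and input images, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the schedule shape, the ratio bounds, the masking operation, the distance metric, or the margin, so the linear schedule, the `start`/`end` values, zero-masking, Euclidean distance, `margin=1.0`, and all function names below are assumptions chosen for illustration only.

```python
import math
import random


def mask_ratio(step, total_steps, start=0.1, end=0.5):
    """Hypothetical linear schedule: the ratio grows from `start` to `end`."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac


def mask_features(features, ratio, rng):
    """Zero out a random `ratio` fraction of a feature vector (assumed masking op).

    Returns the masked copy and the set of masked positions.
    """
    n = len(features)
    k = int(round(ratio * n))
    masked_idx = set(rng.sample(range(n), k))
    masked = [0.0 if i in masked_idx else v for i, v in enumerate(features)]
    return masked, masked_idx


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss.

    Assumed roles, following the abstract's wording:
      anchor   = embedding of the generated image
      positive = embedding of its corresponding input image
      negative = embedding of an input image from another category
    """
    return max(l2(anchor, positive) - l2(anchor, negative) + margin, 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [rng.random() for _ in range(10)]
    for step in (0, 50, 100):
        r = mask_ratio(step, 100)
        _, idx = mask_features(feats, r, rng)
        print(f"step={step} ratio={r:.2f} masked={len(idx)}")
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0]))
```

In this sketch the loss is zero once the generated embedding is closer to its own input than to the other-category input by at least the margin, which is the behavior a triplet loss is meant to encourage; how the paper embeds images and selects negatives is not stated in the abstract.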
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.