Local part attention for image stylization with text prompt

Neural Computing and Applications, published 2024-09-17, DOI: 10.1007/s00521-024-10394-w

Quoc-Truong Truong, Vinh-Tiep Nguyen, Lan-Phuong Nguyen, Hung-Phu Cao, Duc-Tuan Luu

Abstract

Prompt-based portrait style transfer aims to translate an input content image into a target style described by a text prompt, without requiring a style image. In many practical situations, users want to stylize not only the entire portrait but also specific local parts (e.g., eyes, lips, and hair). To support such applications, we propose a new framework that transfers the style described in a text prompt to specific regions of the portrait. Specifically, we incorporate semantic segmentation to locate the target region without requiring the user to provide edit masks, and we use a pre-trained CLIP-based model for stylization. In addition, we propose a text-to-patch matching loss, computed by randomly dividing the stylized image into smaller patches, to ensure consistent quality throughout the result. To evaluate the proposed method comprehensively, we use several metrics, including FID, SSIM, and PSNR, on a dataset that combines portraits from the CelebAMask-HQ dataset with style descriptions used in related works. Extensive experimental results demonstrate that our framework outperforms other state-of-the-art methods in both stylization quality and inference time.
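The region-masking and patch-loss ideas in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the segmentation label IDs, patch size, number of patches, patch sampling scheme, or loss weighting, so `region_mask`, `random_patches`, `text_to_patch_loss`, `composite`, class ID 12, and all default values below are hypothetical. The image encoder is passed in as a function; in a real setup it would be a CLIP image encoder, the text embedding would come from CLIP's text encoder, and the loss would be computed in an autodiff framework so it can drive optimization. Here the loss is only evaluated, as the mean cosine distance between the text embedding and the embeddings of randomly cropped patches.

```python
import numpy as np


def region_mask(label_map, part_ids):
    """Binary float mask of pixels whose segmentation label is in `part_ids`.

    `part_ids` are hypothetical class IDs (e.g. for lips or hair); real IDs
    depend on the segmentation model and its label convention.
    """
    return np.isin(label_map, part_ids).astype(np.float64)


def random_patches(image, num_patches, patch_size, rng):
    """Crop `num_patches` random square patches from an H x W x C image."""
    h, w = image.shape[:2]
    if patch_size > h or patch_size > w:
        raise ValueError("patch_size must not exceed the image height or width")
    patches = []
    for _ in range(num_patches):
        top = int(rng.integers(0, h - patch_size + 1))
        left = int(rng.integers(0, w - patch_size + 1))
        patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)


def cosine_distance(a, b):
    """1 - cosine similarity along the last axis (b broadcasts over rows of a)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=-1)


def text_to_patch_loss(stylized, text_embedding, encode_image,
                       num_patches=16, patch_size=64, seed=0):
    """Mean cosine distance between the text embedding and random patch embeddings.

    `encode_image` stands in for a CLIP image encoder (assumption).
    """
    rng = np.random.default_rng(seed)
    patches = random_patches(stylized, num_patches, patch_size, rng)
    patch_embeddings = np.stack([encode_image(p) for p in patches])
    return float(np.mean(cosine_distance(patch_embeddings, text_embedding)))


def composite(content, stylized, mask):
    """Use stylized pixels where mask == 1 and original content where mask == 0."""
    m = mask[..., None].astype(content.dtype)
    return m * stylized + (1.0 - m) * content


def toy_encode(patch):
    """Toy stand-in for an image encoder: the patch's mean RGB color."""
    return patch.reshape(-1, 3).mean(axis=0)


# Usage with toy data (all values hypothetical).
rng = np.random.default_rng(1)
content = rng.random((128, 128, 3))
labels = np.zeros((128, 128), dtype=int)
labels[32:96, 32:96] = 12             # pretend class 12 marks the lips
mask = region_mask(labels, [12])
stylized = np.zeros_like(content)
stylized[..., 0] = 1.0                # a pure-red "stylized" image
result = composite(content, stylized, mask)

red_text = np.array([1.0, 0.0, 0.0])  # stand-in for a text embedding of "red"
print(text_to_patch_loss(stylized, red_text, toy_encode))  # 0.0
```

With the toy mean-color encoder, every patch of the pure-red image matches a "red" embedding exactly (loss 0.0), while a "blue" embedding gives loss 1.0. Averaging over random crops penalizes any patch that drifts from the prompt, which is one plausible reading of how a per-patch loss encourages consistent quality across the image; the masked composite keeps pixels outside the segmented region unchanged.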
