Local part attention for image stylization with text prompt
Quoc-Truong Truong, Vinh-Tiep Nguyen, Lan-Phuong Nguyen, Hung-Phu Cao, Duc-Tuan Luu
Neural Computing and Applications (Journal Article), published 2024-09-17
DOI: 10.1007/s00521-024-10394-w (https://doi.org/10.1007/s00521-024-10394-w)
Citation count: 0
Abstract
Prompt-based portrait style transfer aims to translate an input content image into a desired style described by text, without requiring a style image. In many practical situations, users care not only about the entire portrait but also about local parts (e.g., eyes, lips, and hair). To address such applications, we propose a new framework that enables style transfer on specific regions specified by a text description of the desired style. Specifically, we incorporate semantic segmentation to identify the intended area without requiring edit masks from the user, and we use a pre-trained CLIP-based model for stylization. In addition, we propose a text-to-patch matching loss, computed by randomly dividing the stylized image into smaller patches, to ensure consistent quality across the result. To evaluate the proposed method comprehensively, we use several metrics, such as FID, SSIM, and PSNR, on a dataset consisting of portraits from CelebAMask-HQ and style descriptions taken from related works. Extensive experimental results demonstrate that our framework outperforms other state-of-the-art methods in both stylization quality and inference time.
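The two mechanisms named in the abstract, restricting stylization to a segmented region and scoring randomly cropped patches against the text prompt, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the hard binary mask, the square patch shape, and the use of mean cosine distance as the matching loss are all assumptions made for illustration; the abstract does not state the exact form of the loss. In practice the embeddings would come from a CLIP image encoder and text encoder, which are left out here so the example stays self-contained.

```python
import numpy as np


def composite_with_mask(content, stylized, mask):
    """Keep stylized pixels inside the region mask and content pixels elsewhere.

    content, stylized: float arrays of shape (H, W, C).
    mask: boolean array of shape (H, W), e.g. a "hair" or "lips" segment
    (assumed to come from a semantic segmentation model).
    """
    m = mask.astype(content.dtype)[..., None]  # (H, W, 1), broadcasts over channels
    return m * stylized + (1.0 - m) * content


def sample_random_patches(image, num_patches, patch_size, rng):
    """Crop square patches at uniformly random positions (assumed sampling scheme)."""
    h, w = image.shape[:2]
    tops = rng.integers(0, h - patch_size + 1, size=num_patches)
    lefts = rng.integers(0, w - patch_size + 1, size=num_patches)
    return np.stack(
        [image[t:t + patch_size, l:l + patch_size] for t, l in zip(tops, lefts)]
    )


def patch_text_loss(patch_embeddings, text_embedding):
    """Mean cosine distance between patch embeddings (N, D) and a text embedding (D,).

    Hypothetical stand-in for a text-to-patch matching loss; the embeddings
    are assumed to be produced by CLIP encoders outside this sketch.
    """
    p = patch_embeddings / np.linalg.norm(patch_embeddings, axis=1, keepdims=True)
    t = text_embedding / np.linalg.norm(text_embedding)
    return float(np.mean(1.0 - p @ t))
```

A typical use would composite the stylized output with the segmentation mask, crop a batch of patches from the result, embed each patch, and average the per-patch distance to the prompt embedding, so that every sampled patch is pushed toward the described style instead of only the image as a whole.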