Yingchun Guo, Xueqi Lv, Gang Yan, Shu Chen, Shi Di
TransStyle: Transformer-based StyleGAN for image inversion and editing
Pattern Recognition Letters, Volume 198, Pages 1-7. Published 2025-09-10. Journal Article.
DOI: 10.1016/j.patrec.2025.09.002
https://www.sciencedirect.com/science/article/pii/S0167865525003125
Citation count: 0
Abstract
Image inversion using StyleGAN retrieves latent codes by embedding real images into the GAN’s latent space, enabling attribute editing and high-quality image generation. However, existing methods often struggle with reconstruction reliability and flexible editing, resulting in low-quality outcomes. To address these issues, we propose TransStyle, a new StyleGAN inversion model based on Transformer technology. Our model features a novel encoder structure, PACP (Path Aggregation with Covariance Pooling), for improved feature representation and a feature prediction head that uses covariance pooling. Additionally, we propose a Transformer-based module to enhance interactions with semantic information in the latent space. StyleGAN then uses this enhanced latent code to generate images with high fidelity and strong editability. Experimental results demonstrate that our method achieves at least 5% higher face reconstruction similarity compared to current state-of-the-art techniques, confirming the advantages of TransStyle in image reconstruction and editing quality.
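The abstract names covariance pooling but does not say how PACP or the prediction head computes it. As a general illustration of the technique only, the sketch below shows plain second-order pooling: a feature map with C channels is reduced to a C x C matrix of channel covariances, instead of the per-channel means that average pooling would give. The function name `covariance_pool`, the flat list layout, and the toy input are assumptions made for this example and are not taken from the paper.

```python
from typing import List


def covariance_pool(features: List[List[float]]) -> List[List[float]]:
    """Second-order (covariance) pooling of a feature map.

    `features` holds C channels, each a flat list of N spatial activations
    (N = height * width). Returns the C x C channel covariance matrix.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    num_channels = len(features)
    num_positions = len(features[0])
    if any(len(channel) != num_positions for channel in features):
        raise ValueError("all channels must have the same number of positions")

    # Centre each channel by subtracting its spatial mean.
    centred = []
    for channel in features:
        mean = sum(channel) / num_positions
        centred.append([value - mean for value in channel])

    # Entry (i, j) is the average product of centred channels i and j.
    cov = [[0.0] * num_channels for _ in range(num_channels)]
    for i in range(num_channels):
        for j in range(i, num_channels):
            total = sum(a * b for a, b in zip(centred[i], centred[j]))
            cov[i][j] = cov[j][i] = total / num_positions
    return cov


# Toy feature map: 2 channels over a 2x2 grid, flattened to 4 positions.
toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
pooled = covariance_pool(toy)
print(pooled)
```

The off-diagonal entries record how pairs of channels vary together across spatial positions, which is the extra information second-order pooling keeps and first-order (mean) pooling discards.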
About the journal:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.