Real-time data-efficient portrait stylization via geometric alignment.

IF 6.3 · Zone 1, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks Pub Date : 2025-11-01 Epub Date: 2025-06-25 DOI:10.1016/j.neunet.2025.107774

Xinrui Wang, Zhuoru Li, Xuanyu Yin, Xiao Zhou, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo

Neural Networks, volume 191, article 107774. Publication type: Journal Article.

Citations: 0

Abstract

Portrait stylization aims to imbue portrait photos with vivid artistic effects drawn from style examples. Despite the availability of enormous training datasets and large network weights, existing methods struggle to maintain geometric consistency and achieve satisfactory stylization because facial feature distributions differ between photographs and stylized images, which limits their application to rare styles and mobile devices. To alleviate this, we propose establishing meaningful geometric correlations between portraits and style samples, simplifying stylization by aligning corresponding facial characteristics. Specifically, we integrate differentiable Thin-Plate-Spline (TPS) modules into an end-to-end Generative Adversarial Network (GAN) framework to improve training efficiency and promote the consistency of facial identities. By leveraging the inherent structural information of faces, e.g., facial landmarks, the TPS modules establish geometric alignments between the two domains at global and local scales, in both pixel and feature spaces, thereby overcoming the aforementioned challenges. Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that our model not only outperforms existing models in fidelity and stylistic consistency, but also achieves a 2× improvement in training data efficiency and a 100× reduction in computational complexity, allowing our lightweight model to run real-time inference (30 FPS) at 512×512 resolution on mobile devices.
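The landmark-driven alignment mentioned in the abstract can be illustrated with a classic thin-plate-spline fit. The block below is a minimal NumPy sketch and not the authors' implementation: the paper's TPS modules are differentiable and trained inside a GAN, whereas this only solves the standard TPS interpolation system once. The function names (`fit_tps`, `apply_tps`) and the five landmark coordinates are hypothetical, chosen purely for illustration.

```python
import numpy as np

def _tps_kernel(a, b):
    """TPS radial basis U(r) = r^2 log r between point sets a (n,2) and b (m,2)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # r^2 log r == 0.5 * d2 * log(d2); U(0) is defined as 0.
    return 0.5 * d2 * np.log(np.where(d2 > 0, d2, 1.0))

def fit_tps(src, dst):
    """Solve for TPS parameters mapping landmarks src (n,2) onto dst (n,2).

    Returns an (n+3, 2) array: n radial weights followed by 3 affine rows.
    Assumes at least 3 distinct, non-collinear landmarks.
    """
    n = src.shape[0]
    P = np.hstack([np.ones((n, 1)), src])      # affine basis [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = _tps_kernel(src, src)
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)

def apply_tps(params, src, pts):
    """Warp arbitrary points pts (m,2) with parameters fitted on src."""
    U = _tps_kernel(pts, src)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ params[:-3] + P @ params[-3:]

# Hypothetical landmarks in normalized image coordinates:
# two eyes, nose tip, two mouth corners.
src = np.array([[0.30, 0.40], [0.70, 0.40], [0.50, 0.60],
                [0.35, 0.80], [0.65, 0.80]])
# Made-up "stylized" positions: eyes pushed apart, mouth narrowed.
dst = np.array([[0.25, 0.42], [0.75, 0.42], [0.50, 0.62],
                [0.40, 0.78], [0.60, 0.78]])

params = fit_tps(src, dst)
warped = apply_tps(params, src, src)  # landmarks are interpolated exactly
```

Because the system is linear in the target landmarks, the same construction can be expressed with differentiable tensor operations, which is presumably what makes such a module trainable end to end; the details of how the paper applies it at global and local scales, in pixel and feature spaces, are not given in the abstract.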


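Source journal

Neural Networks (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.90

Self-citation rate: 7.70%

Articles published: 425

Review time: 67 days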

About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.