{"title":"ClipSwap++: Improved Identity and Attributes Aware Face Swapping","authors":"Phyo Thet Yee;Sudeepta Mishra;Abhinav Dhall","doi":"10.1109/TBIOM.2025.3576111","DOIUrl":null,"url":null,"abstract":"This paper introduces an efficient framework for an identity and attributes aware face swapping. Accurately preserving the source face’s identity while maintaining the target face’s attributes remains a challenge in face swapping due to mismatches between identity and attribute features. To address this, based on our previous work, ClipSwap, we propose an extended version, ClipSwap++, with improved model efficiency with respect to inference time, memory consumption, and more accurate preservation of identity and attributes. Our model is mainly composed of a conditional Generative Adversarial Network and a CLIP-based image encoder to generate realistic face-swapped images. We carefully design our ClipSwap++ with the combination of following three components. First, we introduce the Adaptive Identity Fusion Module (AIFM), which ensures accurate preservation of identity through the careful integration of ArcFace-encoded identity with CLIP-embedded identity. Second, we propose a new decoder architecture with multiple Multi-level Attributes Integration Module (MAIM) to adaptively integrate identity and attribute features, enhancing the preservation of source face’s identity while maintaining the target image’s important attributes. Third, to enhance further the attribute preservation, we introduce Multi-level Attributes Preservation Loss, which calculates the distance between the intermediate and the final output features of the target and swapped images. We perform quantitative and qualitative evaluations using three datasets, and our model obtains the highest identity accuracy (98.93%) with low pose error (1.62) on FaceForensics++ dataset and less inference time (0.30 sec).","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 4","pages":"862-875"},"PeriodicalIF":5.0000,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on biometrics, behavior, and identity science","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11022728/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
This paper introduces an efficient framework for identity- and attributes-aware face swapping. Accurately preserving the source face’s identity while maintaining the target face’s attributes remains a challenge in face swapping due to mismatches between identity and attribute features. To address this, we build on our previous work, ClipSwap, and propose an extended version, ClipSwap++, which improves model efficiency in terms of inference time and memory consumption and preserves identity and attributes more accurately. Our model is mainly composed of a conditional Generative Adversarial Network and a CLIP-based image encoder for generating realistic face-swapped images. We carefully design ClipSwap++ as a combination of the following three components. First, we introduce the Adaptive Identity Fusion Module (AIFM), which ensures accurate identity preservation through the careful integration of ArcFace-encoded identity with CLIP-embedded identity. Second, we propose a new decoder architecture with multiple Multi-level Attributes Integration Modules (MAIM) that adaptively integrate identity and attribute features, enhancing the preservation of the source face’s identity while maintaining the target image’s important attributes. Third, to further enhance attribute preservation, we introduce a Multi-level Attributes Preservation Loss, which measures the distance between the intermediate and final output features of the target and swapped images. We perform quantitative and qualitative evaluations on three datasets; our model obtains the highest identity accuracy (98.93%) with low pose error (1.62) on the FaceForensics++ dataset and a low inference time (0.30 s).
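As a rough illustration of the Multi-level Attributes Preservation Loss described above, the sketch below sums per-level distances between the intermediate and final output features of the target image and of the swapped image. It is a minimal PyTorch sketch based only on the abstract: the function name `multilevel_attribute_loss`, the choice of L1 distance, and the per-level weighting are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of a multi-level attribute-preservation loss, assuming PyTorch.
# The L1 distance and uniform level weights are illustrative assumptions; the
# abstract only states that distances between intermediate and final features
# of the target and swapped images are computed.
import torch
import torch.nn.functional as F


def multilevel_attribute_loss(target_feats, swapped_feats, level_weights=None):
    """Sum of per-level L1 distances between features of the target image
    and the corresponding features of the swapped image.

    target_feats / swapped_feats: lists of tensors, one per decoder level
    (intermediate layers plus the final output), each shaped
    (batch, channels, height, width).
    """
    if level_weights is None:
        level_weights = [1.0] * len(target_feats)
    loss = 0.0
    for w, f_tgt, f_swap in zip(level_weights, target_feats, swapped_feats):
        loss = loss + w * F.l1_loss(f_swap, f_tgt)
    return loss


# Hypothetical usage with random tensors standing in for one intermediate
# decoder feature map and the final RGB output.
if __name__ == "__main__":
    tgt = [torch.randn(2, 64, 32, 32), torch.randn(2, 3, 256, 256)]
    swp = [torch.randn(2, 64, 32, 32), torch.randn(2, 3, 256, 256)]
    print(multilevel_attribute_loss(tgt, swp).item())
```

In such a setup, the intermediate features would typically be taken from the decoder at several resolutions so that both coarse attributes (pose, lighting) and fine details (expression, texture) of the target image constrain the swapped result.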