LIA: Latent Image Animator

Yaohui Wang, Di Yang, Francois Bremond, Antitza Dantcheva
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10829–10844
DOI: 10.1109/TPAMI.2024.3449075
Published: 2024-08-23
Citations: 0

Abstract

Previous animation techniques mainly focus on leveraging explicit structure representations (e.g., meshes or keypoints) for transferring motion from driving videos to source images. However, such methods are challenged by large appearance variations between source and driving data, and require complex additional modules to separately model appearance and motion. Towards addressing these issues, we introduce the Latent Image Animator (LIA), streamlined to animate high-resolution images. LIA is designed as a simple autoencoder that does not rely on explicit representations. Motion transfer in the pixel space is modeled as linear navigation of motion codes in the latent space. Specifically, such navigation is represented by an orthogonal motion dictionary learned in a self-supervised manner based on the proposed Linear Motion Decomposition (LMD). Extensive experimental results demonstrate that LIA outperforms the state of the art on the VoxCeleb, TaichiHD, and TED-talk datasets with respect to video quality and spatio-temporal consistency. In addition, LIA is well equipped for zero-shot high-resolution image animation. Code, models, and a demo video are available at https://github.com/wyhsirius/LIA .
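The core idea of Linear Motion Decomposition can be sketched in a few lines: a target latent code is obtained by shifting the source code along a linear combination of orthogonal motion directions, with per-direction magnitudes (in LIA, predicted from the driving frame). The sizes, the random stand-in dictionary, and the `navigate` helper below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 512-d latent space and a dictionary of 20 motion directions.
latent_dim, num_directions = 512, 20

# LIA learns an orthogonal dictionary; here a QR factorization of a random
# matrix stands in, giving orthonormal columns.
D, _ = np.linalg.qr(rng.standard_normal((latent_dim, num_directions)))

def navigate(z_source, magnitudes, dictionary):
    """Linear Motion Decomposition: the target code is the source code
    shifted by a linear combination of orthogonal motion directions."""
    return z_source + dictionary @ magnitudes

z_src = rng.standard_normal(latent_dim)
a = rng.standard_normal(num_directions)  # magnitudes along each direction
z_tgt = navigate(z_src, a, D)

# Because the directions are orthonormal, each magnitude can be read back
# independently from the latent displacement:
recovered = D.T @ (z_tgt - z_src)
print(np.allclose(recovered, a))  # True
```

Orthogonality is what makes the dictionary a clean basis for motion: individual directions can be scaled or recombined without interfering with one another.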