LIA: Latent Image Animator

Impact factor: 18.6

IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-08-23. DOI: 10.1109/TPAMI.2024.3449075

Yaohui Wang, Di Yang, Francois Bremond, Antitza Dantcheva

Volume 46, issue 12, pages 10829-10844. Publication type: Journal Article. Full record: https://ieeexplore.ieee.org/document/10645735/

Citations: 0

Abstract

Previous animation techniques mainly rely on explicit structure representations (e.g., meshes or keypoints) to transfer motion from driving videos to source images. However, such methods struggle with large appearance variations between source and driving data, and they require complex additional modules to model appearance and motion separately. To address these issues, we introduce the Latent Image Animator (LIA), streamlined to animate high-resolution images. LIA is designed as a simple autoencoder that does not rely on explicit representations. Motion transfer in pixel space is modeled as linear navigation of motion codes in the latent space. Specifically, this navigation is represented by an orthogonal motion dictionary learned in a self-supervised manner, based on the proposed Linear Motion Decomposition (LMD). Extensive experimental results demonstrate that LIA outperforms the state of the art on the VoxCeleb, TaichiHD, and TED-talk datasets with respect to video quality and spatio-temporal consistency. In addition, LIA is well equipped for zero-shot high-resolution image animation. Code, models, and a demo video are available at https://github.com/wyhsirius/LIA.
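The abstract describes motion transfer as linear navigation of a latent code along the directions of an orthogonal motion dictionary. The sketch below illustrates only that idea on toy vectors, and it is not the authors' implementation: the function names (`orthonormal_dictionary`, `navigate`, `recover_magnitudes`) are hypothetical, the directions are random rather than learned, Gram-Schmidt is an assumed way of obtaining orthogonality, and no encoder or decoder is involved.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_dictionary(latent_dim, num_directions, seed=0):
    """Build num_directions orthonormal vectors of length latent_dim.

    Assumption: random directions orthogonalized with Gram-Schmidt stand in
    for a learned motion dictionary.
    """
    if num_directions > latent_dim:
        raise ValueError("cannot have more orthogonal directions than dimensions")
    rng = random.Random(seed)
    basis = []
    while len(basis) < num_directions:
        v = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        # Remove the components along the directions accepted so far.
        for d in basis:
            coeff = dot(v, d)
            v = [x - coeff * y for x, y in zip(v, d)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-8:  # skip draws that are (nearly) linearly dependent
            basis.append([x / norm for x in v])
    return basis


def navigate(source_code, dictionary, magnitudes):
    """Linear navigation: source_code + sum_i magnitudes[i] * dictionary[i]."""
    target = list(source_code)
    for a, d in zip(magnitudes, dictionary):
        target = [t + a * x for t, x in zip(target, d)]
    return target


def recover_magnitudes(source_code, target_code, dictionary):
    """With orthonormal directions, each magnitude is a single projection."""
    delta = [t - s for t, s in zip(target_code, source_code)]
    return [dot(delta, d) for d in dictionary]


if __name__ == "__main__":
    directions = orthonormal_dictionary(latent_dim=8, num_directions=3)
    source = [0.5] * 8
    target = navigate(source, directions, [1.0, -2.0, 0.25])
    print(recover_magnitudes(source, target, directions))
```

One reason an orthogonal dictionary is convenient in a sketch like this: the displacement between two latent codes decomposes into independent per-direction magnitudes, so each one can be read back with a single inner product.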

