Meet-in-Style: Text-Driven Real-Time Video Stylization Using Diffusion Models

IF 1.4 | Zone 4, Computer Science | JCR Q3: COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Computer Graphics and Applications Pub Date : 2025-03-01 DOI:10.1109/MCG.2025.3554312

David Kunz, Ondrej Texler, David Mould, Daniel Sykora

IEEE Computer Graphics and Applications, vol. PP, pp. 47-56 | Journal Article | https://doi.org/10.1109/MCG.2025.3554312

Citations: 0

Abstract


We present Meet-in-Style, a new approach to real-time stylization of live video streams using text prompts. In contrast to previous text-based techniques, our system is able to stylize input video at 30 fps on commodity graphics hardware while preserving structural consistency of the stylized sequence and minimizing temporal flicker. A key idea of our approach is to combine diffusion-based image stylization with a few-shot patch-based training strategy that can produce a custom image-to-image stylization network with real-time inference capabilities. Such a combination not only allows for fast stylization, but also greatly improves consistency of individual stylized frames compared to a scenario where diffusion is applied to each video frame separately. We conducted a number of user experiments in which we found our approach to be particularly useful in video conference scenarios, enabling participants to interactively apply different visual styles to themselves (or to each other) to enhance the overall chatting experience.
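The abstract does not spell out how the few-shot patch-based training is set up, so the following is only an illustrative sketch of one ingredient such a strategy typically needs: drawing spatially aligned patch pairs from a single input frame and its stylized counterpart, which could then serve as training examples for an image-to-image network. The function name `sample_patch_pairs`, the toy 8x8 images, and the use of intensity inversion as a stand-in for a diffusion-stylized keyframe are all assumptions made for this example and do not come from the paper.

```python
import random


def crop(image, top, left, size):
    """Return the size x size window of `image` whose corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def sample_patch_pairs(source, stylized, patch_size, count, seed=0):
    """Cut `count` aligned square patches from a frame and its stylized keyframe.

    Both images are lists of rows of pixel values with identical dimensions.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    height, width = len(source), len(source[0])
    if (len(stylized), len(stylized[0])) != (height, width):
        raise ValueError("source and stylized keyframe must have the same size")
    if patch_size > min(height, width):
        raise ValueError("patch_size exceeds image dimensions")
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    pairs = []
    for _ in range(count):
        top = rng.randint(0, height - patch_size)
        left = rng.randint(0, width - patch_size)
        # The same window is cut from both images, so each pair stays aligned.
        pairs.append((crop(source, top, left, patch_size),
                      crop(stylized, top, left, patch_size)))
    return pairs


# Toy 8x8 grayscale frame; inversion stands in for a stylized keyframe (assumption).
frame = [[(x + y) % 256 for x in range(8)] for y in range(8)]
keyframe = [[255 - v for v in row] for row in frame]
pairs = sample_patch_pairs(frame, keyframe, patch_size=4, count=16)
print(len(pairs), len(pairs[0][0]), len(pairs[0][0][0]))  # prints "16 4 4"
```

Sampling many small aligned patches from one keyframe pair is a common way to get a large number of training examples out of very few stylized images, which is the general idea behind few-shot patch-based training. How the paper actually selects keyframes, patches, and network architecture is not stated in this abstract.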


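Source journal

IEEE Computer Graphics and Applications (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 3.20

Self-citation rate: 5.60%

Articles published: 160

Review time: >12 weeks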

About the journal: IEEE Computer Graphics and Applications (CG&A) bridges the theory and practice of computer graphics, visualization, virtual and augmented reality, and HCI. From specific algorithms to full system implementations, CG&A offers a unique combination of peer-reviewed feature articles and informal departments. Theme issues guest edited by leading researchers in their fields track the latest developments and trends in computer-generated graphical content, while tutorials and surveys provide a broad overview of interesting and timely topics. Regular departments further explore the core areas of graphics as well as extend into topics such as usability, education, history, and opinion. Each issue, the story of our cover focuses on creative applications of the technology by an artist or designer. Published six times a year, CG&A is indispensable reading for people working at the leading edge of computer-generated graphics technology and its applications in everything from business to the arts.