Meet-in-Style: Text-Driven Real-Time Video Stylization Using Diffusion Models
David Kunz, Ondrej Texler, David Mould, Daniel Sykora
IEEE Computer Graphics and Applications, vol. PP, pp. 47-56, March 2025. DOI: 10.1109/MCG.2025.3554312
Citations: 0
Abstract
We present Meet-in-Style, a new approach to real-time stylization of live video streams using text prompts. In contrast to previous text-based techniques, our system is able to stylize input video at 30 fps on commodity graphics hardware while preserving the structural consistency of the stylized sequence and minimizing temporal flicker. A key idea of our approach is to combine diffusion-based image stylization with a few-shot patch-based training strategy that can produce a custom image-to-image stylization network with real-time inference capabilities. Such a combination not only allows for fast stylization, but also greatly improves the consistency of individual stylized frames compared to a scenario where diffusion is applied to each video frame separately. We conducted a number of user experiments in which we found our approach to be particularly useful in video conferencing scenarios, enabling participants to interactively apply different visual styles to themselves (or to each other) to enhance the overall chatting experience.
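The abstract describes a two-stage pipeline: a text-guided diffusion model stylizes a keyframe, and a lightweight image-to-image network is then trained few-shot on patches of the (keyframe, stylized keyframe) pair so that subsequent frames can be restyled in real time. The sketch below illustrates that general idea only; the diffusion checkpoint, file names, patch size, network architecture, and training schedule are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two-stage idea from the abstract (not the paper's code):
# (1) stylize one keyframe with a text-guided diffusion model,
# (2) train a tiny patch-based image-to-image net on that single pair,
# then apply the fast net to every incoming video frame.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# --- Stage 1: text-driven stylization of a single keyframe via diffusion ---
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=dtype  # assumed checkpoint
).to(device)
keyframe = Image.open("keyframe.png").convert("RGB").resize((512, 512))  # hypothetical input
styled = pipe(prompt="a watercolor portrait", image=keyframe,
              strength=0.5, guidance_scale=7.5).images[0]

# --- Stage 2: few-shot patch-based training of a fast image-to-image network ---
class PatchTranslator(nn.Module):
    """Small fully convolutional net, cheap enough for real-time inference."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, x):
        return self.net(x)

def to_tensor(img):
    t = torch.from_numpy(np.array(img)).float() / 255.0
    return t.permute(2, 0, 1).unsqueeze(0).to(device)

src, tgt = to_tensor(keyframe), to_tensor(styled)
model = PatchTranslator().to(device)
opt = torch.optim.Adam(model.parameters(), lr=2e-4)
patch = 64  # assumed patch size

for step in range(2000):
    # sample random aligned patches from the (source, stylized) keyframe pair
    y = torch.randint(0, src.shape[2] - patch, (1,)).item()
    x = torch.randint(0, src.shape[3] - patch, (1,)).item()
    loss = F.l1_loss(model(src[:, :, y:y+patch, x:x+patch]),
                     tgt[:, :, y:y+patch, x:x+patch])
    opt.zero_grad(); loss.backward(); opt.step()

# At runtime, the trained network is applied to each live frame instead of the keyframe.
with torch.no_grad():
    stylized_frame = model(src)
```

Because the expensive diffusion pass runs only on the keyframe(s) and the per-frame work is a few small convolutions, this kind of split is what makes 30 fps stylization on commodity GPUs plausible while keeping frames mutually consistent.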
About the journal:
IEEE Computer Graphics and Applications (CG&A) bridges the theory and practice of computer graphics, visualization, virtual and augmented reality, and HCI. From specific algorithms to full system implementations, CG&A offers a unique combination of peer-reviewed feature articles and informal departments. Theme issues guest edited by leading researchers in their fields track the latest developments and trends in computer-generated graphical content, while tutorials and surveys provide a broad overview of interesting and timely topics. Regular departments further explore the core areas of graphics as well as extend into topics such as usability, education, history, and opinion. Each issue, the story of our cover focuses on creative applications of the technology by an artist or designer. Published six times a year, CG&A is indispensable reading for people working at the leading edge of computer-generated graphics technology and its applications in everything from business to the arts.