Yingjie Tian, Minghao Liu, Haoran Jiang, Yunbin Tu, Duo Su
IEEE Transactions on Visualization and Computer Graphics. Journal Article. Published 2025-09-23. DOI: 10.1109/TVCG.2025.3613388 (https://doi.org/10.1109/TVCG.2025.3613388)
SketchRefiner: Text-Guided Sketch Refinement Through Latent Diffusion Models.
Free-hand sketches are efficient tools for creativity and communication, yet expressing ideas clearly through sketches remains challenging for untrained individuals. Optimizing sketches through text guidance can help people convey their ideas more effectively and improve overall communication efficiency. While recent advances in Artificial Intelligence Generated Content (AIGC) have been notable, the optimization of free-hand sketches remains relatively unexplored. In this paper, we introduce SketchRefiner, a method that refines rough sketches from various categories into polished versions guided by text prompts. SketchRefiner uses a latent diffusion model with ControlNet to guide a differentiable rasterizer in optimizing a set of Bézier curves. We extend the score distillation sampling (SDS) loss and introduce a joint semantic loss to encourage the refined sketches to align with both the given text prompts and the input free-hand sketches. Additionally, we propose a fusion attention-map stroke initialization strategy to improve the quality of refined sketches. Furthermore, SketchRefiner provides users with fine-grained control over text guidance. Through extensive experiments, we demonstrate that our method generates accurate and aesthetically pleasing refined sketches that closely align with the input text prompts and sketches.
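
As a rough illustration of the stroke representation mentioned in the abstract, the sketch below evaluates a single Bézier curve from its control points. It assumes cubic curves with four 2D control points, which the abstract does not specify, and the names `cubic_bezier` and `sample_stroke` are hypothetical. It is not the authors' implementation: it only shows that a stroke's position is a smooth (polynomial) function of its control points, which is what would let a differentiable rasterizer pass gradients back to them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    Uses the Bernstein basis: the result is a weighted sum of the four
    control points, so it is linear in each control point.
    """
    s = 1.0 - t
    b0 = s * s * s          # weight of p0
    b1 = 3.0 * s * s * t    # weight of p1
    b2 = 3.0 * s * t * t    # weight of p2
    b3 = t * t * t          # weight of p3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_stroke(control_points: Sequence[Point], n: int = 16) -> List[Point]:
    """Sample n evenly spaced parameter values along one stroke.

    Assumption for illustration: a stroke is exactly one cubic segment.
    """
    if len(control_points) != 4:
        raise ValueError("expected exactly four control points")
    if n < 2:
        raise ValueError("n must be at least 2")
    p0, p1, p2, p3 = control_points
    return [cubic_bezier(p0, p1, p2, p3, i / (n - 1)) for i in range(n)]


if __name__ == "__main__":
    stroke = ((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0))
    for point in sample_stroke(stroke, n=5):
        print(point)
```

In an optimization setting of the kind the abstract describes, the control points would be the trainable parameters, and a loss computed on the rendered image would be used to update them.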