Automated Visual Story Synthesis with Character Trait Control

Artificial Intelligence and Social Computing. Pub Date: 1900-01-01. DOI: 10.54941/ahfe1003275

Yuetian Chen, Bowen Shi, Peiru Liu, Ruohua Li, Mei Si


Abstract

Visual storytelling is an art form that has been used for centuries to tell stories, convey messages, and evoke emotions, and its images and text must work in harmony to create a compelling narrative experience. The rise of text-to-image generation models such as Stable Diffusion makes the automatic illustration of stories an increasingly promising research direction. However, these diffusion models are usually designed to generate one image at a time, so figures and objects are often inconsistent across different illustrations of the same story. This consistency is especially important in stories with human characters.

This work introduces a novel technique for creating consistent human figures in visual stories. It works in two steps. First, we collect human portraits that exhibit various identifying characteristics, such as gender and age, that describe a character. Second, we use this collection to train DreamBooth to associate a unique token ID with each type of character. These token IDs then replace the story characters' names in the text used for image generation. By combining these two steps, we can create controlled human figures for various visual storytelling contexts.
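The two-step pipeline can be sketched as plain data handling around the DreamBooth fine-tuning itself, which is not shown. This is a minimal sketch under stated assumptions, not the authors' code: the helper names (`group_portraits`, `assign_tokens`, `instance_prompt`, `substitute_names`), the `zq` token prefix, the class noun `person`, and the trait labels are all hypothetical, because the abstract does not specify the token format or prompt template. Step 1 buckets portraits by character type; step 2 gives each type its own identifier token and swaps character names for that token in a story sentence.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterType:
    """A character type defined by identifying traits (hypothetical labels)."""
    gender: str
    age_group: str


def group_portraits(records):
    """Step 1: bucket (path, gender, age_group) records into one portrait set per type."""
    buckets = defaultdict(list)
    for path, gender, age_group in records:
        buckets[CharacterType(gender, age_group)].append(path)
    return dict(buckets)


def assign_tokens(types, prefix="zq"):
    """Step 2: give each character type a distinct identifier token.

    The prefix is a hypothetical rare string; the paper's actual token scheme is not given.
    Types are sorted so the mapping is deterministic.
    """
    ordered = sorted(types, key=lambda t: (t.gender, t.age_group))
    return {t: f"{prefix}{i}" for i, t in enumerate(ordered)}


def instance_prompt(token):
    """Hypothetical prompt one might pair with a type's portraits during fine-tuning."""
    return f"a photo of {token} person"


def substitute_names(sentence, cast, tokens):
    """Replace each character name (whole words only) with its type's token plus class noun."""
    for name, ctype in cast.items():
        sentence = re.sub(rf"\b{re.escape(name)}\b", f"{tokens[ctype]} person", sentence)
    return sentence


if __name__ == "__main__":
    cast = {
        "Lily": CharacterType("female", "child"),
        "Tom": CharacterType("male", "elderly"),
    }
    tokens = assign_tokens(set(cast.values()))
    print(substitute_names("Lily helps Tom carry the basket.", cast, tokens))
```

With the example cast, the sentence becomes `zq0 person helps zq1 person carry the basket.`, so every illustration of Lily is requested through the same learned token. Keying tokens by trait combination rather than by character name means two characters of the same type would share a token, which matches the abstract's "one token ID for each type of character" rather than one per individual.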


