The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives
Samee Arif, Taimoor Arif, Aamina Jamal Khan, Muhammad Saad Haroon, Agha Ali Raza, Awais Athar
arXiv - CS - Computation and Language, 2024-09-17, arxiv-2409.11261
Abstract
This paper introduces the concept of an educational tool that uses
Generative Artificial Intelligence (GenAI) to enhance storytelling for
children. The system combines GenAI-driven narrative co-creation,
text-to-speech conversion, and text-to-video generation to produce an engaging
experience for learners. We describe the co-creation process, the adaptation of
narratives into spoken words using text-to-speech models, and the
transformation of these narratives into contextually relevant visuals through
text-to-video technology. Our evaluation covers the linguistic properties of the
generated stories, the text-to-speech conversion quality, and the accuracy of
the generated visuals.