Samee Arif, Taimoor Arif, Aamina Jamal Khan, Muhammad Saad Haroon, Agha Ali Raza, Awais Athar
arXiv - CS - Computation and Language, published 2024-09-17. Identifier: arxiv-2409.11261 (https://doi.org/arxiv-2409.11261).
The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives
This paper introduces the concept of an educational tool that uses
Generative Artificial Intelligence (GenAI) to enhance storytelling for
children. The system combines GenAI-driven narrative co-creation,
text-to-speech conversion, and text-to-video generation to produce an engaging
experience for learners. We describe the co-creation process, the adaptation of
narratives into spoken words using text-to-speech models, and the
transformation of these narratives into contextually relevant visuals through
text-to-video technology. Our evaluation covers the linguistic quality of the
generated stories, the text-to-speech conversion quality, and the accuracy of
the generated visuals.
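
The three stages named in the abstract (narrative co-creation, text-to-speech, text-to-video) can be pictured as a small pipeline. The sketch below is illustrative only and is not the authors' implementation: the names `Scene`, `co_create`, and `build_scenes` are hypothetical, the turn-by-turn alternation between learner and model is an assumption about how co-creation might work, and the three backends are stand-in stubs rather than real GenAI, speech, or video models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One story segment with its (placeholder) audio and video."""
    text: str
    audio: bytes   # stand-in for synthesized speech
    video: str     # stand-in for a reference to a generated clip


def co_create(learner_turns: List[str],
              continue_story: Callable[[str], str]) -> List[str]:
    """Assumed co-creation loop: each learner turn is followed by a
    model continuation conditioned on the story so far."""
    story: List[str] = []
    for turn in learner_turns:
        story.append(turn)
        story.append(continue_story(" ".join(story)))
    return story


def build_scenes(story: List[str],
                 synthesize: Callable[[str], bytes],
                 render: Callable[[str], str]) -> List[Scene]:
    """Attach speech and visuals to every story segment."""
    return [Scene(text=s, audio=synthesize(s), video=render(s)) for s in story]


# Hypothetical stub backends; a real system would call actual models here.
def stub_model(context: str) -> str:
    return f"(continuation after {len(context.split())} words)"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


def stub_video(text: str) -> str:
    return f"clip-for-{len(text)}-chars"


if __name__ == "__main__":
    story = co_create(["Once there was a fox."], stub_model)
    for scene in build_scenes(story, stub_tts, stub_video):
        print(scene.text, "|", scene.video)
```

Passing the backends in as plain callables keeps the stages independent, so each one could be swapped or evaluated separately, which mirrors the per-stage evaluation the abstract describes.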