Image-Based Storytelling Using Deep Learning
Yulin Zhu, Wei Yan
Proceedings of the 5th International Conference on Control and Computer Vision, 2022-08-19
DOI: 10.1145/3561613.3561641
Abstract
To describe a journey, a story can be generated automatically from a group of digital photographs. Most existing methods, such as image captioning, describe the specific content of a single image. They do not capture the correlations between images or their spatiotemporal relationships. To address this, we propose a novel storytelling architecture based on computer vision that uses visual object detection on digital images. By combining changes in the spatiotemporal domain with a predetermined template, we automatically generate a text-based travel diary. Compared with conventional image captioning, our aim is to effectively capture the correlations between digital images and their background information. The contributions of this paper are: (1) innovative use of preset templates to generate travel diaries from photographs, (2) associating the content and context of the images as an event, (3) augmenting the images to expand the dataset, and (4) shortening the training time of deep learning models.
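The abstract does not give the actual templates or explain how photos are grouped into events. The following Python sketch is only a rough illustration of the template-filling idea. It assumes that each photo already has a capture time, a place name and a list of object labels from some object detector.

The `Photo` class, `TEMPLATE`, the two-hour `max_gap` rule and all function names are hypothetical. None of them come from the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Photo:
    taken_at: datetime                            # capture time, e.g. from EXIF (assumed available)
    location: str                                 # hypothetical place name, e.g. reverse-geocoded GPS
    objects: list = field(default_factory=list)   # detector labels (assumed precomputed)


# Hypothetical preset template; the paper's real templates are not given in the abstract.
TEMPLATE = "On {date} at {time} I was at {location}, where I saw: {objects}."


def group_events(photos, max_gap=timedelta(hours=2)):
    """Split photos into events. A new event starts when the location changes
    or when the time since the previous photo exceeds max_gap (assumed rule)."""
    events = []
    for photo in sorted(photos, key=lambda p: p.taken_at):
        prev = events[-1][-1] if events else None
        if (prev is not None and photo.location == prev.location
                and photo.taken_at - prev.taken_at <= max_gap):
            events[-1].append(photo)
        else:
            events.append([photo])
    return events


def join_labels(labels):
    """Render ['a', 'b', 'c'] as 'a, b and c'."""
    if not labels:
        return "nothing in particular"
    if len(labels) == 1:
        return labels[0]
    return ", ".join(labels[:-1]) + " and " + labels[-1]


def describe_event(event):
    """Fill the template with the event's start time, place and de-duplicated labels."""
    start = event[0]
    # dict.fromkeys removes duplicate labels while keeping first-seen order
    labels = list(dict.fromkeys(obj for p in event for obj in p.objects))
    return TEMPLATE.format(
        date=f"{start.taken_at:%B} {start.taken_at.day}",
        time=f"{start.taken_at:%H:%M}",
        location=start.location,
        objects=join_labels(labels),
    )


def generate_diary(photos):
    """One templated sentence per event, in chronological order."""
    return "\n".join(describe_event(e) for e in group_events(photos))


photos = [
    Photo(datetime(2022, 8, 19, 9, 30), "the harbour", ["boat", "person"]),
    Photo(datetime(2022, 8, 19, 10, 0), "the harbour", ["boat", "bird"]),
    Photo(datetime(2022, 8, 19, 14, 0), "the museum", ["painting"]),
]
diary = generate_diary(photos)
print(diary)
```

In this sketch, the first two photos share a location and are 30 minutes apart, so they merge into one event. The museum photo starts a second event. Splitting on location changes and time gaps is just one simple way to form "events"; the paper's own method for associating content and context may be quite different.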