{"title":"学习叙事多媒体的多模态表现:以网络漫画为例","authors":"O-Joun Lee, Jin-Taek Kim","doi":"10.1145/3400286.3418216","DOIUrl":null,"url":null,"abstract":"This study aims to learn task-agnostic representations of narrative multimedia. The existing studies focused on only stories in the narrative multimedia without considering their physical features. We propose a method for incorporating multi-modal features of the narrative multimedia into a unified vector representation. For narrative features, we embed character networks as with the existing studies. Textual features can be represented using the LSTM (Long-Short Term Memory) autoencoder. We apply the convolutional autoencoder to visual features. The convolutional autoencoder also can be used for the spectrograms of audible features. To combine these features, we propose two methods: early fusion and late fusion. The early fusion method composes representations of features on each scene. Then, we learn representations of a narrative work by predicting time-sequential changes in the features. The late fusion method concatenates feature vectors that are trained for allover the narrative work. Finally, we apply the proposed methods on webtoons (i.e., comics that are serially published through the web). The proposed methods have been evaluated by applying the vector representations to predicting the preferences of users for the webtoons.","PeriodicalId":326100,"journal":{"name":"Proceedings of the International Conference on Research in Adaptive and Convergent Systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Learning Multi-modal Representations of Narrative Multimedia: a Case Study of Webtoons\",\"authors\":\"O-Joun Lee, Jin-Taek Kim\",\"doi\":\"10.1145/3400286.3418216\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study aims to learn task-agnostic representations of narrative multimedia. The existing studies focused on only stories in the narrative multimedia without considering their physical features. We propose a method for incorporating multi-modal features of the narrative multimedia into a unified vector representation. For narrative features, we embed character networks as with the existing studies. Textual features can be represented using the LSTM (Long-Short Term Memory) autoencoder. We apply the convolutional autoencoder to visual features. The convolutional autoencoder also can be used for the spectrograms of audible features. To combine these features, we propose two methods: early fusion and late fusion. The early fusion method composes representations of features on each scene. Then, we learn representations of a narrative work by predicting time-sequential changes in the features. The late fusion method concatenates feature vectors that are trained for allover the narrative work. Finally, we apply the proposed methods on webtoons (i.e., comics that are serially published through the web). 
The proposed methods have been evaluated by applying the vector representations to predicting the preferences of users for the webtoons.\",\"PeriodicalId\":326100,\"journal\":{\"name\":\"Proceedings of the International Conference on Research in Adaptive and Convergent Systems\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Conference on Research in Adaptive and Convergent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3400286.3418216\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on Research in Adaptive and Convergent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3400286.3418216","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This study aims to learn task-agnostic representations of narrative multimedia. Existing studies have focused only on the stories in narrative multimedia, without considering their physical features. We propose a method for incorporating the multi-modal features of narrative multimedia into a unified vector representation. For narrative features, we embed character networks, as in existing studies. Textual features are represented with an LSTM (Long Short-Term Memory) autoencoder. We apply a convolutional autoencoder to visual features; the same architecture can also be applied to spectrograms of audio features. To combine these features, we propose two methods: early fusion and late fusion. The early fusion method composes the feature representations of each scene and then learns a representation of the narrative work by predicting time-sequential changes in those features. The late fusion method concatenates feature vectors that are trained over the entire narrative work. Finally, we apply the proposed methods to webtoons (i.e., comics serially published on the web) and evaluate them by using the vector representations to predict users' preferences for the webtoons.
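
To make the per-modality encoders in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: an LSTM autoencoder that compresses a scene's token-embedding sequence into a fixed-size code, and a convolutional autoencoder that can be applied to panel images or to audio spectrograms treated as one-channel images. All class names, layer sizes, and dimensions (emb_dim, hidden_dim, channels) are illustrative assumptions, and the character-network embedding mentioned in the abstract is assumed to come from prior work and is not sketched here.

```python
# Illustrative sketch of the per-modality autoencoders (assumed dimensions).
import torch
import torch.nn as nn


class LSTMAutoencoder(nn.Module):
    """Encodes a sequence of token embeddings into one vector, then tries to
    reconstruct the sequence from that vector (reconstruction loss)."""

    def __init__(self, emb_dim=128, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, emb_dim)

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        _, (h, _) = self.encoder(x)            # h: (1, batch, hidden_dim)
        z = h[-1]                              # scene-level textual representation
        repeated = z.unsqueeze(1).expand(-1, x.size(1), -1)
        dec, _ = self.decoder(repeated)        # decode the code at every step
        return self.out(dec), z                # reconstruction and code


class ConvAutoencoder(nn.Module):
    """Compresses an image (or spectrogram) into a small feature map; the
    flattened bottleneck serves as the visual or audio representation."""

    def __init__(self, channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (batch, channels, H, W)
        feat = self.encoder(x)
        z = feat.flatten(1)                    # scene-level visual/audio representation
        return self.decoder(feat), z


# Both autoencoders would be trained with a reconstruction objective, e.g.
# loss = nn.functional.mse_loss(reconstruction, x)
```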
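The two fusion strategies can be sketched in the same spirit. In the hypothetical EarlyFusion module below, the modality vectors of each scene are concatenated and a GRU is trained to predict the next scene's fused features, with its final hidden state taken as the work-level representation; late_fusion simply concatenates modality vectors that have already been summarized over the whole work. This is a sketch under assumed dimensions and an assumed next-scene prediction objective, not the published model.

```python
# Illustrative sketch of early vs. late fusion over per-scene modality vectors.
import torch
import torch.nn as nn


class EarlyFusion(nn.Module):
    """Runs a GRU over the sequence of fused scene vectors and is trained to
    predict the next scene's features; the last hidden state represents the work."""

    def __init__(self, fused_dim, hidden_dim=128):
        super().__init__()
        self.rnn = nn.GRU(fused_dim, hidden_dim, batch_first=True)
        self.next_scene = nn.Linear(hidden_dim, fused_dim)

    def forward(self, scenes):                 # scenes: (batch, n_scenes, fused_dim)
        out, h = self.rnn(scenes)
        pred = self.next_scene(out[:, :-1])    # predict scenes[:, 1:]
        loss = nn.functional.mse_loss(pred, scenes[:, 1:])
        return h[-1], loss                     # work-level representation, training loss


def late_fusion(work_level_vectors):
    """Concatenates modality vectors that were each obtained over the whole work."""
    return torch.cat(work_level_vectors, dim=-1)


if __name__ == "__main__":
    # Toy example: one webtoon, 10 scenes, four modalities of 64 dimensions each
    # (character-network, textual, visual, and audio codes are assumed given).
    char_z, text_z, visual_z, audio_z = (torch.randn(1, 10, 64) for _ in range(4))
    scenes = torch.cat([char_z, text_z, visual_z, audio_z], dim=-1)
    work_repr, loss = EarlyFusion(fused_dim=scenes.size(-1))(scenes)
    late_repr = late_fusion([m.mean(dim=1) for m in (char_z, text_z, visual_z, audio_z)])
```

The trade-off suggested by the abstract is that the early variant lets the representation capture how a work's features change scene by scene, while the late variant only preserves modality-level summaries of the whole work.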