Micro Tells Macro: Predicting the Popularity of Micro-Videos via a Transductive Model

Proceedings of the 24th ACM International Conference on Multimedia. Published: 2016-10-01. DOI: 10.1145/2964284.2964314

Jingyuan Chen, Xuemeng Song, Liqiang Nie, Xiang Wang, Hanwang Zhang, Tat-Seng Chua

Citations: 131

Abstract

Micro-videos, a new form of user-generated content (UGC), are attracting growing enthusiasm. Popular micro-videos have enormous commercial potential in areas such as online marketing and brand tracking. Popularity prediction for traditional UGC, including tweets, web images, and long videos, already has solid theoretical underpinnings and considerable practical success. However, little research has so far addressed predicting the popularity of these bite-sized videos. The task is non-trivial for three reasons: 1) micro-videos are short and of low quality; 2) they are described by multiple heterogeneous channels, spanning social, visual, acoustic, and textual modalities; and 3) no benchmark dataset or discriminant features suitable for this task are available. To this end, we present a transductive multi-modal learning model. The model is designed to find the optimal latent common space that unifies and preserves information from the different modalities, so that micro-videos can be better represented. This latent space can be used to alleviate the information insufficiency caused by the brevity of micro-videos. In addition, we built a benchmark dataset and extracted a rich set of popularity-oriented features to characterize popular micro-videos. Extensive experiments demonstrate the effectiveness of the proposed model. As a side contribution, we have released the dataset, code, and parameters to facilitate research by others.
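The abstract describes the model only at a high level: a latent common space shared by all modalities, learned transductively. As a rough illustration of that idea, and explicitly not the authors' model, the Python sketch below uses a swapped-in technique: joint linear matrix factorization solved by alternating ridge least squares. Each modality's feature matrix X_m is approximated as H W_m, with one latent matrix H whose rows cover all videos, labeled and unlabeled. Popularity is modeled as y ≈ H w and fit only on labeled rows, so unlabeled videos still shape H through their features. The function names, the squared losses, and the parameters `k`, `alpha`, and `lam` are hypothetical choices.

```python
# Illustrative sketch only: NOT the paper's model; names and losses are assumptions.
import numpy as np


def objective(Xs, y, labeled, H, Ws, w, alpha, lam):
    """Squared reconstruction + labeled popularity fit + ridge penalties."""
    recon = sum(np.sum((X - H @ W) ** 2) for X, W in zip(Xs, Ws))
    fit = alpha * np.sum((y[labeled] - H[labeled] @ w) ** 2)
    reg = lam * (np.sum(H ** 2) + sum(np.sum(W ** 2) for W in Ws) + np.sum(w ** 2))
    return recon + fit + reg


def fit_shared_latent(Xs, y, labeled, k=3, alpha=1.0, lam=1e-2, n_iter=50, seed=0):
    """Hypothetical joint factorization X_m ~ H W_m and y ~ H w.

    Xs      : list of (n, d_m) arrays, one per modality; rows cover ALL videos
    y       : (n,) popularity scores; entries where `labeled` is False are ignored
    labeled : (n,) boolean NumPy mask of videos with known popularity
    Returns H (n, k) shared latent rows, w (k,) predictor, per-sweep objectives.
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[0]
    I = np.eye(k)
    H = rng.normal(scale=0.1, size=(n, k))
    y_lab = y[labeled]
    history = []
    for _ in range(n_iter):
        # 1) Modality loadings W_m given H: one ridge regression per modality.
        G = H.T @ H + lam * I
        Ws = [np.linalg.solve(G, H.T @ X) for X in Xs]
        # 2) Popularity weights w given H, using labeled rows only.
        H_lab = H[labeled]
        w = np.linalg.solve(alpha * H_lab.T @ H_lab + lam * I, alpha * H_lab.T @ y_lab)
        # 3) Latent rows given W_m and w. Every row is explained by its features;
        #    labeled rows are additionally pulled toward their popularity score.
        A0 = sum(W @ W.T for W in Ws) + lam * I
        B = sum(X @ W.T for X, W in zip(Xs, Ws))
        A1 = A0 + alpha * np.outer(w, w)
        H = np.empty((n, k))
        H[~labeled] = np.linalg.solve(A0, B[~labeled].T).T
        rhs = B[labeled] + alpha * y_lab[:, None] * w[None, :]
        H[labeled] = np.linalg.solve(A1, rhs.T).T
        history.append(objective(Xs, y, labeled, H, Ws, w, alpha, lam))
    return H, w, history


if __name__ == "__main__":
    # Synthetic data only: two "modalities" generated from a 3-dim latent factor.
    rng = np.random.default_rng(1)
    n = 60
    H_true = rng.normal(size=(n, 3))
    Xs = [H_true @ rng.normal(size=(3, d)) + 0.01 * rng.normal(size=(n, d)) for d in (10, 8)]
    y = H_true @ rng.normal(size=3)
    labeled = np.arange(n) < 40  # first 40 videos have known popularity
    H, w, history = fit_shared_latent(Xs, y, labeled)
    preds = H[~labeled] @ w  # transductive predictions for the 20 unlabeled videos
    print("correlation on unlabeled videos:", np.corrcoef(preds, y[~labeled])[0, 1])
```

Because each step is an exact ridge least-squares minimizer for its block of variables, the objective recorded in `history` should not increase from one sweep to the next; the demo runs on synthetic data only and says nothing about the paper's reported results.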
