Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding

Yehao Li, Ting Yao, Tao Mei, Hongyang Chao, Y. Rui
{"title":"Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding","authors":"Yehao Li, Ting Yao, Tao Mei, Hongyang Chao, Y. Rui","doi":"10.1145/2964284.2964320","DOIUrl":null,"url":null,"abstract":"Video has become a predominant social media for the booming live interactions. Automatic generation of emotional comments to a video has great potential to significantly increase user engagement in many socio-video applications (e.g., chat bot). Nevertheless, the problem of video commenting has been overlooked by the research community. The major challenges are that the generated comments are to be not only as natural as those from human beings, but also relevant to the video content. We present in this paper a novel two-stage deep learning-based approach to automatic video commenting. Our approach consists of two components. The first component, similar video search, efficiently finds the visually similar videos w.r.t. a given video using approximate nearest-neighbor search based on the learned deep video representations, while the second dynamic ranking effectively ranks the comments associated with the searched similar videos by learning a deep multi-view embedding space. For modeling the emotional view of videos, we incorporate visual sentiment, video content, and text comments into the learning of the embedding space. On a newly collected dataset with over 102K videos and 10.6M comments, we demonstrate that our approach outperforms several state-of-the-art methods and achieves human-level video commenting.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 24th ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2964284.2964320","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

Abstract

Video has become a predominant social medium for booming live interactions. Automatically generating emotional comments on a video has great potential to significantly increase user engagement in many socio-video applications (e.g., chat bots). Nevertheless, the problem of video commenting has been overlooked by the research community. The major challenge is that the generated comments must be not only as natural as those written by humans, but also relevant to the video content. In this paper, we present a novel two-stage deep-learning-based approach to automatic video commenting. Our approach consists of two components. The first component, similar video search, efficiently finds videos that are visually similar to a given video using approximate nearest-neighbor search over learned deep video representations, while the second, dynamic ranking, effectively ranks the comments associated with the retrieved similar videos by learning a deep multi-view embedding space. To model the emotional view of videos, we incorporate visual sentiment, video content, and text comments into the learning of the embedding space. On a newly collected dataset of over 102K videos and 10.6M comments, we demonstrate that our approach outperforms several state-of-the-art methods and achieves human-level video commenting.
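The retrieve-then-rank pipeline described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes hypothetical pre-computed, L2-normalized embeddings (video_embs, comment_embs, comment_video_ids) and uses exact cosine-similarity search in NumPy as a stand-in for both the approximate nearest-neighbor index and the learned deep multi-view embedding space.

```python
# Illustrative sketch of the two-stage video-commenting pipeline (not the paper's code).
# Assumptions: embeddings are pre-computed and L2-normalized; exact cosine similarity
# stands in for ANN search and for scoring in the shared multi-view embedding space.
import numpy as np

def retrieve_similar_videos(query_emb, video_embs, k=50):
    """Stage 1 (similar video search): return indices of the k videos whose
    deep representations are most similar to the query video's representation."""
    sims = video_embs @ query_emb            # cosine similarity for normalized vectors
    return np.argsort(-sims)[:k]

def rank_candidate_comments(query_emb, similar_ids, comment_embs,
                            comment_video_ids, top_n=10):
    """Stage 2 (dynamic ranking): gather comments attached to the retrieved
    videos and rank them by similarity to the query in the shared space."""
    mask = np.isin(comment_video_ids, similar_ids)
    candidate_idx = np.nonzero(mask)[0]
    scores = comment_embs[candidate_idx] @ query_emb
    order = np.argsort(-scores)[:top_n]
    return candidate_idx[order], scores[order]

# Usage with random stand-in data:
rng = np.random.default_rng(0)
video_embs = rng.normal(size=(1000, 128))
video_embs /= np.linalg.norm(video_embs, axis=1, keepdims=True)
comment_embs = rng.normal(size=(5000, 128))
comment_embs /= np.linalg.norm(comment_embs, axis=1, keepdims=True)
comment_video_ids = rng.integers(0, 1000, size=5000)

query = video_embs[0]
similar = retrieve_similar_videos(query, video_embs, k=20)
top_comments, scores = rank_candidate_comments(query, similar,
                                               comment_embs, comment_video_ids)
```

In the paper, the embedding space is additionally shaped by a visual-sentiment view alongside video content and text comments; the sketch above collapses all of that into a single shared vector space purely for illustration.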