Proceedings of the 24th ACM International Conference on Multimedia: Latest Papers, Page 3

LSOD: Local Sparse Orthogonal Descriptor for Image Matching

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967217

Yiru Zhao, Yaoyi Li, Zhiwen Shao, Hongtao Lu

Citations: 3

Accelerating Convolutional Neural Networks for Mobile Applications

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967280

Peisong Wang, Jian Cheng

Citations: 65

Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2964320

Yehao Li, Ting Yao, Tao Mei, Hongyang Chao, Y. Rui

Abstract: Video has become a predominant social medium for booming live interactions. Automatic generation of emotional comments on a video has great potential to significantly increase user engagement in many socio-video applications (e.g., chat bots). Nevertheless, the problem of video commenting has been overlooked by the research community. The major challenges are that the generated comments must be not only as natural as those from human beings, but also relevant to the video content. We present in this paper a novel two-stage deep learning-based approach to automatic video commenting. Our approach consists of two components. The first component, similar video search, efficiently finds videos visually similar to a given video using approximate nearest-neighbor search based on learned deep video representations, while the second, dynamic ranking, effectively ranks the comments associated with the retrieved similar videos by learning a deep multi-view embedding space. To model the emotional view of videos, we incorporate visual sentiment, video content, and text comments into the learning of the embedding space. On a newly collected dataset with over 102K videos and 10.6M comments, we demonstrate that our approach outperforms several state-of-the-art methods and achieves human-level video commenting.

Citations: 16
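
The two-stage pipeline in the Share-and-Chat abstract above (similar video search, then comment ranking in a shared embedding space) can be illustrated with a minimal Python sketch. It is not the authors' implementation: exact brute-force cosine search stands in for the approximate nearest-neighbor search, the embeddings are hand-written toy vectors rather than learned deep representations, and the names `cosine`, `similar_videos`, and `rank_comments` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_videos(query_vec, video_vecs, k):
    """Stage 1: ids of the k videos most similar to the query.

    Assumption: exact search is used here only for clarity; the abstract
    describes approximate nearest-neighbor search.
    """
    ranked = sorted(
        video_vecs,
        key=lambda vid: cosine(query_vec, video_vecs[vid]),
        reverse=True,
    )
    return ranked[:k]


def rank_comments(query_vec, candidate_ids, comments):
    """Stage 2: order the candidates' comments by similarity to the query.

    Assumption: video and comment vectors already live in one shared
    embedding space that was learned elsewhere.
    """
    pool = [
        (text, vec)
        for vid in candidate_ids
        for text, vec in comments.get(vid, [])
    ]
    pool.sort(key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in pool]


# Toy data: hypothetical 2-D embeddings, not taken from the paper.
video_vecs = {"v1": [1.0, 0.0], "v2": [0.8, 0.3], "v3": [0.0, 1.0]}
comments = {
    "v1": [("so cute!", [1.0, 0.1])],
    "v2": [("what a goal", [0.2, 1.0]), ("adorable", [0.8, 0.3])],
    "v3": [("scary", [0.0, 1.0])],
}
query = [1.0, 0.05]

neighbours = similar_videos(query, video_vecs, k=2)
ranked = rank_comments(query, neighbours, comments)
print(neighbours)
print(ranked)
```

With these toy vectors, stage 1 keeps `v1` and `v2`, and stage 2 places "so cute!" first. A real system would replace the toy vectors with a learned embedding model and the brute-force loop with an approximate index.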

Online Weighted Clustering for Real-time Abnormal Event Detection in Video Surveillance

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967279

Hanhe Lin, Jeremiah D. Deng, B. Woodford, Ahmad Shahi

Citations: 20

DRIVING: Distributed Scheduling for Video Streaming in Vehicular Wi-Fi Systems

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2964290

X. Chen, Lei Rao, Qiao Xiang, Xue Liu, F. Bai

Abstract: Video streaming dominates mobile bandwidth and is still expanding drastically. Its tremendous economic benefits have driven the automobile industry to equip vehicles with video streaming capacity. As a result, new in-cabin Wi-Fi systems have been deployed, turning each vehicle into a streaming hotspot on wheels. A built-in Access Point (AP) bridges the communications between Wi-Fi devices inside and cellular networks outside. Distinct advantages offered by this system include a more powerful antenna array to improve multimedia quality, a constant energy source to power the streaming, etc. However, two challenging features may jeopardize system performance. (1) The in-cabin Wi-Fi hotspots are mostly deployed on private vehicles, and thus are completely decentralized. (2) Video packets need to be delivered before their deadlines with small delays. Due to these features, existing algorithms may fail to efficiently schedule in-cabin Wi-Fi video streaming. To fill the gap, we propose the Delay-awaRe dIstributed Video schedulING (DRIVING) framework. Being fully distributed and delay-aware, DRIVING not only increases the streaming goodput, but also reduces the delivery latency and deadline missing ratio. In order to optimize this new framework, we establish cross-layer analytical models, which help us tune the framework parameters for better performance. In a typical scenario, DRIVING increases the goodput by up to 27.0%, while reducing the queueing delay and the deadline missing ratio by up to 40.0% and 38.4%, respectively.

Citations: 1

Video eCommerce: Towards Online Video Advertising

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2964326

Zhi-Qi Cheng, Yang Liu, Xiao Wu, Xiansheng Hua

Abstract: The prevalence of online videos provides an opportunity for e-commerce companies to exhibit their product ads in videos by recommendation. In this paper, we propose an advertising system named Video eCommerce that exhibits appropriate product ads to particular users at proper time stamps of videos. It takes into account video semantics, user shopping preference, and viewing behavior feedback through a two-level strategy. At the first level, a novel Co-Relation Regression (CRR) model is proposed to construct the semantic association between keyframes and products. A heterogeneous information network (HIN) is adopted to build the user shopping preference from two different e-commerce platforms, Tmall and MagicBox, which alleviates the problems of data sparsity and cold start. In addition, the Video Scene Importance Model (VSIM) utilizes the viewing behavior of users to embed ads at the most attractive position within the video stream. At the second level, taking the results of CRR, HIN, and VSIM as the input, Heterogeneous Relation Matrix Factorization (HRMF) is applied for product advertising. Extensive evaluation on a variety of online videos from Tmall MagicBox demonstrates that Video eCommerce achieves promising performance and significantly outperforms state-of-the-art advertising methods.

Citations: 32

The Lifecycle of Geotagged Multimedia Data

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2986911

R. Schifanella, B. Thomee

Abstract: The world is a big place. At any given instant something is happening somewhere, but even when nothing is going on people still find ways to generate multimedia data, ranging from social media posts to photos and videos. A substantial number of these media objects is associated with a location, and in an increasingly mobile and connected world (both in terms of people and devices), this number is only bound to get larger. Yet, in the multimedia literature we observe that many researchers often unwittingly treat the geospatial dimension as if it were a regular feature dimension, despite it requiring special attention. In order to avoid pitfalls and to steer clear of erroneous conclusions, this tutorial aims to teach researchers and students how geotagged multimedia data differs from regular data and to educate them on best practices when dealing with such data. We will cover the lifecycle of geotagged data in multimedia research, with topics covering how this kind of data is represented, processed, analyzed, and visualized. The tutorial requires both passive and active involvement: we not only present the material, but the attendees also get the opportunity to interact with it using a variety of open source data and tools that we have prepared using a virtual machine.

Citations: 1
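
One concrete reason the geospatial dimension cannot be treated as a regular feature dimension, as the tutorial abstract above cautions, is that latitude and longitude are angles on a sphere. The Python sketch below is an illustration chosen here, not material from the tutorial: it compares the standard haversine great-circle distance with a naive Euclidean distance computed on raw degrees. The function names and the mean Earth radius of 6371 km are assumptions of the sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_degree_distance(lat1, lon1, lat2, lon2):
    """Euclidean distance on raw degrees: the 'regular feature' treatment."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Two points on the equator, half a degree either side of the antimeridian.
across_km = haversine_km(0.0, 179.5, 0.0, -179.5)
across_naive = naive_degree_distance(0.0, 179.5, 0.0, -179.5)

# One degree of longitude at 60 degrees north.
at_60n_km = haversine_km(60.0, 0.0, 60.0, 1.0)

print(round(across_km, 1), across_naive, round(at_60n_km, 1))
```

The two antimeridian points are about 111 km apart on the ground, yet their raw-degree distance is 359. Likewise, one degree of longitude at 60°N covers roughly half the ground distance it does at the equator, so a fixed threshold in degrees means different things at different latitudes.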

A Deeply-Supervised Deconvolutional Network for Horizon Line Detection

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967198

L. Porzi, S. R. Bulò, E. Ricci

Citations: 13

MatchDR: Image Correspondence by Leveraging Distance Ratio Constraint

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2967293

Rui Wang, Dong Liang, Wei Zhang, Xiaochun Cao

Citations: 1

Shorter-is-Better: Venue Category Estimation from Micro-Video

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI: 10.1145/2964284.2964307

Jianglong Zhang, Liqiang Nie, Xiang Wang, Xiangnan He, Xianglin Huang, Tat-Seng Chua

Abstract: According to our statistics on over 2 million micro-videos, only 1.22% of them are associated with venue information, which greatly hinders location-oriented applications and personalized services. To alleviate this problem, we aim to label the bite-sized video clips with venue categories. This is, however, nontrivial for three reasons: 1) no available benchmark dataset; 2) insufficient information, low quality, and information loss; and 3) complex relatedness among venue categories. Towards this end, we propose a scheme comprising two components. In particular, we first crawl a representative set of micro-videos from Vine and extract a rich set of features from textual, visual, and acoustic modalities. We then, in the second component, build a tree-guided multi-task multi-modal learning model to estimate the venue category for each unseen micro-video. This model is able to jointly learn a common space from multiple modalities and leverage the predefined Foursquare hierarchical structure to regularize the relatedness among venue categories. Extensive experiments have validated our model. As a side research contribution, we have released our data, code, and the parameters involved.

Citations: 60