Proceedings of the 26th ACM international conference on Multimedia: Latest Publications

Comprehensive Distance-Preserving Autoencoders for Cross-Modal Retrieval
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240607
Yibing Zhan, Jun Yu, Zhou Yu, Rong Zhang, D. Tao, Qi Tian
{"title":"Comprehensive Distance-Preserving Autoencoders for Cross-Modal Retrieval","authors":"Yibing Zhan, Jun Yu, Zhou Yu, Rong Zhang, D. Tao, Qi Tian","doi":"10.1145/3240508.3240607","DOIUrl":"https://doi.org/10.1145/3240508.3240607","url":null,"abstract":"In this paper, we propose a novel method with comprehensive distance-preserving autoencoders (CDPAE) to address the problem of unsupervised cross-modal retrieval. Previous unsupervised methods rely primarily on pairwise distances of representations extracted from cross media spaces that co-occur and belong to the same objects. However, besides pairwise distances, the CDPAE also considers heterogeneous distances of representations extracted from cross media spaces as well as homogeneous distances of representations extracted from single media spaces that belong to different objects. The CDPAE consists of four components. First, denoising autoencoders are used to retain the information from the representations and to reduce the negative influence of redundant noises. Second, a comprehensive distance-preserving common space is proposed to explore the correlations among different representations. This aims to preserve the respective distances between the representations within the common space so that they are consistent with the distances in their original media spaces. Third, a novel joint loss function is defined to simultaneously calculate the reconstruction loss of the denoising autoencoders and the correlation loss of the comprehensive distance-preserving common space. Finally, an unsupervised cross-modal similarity measurement is proposed to further improve the retrieval performance. This is carried out by calculating the marginal probability of two media objects based on a kNN classifier. The CDPAE is tested on four public datasets with two cross-modal retrieval tasks: \"query images by texts\" and \"query texts by images\". Compared with eight state-of-the-art cross-modal retrieval methods, the experimental results demonstrate that the CDPAE outperforms all the unsupervised methods and performs competitively with the supervised methods.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125259918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
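The joint objective described in the abstract lends itself to a short sketch. The PyTorch code below is only a minimal reading of it, assuming simple MLP denoising autoencoders, MSE reconstruction, squared Euclidean distances, and mean-normalized distance matrices; the paper's exact distance terms and weights are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingAE(nn.Module):
    """One denoising autoencoder branch (image or text)."""
    def __init__(self, dim_in, dim_z):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, 512), nn.ReLU(), nn.Linear(512, dim_z))
        self.dec = nn.Sequential(nn.Linear(dim_z, 512), nn.ReLU(), nn.Linear(512, dim_in))

    def forward(self, x, noise_std=0.1):
        z = self.enc(x + noise_std * torch.randn_like(x))  # corrupt, then encode
        return z, self.dec(z)

def sq_dists(a, b):
    """Pairwise squared Euclidean distances (smooth, safe to backprop)."""
    return (a.unsqueeze(1) - b.unsqueeze(0)).pow(2).sum(-1)

def joint_loss(img, txt, img_ae, txt_ae, alpha=1.0):
    """Reconstruction loss plus an illustrative distance-preserving loss."""
    z_i, rec_i = img_ae(img)
    z_t, rec_t = txt_ae(txt)
    # Reconstruction term of the two denoising autoencoders.
    l_rec = F.mse_loss(rec_i, img) + F.mse_loss(rec_t, txt)
    # Homogeneous terms: within-modality distances in the common space should
    # stay consistent with distances in the original media space; distance
    # matrices are divided by their means so spaces of different scales match.
    def norm(d):
        return d / (d.mean() + 1e-8)
    l_homo = (F.mse_loss(norm(sq_dists(z_i, z_i)), norm(sq_dists(img, img))) +
              F.mse_loss(norm(sq_dists(z_t, z_t)), norm(sq_dists(txt, txt))))
    # Pairwise term: co-occurring image/text pairs (same batch row) stay close.
    l_pair = sq_dists(z_i, z_t).diagonal().mean()
    return l_rec + alpha * (l_homo + l_pair)
```

The kNN-based marginal-probability similarity used at retrieval time is a separate step and is omitted here.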
Fast and Light Manifold CNN based 3D Facial Expression Recognition across Pose Variations
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240568
Zhixing Chen, Di Huang, Yunhong Wang, Liming Chen
{"title":"Fast and Light Manifold CNN based 3D Facial Expression Recognition across Pose Variations","authors":"Zhixing Chen, Di Huang, Yunhong Wang, Liming Chen","doi":"10.1145/3240508.3240568","DOIUrl":"https://doi.org/10.1145/3240508.3240568","url":null,"abstract":"This paper proposes a novel approach to 3D Facial Expression Recognition (FER), and it is based on a Fast and Light Manifold CNN model, namely FLM-CNN. Different from current manifold CNNs, FLM-CNN adopts a human vision inspired pooling structure and a multi-scale encoding strategy to enhance geometry representation, which highlights shape characteristics of expressions and runs efficiently. Furthermore, a sampling tree based preprocessing method is presented, and it sharply saves memory when applied to 3D facial surfaces, without much information loss of original data. More importantly, due to the property of manifold CNN features of being rotation-invariant, the proposed method shows a high robustness to pose variations. Extensive experiments are conducted on BU-3DFE, and state-of-the-art results are achieved, indicating its effectiveness.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128121389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
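The multi-scale encoding strategy can only be loosely illustrated here, since the paper's operators work on 3D facial surface geometry rather than a 2D grid. The sketch below shows generic multi-scale encoding (pool one feature map at several scales, then concatenate); the scales and pooling choice are assumptions, not the manifold-CNN design.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Generic multi-scale encoding: pool the same feature map at several
    spatial scales and concatenate the results into one descriptor."""
    def __init__(self, scales=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(s) for s in scales])

    def forward(self, fmap):                       # fmap: (B, C, H, W)
        feats = [p(fmap).flatten(1) for p in self.pools]
        return torch.cat(feats, dim=1)             # (B, C * sum(s*s))
```

In FLM-CNN the analogous multi-scale features are computed intrinsically on the surface, which is what the abstract credits for the rotation invariance and robustness to pose.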
Content-Based Video Relevance Prediction with Second-Order Relevance and Attention Modeling
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3266434
Xusong Chen, Rui Zhao, Shengjie Ma, Dong Liu, Zhengjun Zha
{"title":"Content-Based Video Relevance Prediction with Second-Order Relevance and Attention Modeling","authors":"Xusong Chen, Rui Zhao, Shengjie Ma, Dong Liu, Zhengjun Zha","doi":"10.1145/3240508.3266434","DOIUrl":"https://doi.org/10.1145/3240508.3266434","url":null,"abstract":"This paper describes our proposed method for the Content-Based Video Relevance Prediction (CBVRP) challenge. Our method is based on deep learning, i.e. we train a deep network to predict the relevance between two video sequences from their features. We explore the usage of second-order relevance, both in preparing training data, and in extending the deep network. Second-order relevance refers to e.g. the relevance between x and z if x is relevant to y and y is relevant to z. In our proposed method, we use second-order relevance to increase positive samples and decrease negative samples, when preparing training data. We further extend the deep network with an attention module, where the attention mechanism is designed for second-order relevant video sequences. We verify the effectiveness of our method on the validation set of the CBVRP challenge.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134179477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
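The training-data step is concrete enough to sketch in plain Python: expand the first-order relevant pairs by transitivity, and treat the newly reached pairs as positives (symmetrically, such pairs would be kept out of the sampled negatives). This illustrates the idea only; it is not the authors' implementation.

```python
from collections import defaultdict

def second_order_pairs(pairs):
    """If (x, y) and (y, z) are first-order relevant, return (x, z) as an
    additional (second-order) positive pair."""
    neighbors = defaultdict(set)
    for x, y in pairs:
        neighbors[x].add(y)
        neighbors[y].add(x)
    extra = set()
    for nbrs in neighbors.values():
        for x in nbrs:
            for z in nbrs:
                # Keep one orientation per pair; skip existing first-order pairs.
                if x < z and z not in neighbors[x]:
                    extra.add((x, z))
    return extra

# Example: second_order_pairs({("a", "b"), ("b", "c")}) returns {("a", "c")}
```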
Examine before You Answer: Multi-task Learning with Adaptive-attentions for Multiple-choice VQA
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240687
Lianli Gao, Pengpeng Zeng, Jingkuan Song, Xianglong Liu, Heng Tao Shen
{"title":"Examine before You Answer: Multi-task Learning with Adaptive-attentions for Multiple-choice VQA","authors":"Lianli Gao, Pengpeng Zeng, Jingkuan Song, Xianglong Liu, Heng Tao Shen","doi":"10.1145/3240508.3240687","DOIUrl":"https://doi.org/10.1145/3240508.3240687","url":null,"abstract":"Multiple-choice (MC) Visual Question Answering (VQA) is a similar but essentially different task to open-ended VQA because the answer options are provided. Most of existing works tackle them in a unified pipeline by solving a multi-class problem to infer the best answer from a predefined answer set. The option that matches the best answer is selected for MC VQA. Nevertheless, this violates human thinking logics. Normally, people examine the questions, answer options and the reference image before inferring a MC VQA. For MC VQA, human either rely on the question and answer options to directly deduce a correct answer if the question is not image-related, or read the question and answer options and then purposefully search for answers in a reference image. Therefore, we propose a novel approach, namely Multi-task Learning with Adaptive-attention (MTA), to simulate human logics for MC VQA. Specifically, we first fuse the answer options and question features, and then adaptively attend to the visual features for inferring a MC VQA. Furthermore, we design our model as a multi-task learning architecture by integrating the open-ended VQA task to further boost the performance of MC VQA. We evaluate our approach on two standard benchmark datasets: VQA and Visual7W and our approach sets new records on both datasets for MC VQA task, reaching 73.5% and 65.9% average accuracy respectively.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133105560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
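A minimal sketch of the examine-then-answer flow, fusing the question with one answer option and using the fused vector to attend over image region features, might look as follows. The concat-plus-linear fusion, the single attention hop, and all dimensions are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OptionGuidedAttention(nn.Module):
    """Score one answer option: fuse it with the question, attend to the
    image regions with the fused vector, then score the combination."""
    def __init__(self, d_q, d_o, d_v, d_h=512):
        super().__init__()
        self.fuse = nn.Linear(d_q + d_o, d_h)
        self.att = nn.Linear(d_h + d_v, 1)
        self.score = nn.Linear(d_h + d_v, 1)

    def forward(self, q, opt, regions):
        # q: (B, d_q), opt: (B, d_o), regions: (B, R, d_v)
        h = torch.tanh(self.fuse(torch.cat([q, opt], dim=-1)))      # fused query
        h_exp = h.unsqueeze(1).expand(-1, regions.size(1), -1)
        a = F.softmax(self.att(torch.cat([h_exp, regions], dim=-1)), dim=1)
        v = (a * regions).sum(dim=1)                                # attended visual
        return self.score(torch.cat([h, v], dim=-1)).squeeze(-1)    # option score
```

Each option is scored independently and the highest-scoring one is chosen; the multi-task extension would add an open-ended VQA head on top of the shared features.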
OSMO
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240548
Xu Gao, Tingting Jiang
{"title":"OSMO","authors":"Xu Gao, Tingting Jiang","doi":"10.1145/3240508.3240548","DOIUrl":"https://doi.org/10.1145/3240508.3240548","url":null,"abstract":"With demands of the intelligent monitoring, multiple object tracking (MOT) in surveillance scene has become an essential but challenging task. Occlusion is the primary difficulty in surveillance MOT, which can be categorized into the inter-object occlusion and the obstacle occlusion. Many current studies on general MOT focus on the former occlusion, but few studies have been conducted on the latter one. In fact, there are useful prior knowledge in surveillance videos, because the scene structure is fixed. Hence, we propose two models for dealing with these two kinds of occlusions. The attention-based appearance model is proposed to solve the inter-object occlusion, and the scene structure model is proposed to solve the obstacle occlusion. We also design an obstacle map segmentation method for segmenting obstacles from the surveillance scene. Furthermore, to evaluate our method, we propose four new surveillance datasets that contain videos with obstacles. Experimental results show the effectiveness of our two models.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114139642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
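Because the surveillance camera and scene structure are fixed, an obstacle map can be segmented once per scene and reused; one minimal way to exploit it is to measure how much of a target's bounding box the map covers. The box format and the 0.5 threshold below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def obstacle_occlusion_ratio(box, obstacle_map):
    """Fraction of a detection box covered by the binary obstacle mask.
    box: (x1, y1, x2, y2) in pixels; obstacle_map: (H, W) array of 0/1."""
    x1, y1, x2, y2 = box
    patch = obstacle_map[y1:y2, x1:x2]
    return float(patch.mean()) if patch.size else 0.0

# Treat a target as obstacle-occluded when most of its box is covered, e.g.:
# occluded = obstacle_occlusion_ratio(box, obstacle_map) > 0.5
```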
Session details: FF-2
Pub Date: 2018-10-15 | DOI: 10.1145/3286917
Peng Cui
{"title":"Session details: FF-2","authors":"Peng Cui","doi":"10.1145/3286917","DOIUrl":"https://doi.org/10.1145/3286917","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114359036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Session details: Panel-2
Pub Date: 2018-10-15 | DOI: 10.1145/3286937
Jiaying Liu, Wen-Huang Cheng
{"title":"Session details: Panel-2","authors":"Jiaying Liu, Wen-Huang Cheng","doi":"10.1145/3286937","DOIUrl":"https://doi.org/10.1145/3286937","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114706699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FlexStream
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240676
Ibrahim Ben Mustafa, T. Nadeem, Emir Halepovic
{"title":"FlexStream","authors":"Ibrahim Ben Mustafa, T. Nadeem, Emir Halepovic","doi":"10.1145/3240508.3240676","DOIUrl":"https://doi.org/10.1145/3240508.3240676","url":null,"abstract":"We present FlexStream, a programmable framework realized by implementing Software-Defined Networking (SDN) functionality on end devices. FlexStream exploits the benefits of both centralized and distributed components to achieve dynamic management of end devices, as required and in accordance with specified policies. We evaluate FlexStream on one example use case -- the adaptive video streaming, where bandwidth control is employed to drive selection of video bitrates, improve stability and increase robustness against background traffic. When applied to competing streaming clients, FlexStream reduces bitrate switching by 81%, stall duration by 92%, and startup delay by 44%, while improving fairness among players. In addition, we report the first implementation of SDN-based control in Android devices running in real Wi-Fi and live cellular networks.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115432625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
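FlexStream enforces its policies through SDN rules on the end device, so the sketch below is only a self-contained stand-in: a classic token-bucket limiter of the kind a per-client bandwidth policy could apply to cap throughput and thereby steer the player's bitrate choice. The class and its parameters are illustrative, not FlexStream's API.

```python
import time

class TokenBucket:
    """Cap a client's throughput at rate_bps with bursts up to burst_bytes."""
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, nbytes):
        """Return True if nbytes may be sent now, consuming tokens."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
```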
Deep Learning Interpretation
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3241472
J. Sang
{"title":"Deep Learning Interpretation","authors":"J. Sang","doi":"10.1145/3240508.3241472","DOIUrl":"https://doi.org/10.1145/3240508.3241472","url":null,"abstract":"Deep learning has been successfully exploited in addressing different multimedia problems in recent years. The academic researchers are now transferring their attention from identifying what problem deep learning CAN address to exploring what problem deep learning CAN NOT address. This tutorial starts with a summarization of six 'CAN NOT' problems deep learning fails to solve at the current stage, i.e., low stability, debugging difficulty, poor parameter transparency, poor incrementality, poor reasoning ability, and machine bias. These problems share a common origin from the lack of deep learning interpretation. This tutorial attempts to correspond the six 'NOT' problems to three levels of deep learning interpretation: (1) Locating - accurately and efficiently locating which feature contributes much to the output. (2) Understanding - bidirectional semantic accessing between human knowledge and deep learning algorithm. (3) Expandability - well storing, accumulating and reusing the models learned from deep learning. Existing studies falling into these three levels will be reviewed in detail, and a discussion on the future interesting directions will be provided in the end.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123166567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
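For the 'Locating' level, a standard concrete technique is gradient-based saliency: rank input features by the magnitude of the output's gradient with respect to them. The sketch below is one common instance of that idea, not a method the tutorial itself prescribes.

```python
import torch

def input_saliency(model, x, target_class):
    """Per-feature contribution magnitudes for one prediction.
    Assumes model(x) returns (batch, classes) logits."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]   # logit of the class being explained
    score.backward()
    return x.grad.abs()                 # large gradient = influential feature
```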
Temporal Hierarchical Attention at Category- and Item-Level for Micro-Video Click-Through Prediction
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240617
Xusong Chen, Dong Liu, Zhengjun Zha, Wen-gang Zhou, Zhiwei Xiong, Yan Li
{"title":"Temporal Hierarchical Attention at Category- and Item-Level for Micro-Video Click-Through Prediction","authors":"Xusong Chen, Dong Liu, Zhengjun Zha, Wen-gang Zhou, Zhiwei Xiong, Yan Li","doi":"10.1145/3240508.3240617","DOIUrl":"https://doi.org/10.1145/3240508.3240617","url":null,"abstract":"Micro-video sharing gains great popularity in recent years, which calls for effective recommendation algorithm to help user find their interested micro-videos. Compared with traditional online (e.g. YouTube) videos, micro-videos contributed by grass-root users and taken by smartphones are much shorter (tens of seconds) and more short of tags or descriptive text, making the recommendation of micro-videos a challenging task. In this paper, we investigate how to model user's historical behaviors so as to predict the user's click-through of micro-videos. Inspired by the recent deep network-based methods, we propose a Temporal Hierarchical Attention at Category- and Item-Level (THACIL) network for user behavior modeling. First, we use temporal windows to capture the short-term dynamics of user interests; Second, we leverage a category-level attention mechanism to characterize user's diverse interests, as well as an item-level attention mechanism for fine-grained profiling of user interests; Third, we adopt forward multi-head self-attention to capture the long-term correlation within user behaviors. Our proposed THACIL network was tested on MicroVideo-1.7M, a new dataset of 1.7 million micro-videos, coming from real data of a micro-video sharing service in China. Experimental results demonstrate the effectiveness of the proposed method in comparison with the state-of-the-art solutions.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124902620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 43
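The long-term component, forward multi-head self-attention over the click history, maps directly onto PyTorch's built-in attention module; a causal mask keeps the attention 'forward' so each position sees only earlier behaviors. Dimensions are assumptions, and the temporal-window and category-/item-level attention stages are omitted.

```python
import torch
import torch.nn as nn

class BehaviorSelfAttention(nn.Module):
    """Causal multi-head self-attention over a user's item embeddings."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, seq):                        # seq: (B, T, d_model)
        t = seq.size(1)
        # True above the diagonal = a position may not attend to future items.
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=seq.device),
                            diagonal=1)
        out, _ = self.mha(seq, seq, seq, attn_mask=causal)
        return out                                 # contextualized behaviors
```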