Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval: Latest Publications

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206064
Niluthpol Chowdhury Mithun, Juncheng Billy Li, Florian Metze, A. Roy-Chowdhury
Abstract: Constructing a joint representation invariant across different modalities (e.g., video, language) is of significant importance in many multimedia applications. While there have been a number of recent successes in developing effective image-text retrieval methods by learning joint representations, the video-text retrieval task has not been explored to its fullest extent. In this paper, we study how to effectively utilize the multimodal cues available in videos for the cross-modal video-text retrieval task. Based on our analysis, we propose a novel framework that simultaneously utilizes multimodal features (different visual characteristics, audio inputs, and text) through a fusion strategy for efficient retrieval. Furthermore, we explore several loss functions for training the embedding and propose a modified pairwise ranking loss for the task. Experiments on the MSVD and MSR-VTT datasets demonstrate that our method achieves significant performance gains over state-of-the-art approaches.
Citations: 220
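The pairwise ranking objective mentioned in the abstract is commonly implemented as a bidirectional max-margin loss over a batch of matched video-text pairs. The following is a minimal sketch of that standard form; the margin value and the summation over all negatives are illustrative assumptions, and the paper's modified variant is not reproduced here.

```python
# Minimal sketch of a standard bidirectional max-margin ranking loss for joint
# video-text embeddings; hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(video_emb, text_emb, margin=0.2):
    """video_emb, text_emb: (batch, dim) embeddings of matched video-text pairs."""
    video_emb = F.normalize(video_emb, dim=1)
    text_emb = F.normalize(text_emb, dim=1)
    scores = video_emb @ text_emb.t()            # cosine similarity matrix
    diagonal = scores.diag().view(-1, 1)         # similarities of the true pairs
    # Hinge cost for retrieving text given video, and video given text.
    cost_t = (margin + scores - diagonal).clamp(min=0)      # rows: video -> all texts
    cost_v = (margin + scores - diagonal.t()).clamp(min=0)  # cols: text -> all videos
    # The diagonal holds true pairs, which should not be penalized.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_t = cost_t.masked_fill(mask, 0)
    cost_v = cost_v.masked_fill(mask, 0)
    return cost_t.sum() + cost_v.sum()

v, t = torch.randn(8, 256), torch.randn(8, 256)
print(pairwise_ranking_loss(v, t))
```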
Supervised Nonparametric Multimodal Topic Modeling Methods for Multi-class Video Classification
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206036
Jianfei Xue, K. Eguchi
Abstract: Nonparametric topic models such as hierarchical Dirichlet processes (HDP) have been attracting increasing attention for multimedia data analysis. However, the existing models for multimedia data are unsupervised: they purely cluster semantically or characteristically related features into latent topics without considering side information such as class labels. In this paper, we present a novel supervised sequential symmetric correspondence HDP (Sup-SSC-HDP) model for multi-class video classification, where the empirical topic frequencies learned from multimodal video data are modeled as a predictor of the video class. Qualitative and quantitative assessments demonstrate the effectiveness of Sup-SSC-HDP.
Citations: 1
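To make the classification idea concrete, the sketch below feeds per-video empirical topic-frequency vectors into a multi-class classifier. In Sup-SSC-HDP the topic model and the class predictor are coupled, so this two-step separation, the random placeholder topic vectors, and the logistic-regression choice are illustrative assumptions only.

```python
# Minimal sketch: per-video topic frequencies (placeholders standing in for
# Sup-SSC-HDP posteriors) used as features for multi-class video classification.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_videos, n_topics, n_classes = 200, 30, 5
# Placeholder topic-frequency vectors: each row is a distribution over topics.
theta = rng.dirichlet(np.ones(n_topics), size=n_videos)
labels = rng.integers(0, n_classes, size=n_videos)   # placeholder class labels

clf = LogisticRegression(max_iter=1000)
clf.fit(theta, labels)
print("training accuracy:", clf.score(theta, labels))
```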
Steganographer Detection based on Multiclass Dilated Residual Networks
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206031
Mingjie Zheng, S. Zhong, Songtao Wu, Jianmin Jiang
Abstract: The steganographer detection task is to identify criminal users, who attempt to conceal confidential information by steganography, among a large number of innocent users. The significant challenge of the task is how to collect evidence to identify a guilty user from suspicious images that carry secret messages generated by an unknown steganography algorithm and payload. Unfortunately, existing steganalysis methods were designed for binary classification, which makes it hard for them to handle images with different kinds of payloads, especially when the payloads of the test images are not known in advance. In this paper, we propose a novel steganographer detection method based on multiclass deep neural networks. In the training stage, the networks are trained to classify images with six types of payloads. The networks can preserve and even strengthen the weak stego signals left by secret messages over a much larger receptive field by virtue of residual and dilated residual learning. In the inference stage, the learnt model is used to extract discriminative features that capture the difference between guilty and innocent users. A series of empirical results demonstrates that the proposed method achieves good performance in the spatial and frequency domains even when the embedding payload is low. The proposed method is also more robust across steganographic algorithms and offers a possible solution to the payload mismatch problem.
Citations: 11
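As a rough illustration of the building block named in the title, here is a minimal dilated residual block; the channel count, dilation rate and layer arrangement are assumptions and do not reflect the paper's exact architecture.

```python
# Minimal sketch of a dilated residual block; dilation enlarges the receptive
# field without downsampling, which helps preserve weak stego signals.
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # residual connection keeps the input signal

x = torch.randn(1, 16, 64, 64)
print(DilatedResidualBlock(16)(x).shape)   # torch.Size([1, 16, 64, 64])
```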
Multimodal Network Embedding via Attention based Multi-view Variational Autoencoder
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206035
Feiran Huang, Xiaoming Zhang, Chaozhuo Li, Zhoujun Li, Yueying He, Zhonghua Zhao
Abstract: Learning embeddings for social media data has attracted extensive research interest and underpins many applications, such as classification and link prediction. In this paper, we examine the scenario of a multimodal network whose nodes contain multimodal content and are connected by heterogeneous relationships, such as social images containing multimodal content (e.g., visual content and text descriptions) and linked in various ways (e.g., in the same album or with the same tag). Given such a multimodal network, simply learning the embedding from the network structure or from a subset of the content results in a sub-optimal representation. We propose a novel deep embedding method, the Attention-based Multi-view Variational Auto-Encoder (AMVAE), to incorporate both the link information and the multimodal content for more effective and efficient embedding. Specifically, we adopt an LSTM with an attention model to learn the correlation between different data modalities, such as the correlation between visual regions and specific words, and thereby obtain a semantic embedding of the multimodal content. The link information and the semantic embedding are then treated as two correlated views, and a multi-view correlation learning based Variational Auto-Encoder (VAE) is proposed to learn the representation of each node, in which the embeddings of the link information and the multimodal content are integrated and mutually reinforced. Experiments on three real-world datasets demonstrate the superiority of the proposed model in two applications, i.e., multi-label classification and link prediction.
Citations: 28
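A minimal sketch of the kind of attention that relates visual regions to words when fusing modalities is shown below; the scaled dot-product scoring and the feature dimensions are illustrative assumptions rather than the exact AMVAE formulation.

```python
# Minimal sketch of word-to-region attention: each word attends over visual
# regions and receives a context vector summarizing the relevant regions.
import torch
import torch.nn.functional as F

def region_word_attention(regions, words):
    """regions: (num_regions, dim) visual-region features;
    words: (num_words, dim) word embeddings projected to the same dimension."""
    scores = words @ regions.t() / regions.size(1) ** 0.5   # (num_words, num_regions)
    weights = F.softmax(scores, dim=1)                       # attention over regions per word
    return weights @ regions                                 # (num_words, dim) context vectors

regions = torch.randn(36, 256)
words = torch.randn(12, 256)
print(region_word_attention(regions, words).shape)   # torch.Size([12, 256])
```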
Collaborative Subspace Graph Hashing for Cross-modal Retrieval
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206042
Xiang Zhang, Guohua Dong, Yimo Du, Chengkun Wu, Zhigang Luo, Canqun Yang
Abstract: Current hashing methods for cross-modal retrieval generally attempt to learn separate modality-specific transformation matrices to embed multi-modality data into a latent common subspace, and they usually ignore the fact that respecting the diversity of multi-modality features in the latent subspace can benefit retrieval. To this end, we propose a collaborative subspace graph hashing method (CSGH) that performs a two-stage collaborative learning framework for cross-modal retrieval. Specifically, CSGH first embeds multi-modality data into separate latent subspaces through individual modality-specific transformation matrices, and then connects these latent subspaces to a common Hamming space through a shared transformation matrix. Within this framework, CSGH captures the modality-specific neighborhood structure and the cross-modal correlation within multi-modality data through a Laplacian regularization and a graph based correlation constraint, respectively. To solve CSGH, we develop an alternating optimization procedure in which each sub-problem has an elegant analytical solution. Cross-modal retrieval experiments on the Wiki, NUS-WIDE, Flickr25K and Flickr1M datasets show the effectiveness of CSGH compared with state-of-the-art cross-modal hashing methods.
Citations: 13
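The two-stage mapping described in the abstract can be sketched as follows: a modality-specific projection into a latent subspace, then a shared projection into a common Hamming space followed by binarization. The matrices below are random placeholders; learning them under the Laplacian and graph-correlation constraints is the actual contribution of CSGH.

```python
# Minimal sketch of the two-stage projection to binary codes; all projection
# matrices are random placeholders rather than learned parameters.
import numpy as np

rng = np.random.default_rng(0)
d_img, d_txt, d_latent, n_bits = 512, 300, 128, 64

W_img = rng.standard_normal((d_img, d_latent))       # image-specific transformation
W_txt = rng.standard_normal((d_txt, d_latent))       # text-specific transformation
W_shared = rng.standard_normal((d_latent, n_bits))   # shared mapping to Hamming space

def hash_codes(x, W_modality):
    latent = x @ W_modality             # stage 1: modality-specific latent subspace
    return np.sign(latent @ W_shared)   # stage 2: shared projection, then binarize

img_feat = rng.standard_normal((5, d_img))
txt_feat = rng.standard_normal((5, d_txt))
print(hash_codes(img_feat, W_img).shape, hash_codes(txt_feat, W_txt).shape)  # (5, 64) (5, 64)
```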
Multi-Scale Spatiotemporal Conv-LSTM Network for Video Saliency Detection
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206052
Yi Tang, Wenbin Zou, Zhi Jin, Xia Li
Abstract: Recently, deep neural networks have become crucial techniques for image saliency detection. However, two difficulties hinder the development of deep learning in video saliency detection. The first is that a traditional static network cannot conduct robust motion estimation in videos. The second is that data-driven deep learning lacks sufficient manually annotated pixel-wise ground truth for training video saliency networks. In this paper, we propose a multi-scale spatiotemporal convolutional LSTM network (MSST-ConvLSTM) to incorporate spatial and temporal cues for video salient object detection. Furthermore, as manual pixel-wise labeling is very time-consuming, we annotate a large number of coarse labels, which are mixed with fine labels to train a robust saliency prediction model. Experiments on widely used, challenging benchmark datasets (e.g., FBMS and DAVIS) demonstrate that the proposed approach achieves competitive video saliency detection performance compared with state-of-the-art saliency models.
Citations: 10
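For readers unfamiliar with ConvLSTMs, here is a minimal single-cell sketch of the kind of recurrent unit such spatiotemporal networks stack; the kernel size and hidden channels are illustrative assumptions, and the paper's multi-scale arrangement is not reproduced.

```python
# Minimal sketch of a ConvLSTM cell: an LSTM whose gates are computed with
# convolutions, so the hidden state keeps its spatial layout across frames.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        # One convolution produces the input, forget, output and candidate gates at once.
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = gates.chunk(4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g
        h = o * c.tanh()
        return h, c

cell = ConvLSTMCell(3, 16)
x = torch.randn(1, 3, 64, 64)
h = c = torch.zeros(1, 16, 64, 64)
for _ in range(4):                 # unroll over 4 video frames
    h, c = cell(x, (h, c))
print(h.shape)                     # torch.Size([1, 16, 64, 64])
```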
Prototyping for Envisioning the Future
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3210490
Yamanaka Shunji
Abstract: As an industrial designer, I have worked in collaboration with various researchers and scientists since the beginning of this century. I have made many prototypes demonstrating the possibilities of their leading-edge technologies and exhibited them over the years. As archives of academic documents and papers have become open, and the internet has given the public access to recordings of experiments conducted throughout the world, laboratory technology is now constantly exposed to the public. In this context, prototypes are becoming more important as the medium that bridges advanced technology and society. A prototype is no longer merely an experimental machine; it is a device created to present the user experience in advance and to share the benefits of a technology with many others. Its role is not limited to sharing values within the development team: it is a medium that voices the significance of research and development to society, an inspiration that stimulates future markets, and a tool for securing development budgets. A prototype is the physical embodiment of a speculative story that connects people to technology that has yet to reach society. I would like to introduce some of the prototypes we developed and share the future vision they invoke.
Citations: 0
Challenges and Opportunities within Personal Life Archives
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206040
Duc-Tien Dang-Nguyen, M. Riegler, Liting Zhou, C. Gurrin
Abstract: Nowadays, almost everyone holds some form of personal life archive. Automatically maintaining such an archive is becoming increasingly common; however, without automatic support, users will quickly be overwhelmed by the volume of data and will miss out on the potential benefits that lifelogs provide. In this paper, we give an overview of the current status of lifelog research and propose a concept for exploring these archives. We motivate the need for new methodologies for indexing data, organizing content and supporting information access. Finally, we describe the challenges to be addressed and give an overview of the initial steps to be taken towards organising and searching personal life archives.
Citations: 9
Towards Better Understanding of Player's Game Experience
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206072
Wenlu Yang, M. Rifqi, C. Marsala, Andrea Pinna
Abstract: Improving the player's game experience has always been a common goal of video game practitioners. In order to better understand players' perception of game experience, we carry out an experimental study for data collection and present a game experience prediction model based on machine learning. The model is trained on the proposed multi-modal database, which contains a physiological modality, a behavioral modality and meta-information, to predict the player's game experience in terms of difficulty, immersion and amusement. By investigating models trained on separate and fused feature sets, we show that the physiological modality is effective. Moreover, further analysis of the most relevant features in the behavioral and meta-information feature sets provides a better understanding of their contributions. We argue that combining the physiological modality with behavioral and meta-information yields better performance on game experience prediction.
Citations: 7
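A minimal sketch of the early-fusion setup implied by the abstract follows: concatenate physiological, behavioral and meta-information feature vectors and train a classifier on the fused representation. The feature dimensions, the random placeholder data and the random-forest choice are assumptions, not the authors' setup.

```python
# Minimal sketch of early fusion of three modalities followed by a classifier
# predicting, e.g., perceived difficulty; all data here are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_sessions = 120
physio = rng.standard_normal((n_sessions, 20))    # e.g. heart-rate / EDA statistics
behavior = rng.standard_normal((n_sessions, 15))  # e.g. input / gameplay statistics
meta = rng.standard_normal((n_sessions, 5))       # e.g. player background information
difficulty = rng.integers(0, 3, size=n_sessions)  # low / medium / high labels

fused = np.hstack([physio, behavior, meta])       # early fusion of the three modalities
scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                         fused, difficulty, cv=5)
print("cross-validated accuracy:", scores.mean())
```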
MOOCex
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206087
Matthew Cooper, Jian Zhao, C. Bhatt, David A. Shamma
Abstract: Massive Open Online Course (MOOC) platforms have scaled online education to unprecedented enrollments, but remain limited by their predetermined curricula. Increasingly, professionals consume this content to augment or update specific skills rather than complete degree or certification programs. To better address the needs of this emergent user population, we describe a visual recommender system called MOOCex. The system recommends lecture videos across multiple courses and content platforms to provide a choice of perspectives on topics of interest. The recommendation engine considers both video content and sequential inter-topic relationships mined from course syllabi. Furthermore, it allows for interactive visual exploration of the semantic space of recommendations within a learner's current context.
Citations: 9
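A hedged sketch of blending the two signals the abstract mentions, content similarity between lecture videos and a sequential inter-topic relationship mined from syllabi, is given below; the linear weighting and the toy data are assumptions and do not represent MOOCex's actual scoring.

```python
# Minimal sketch: score candidate lectures by a weighted mix of content
# similarity and a "typically follows in syllabi" relation; data are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_videos, dim = 50, 64
video_vecs = rng.standard_normal((n_videos, dim))           # content embeddings of lectures
video_vecs /= np.linalg.norm(video_vecs, axis=1, keepdims=True)
# follows[i, j] = 1 if the topic of video j typically follows that of video i in syllabi.
follows = (rng.random((n_videos, n_videos)) > 0.9).astype(float)

def recommend(current, k=5, alpha=0.7):
    content_sim = video_vecs @ video_vecs[current]           # cosine similarity to current video
    score = alpha * content_sim + (1 - alpha) * follows[current]
    score[current] = -np.inf                                  # never recommend the same video
    return np.argsort(score)[::-1][:k]

print(recommend(current=3))
```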