Proceedings of the 2020 International Conference on Multimedia Retrieval: Latest Publications

At the Speed of Sound: Efficient Audio Scene Classification
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390730
Authors: B. Dong, C. Lumezanu, Yuncong Chen, Dongjin Song, Takehiko Mizoguchi, Haifeng Chen, L. Khan
Abstract: Efficient audio scene classification is essential for smart sensing platforms such as robots, medical monitoring, surveillance, and autonomous vehicles. We propose a retrieval-based scene classification architecture that combines recurrent neural networks and attention to compute embeddings for short audio segments. We train our framework using a custom audio loss function that captures both the relevance of audio segments within a scene and that of sound events within a segment. In experiments on real audio scenes, we show that we can discriminate audio scenes with high accuracy after listening in for less than a second, which preserves 93% of the detection accuracy obtained after hearing the entire scene.
Citations: 8
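The retrieval flavor of the approach is easy to picture in code. Below is a minimal sketch, not the authors' implementation: a bidirectional GRU with attention pooling turns a log-mel segment into a unit-norm embedding, and the scene is predicted by nearest-neighbor lookup against labeled embeddings. All layer sizes and the log-mel front end are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentEncoder(nn.Module):
    def __init__(self, n_mels=64, hidden=128, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)      # scores each time step
        self.proj = nn.Linear(2 * hidden, emb_dim)

    def forward(self, x):                        # x: (batch, frames, n_mels)
        h, _ = self.rnn(x)                       # (batch, frames, 2*hidden)
        w = torch.softmax(self.att(h), dim=1)    # attention weights over time
        pooled = (w * h).sum(dim=1)              # attention-weighted pooling
        return F.normalize(self.proj(pooled), dim=-1)

def retrieve_scene(query_emb, index_embs, index_labels):
    """Nearest-neighbor retrieval: cosine similarity on unit vectors."""
    sims = index_embs @ query_emb                # (n_index,)
    return index_labels[sims.argmax().item()]
```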
Automation of Deep Learning - Theory and Practice
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390739
Authors: Martin Wistuba, Ambrish Rawat, Tejaswini Pedapati
Abstract: The growing interest in automating machine learning and deep learning has led to a wide variety of methods for automating deep learning. The choice of network architecture has proven critical, and many improvements in deep learning stem from new ways of structuring it. However, deep learning techniques are computationally intensive, and their use requires a high level of domain knowledge; even partial automation of this process therefore helps make deep learning more accessible. In this tutorial, we present a uniform formalism that enables different methods to be categorized, and we compare the approaches in terms of their performance. We achieve this through a comprehensive discussion of commonly used architecture search spaces and of architecture optimization algorithms based on reinforcement learning and evolutionary algorithms, as well as approaches that include surrogate and one-shot models. In addition, we discuss approaches that accelerate the search for neural architectures through early termination and transfer learning, and we address new research directions, including constrained and multi-objective architecture search as well as the automated search for data augmentation, optimizers, and activation functions.
Citations: 2
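As a point of reference for the methods the tutorial categorizes, the simplest baseline in any architecture search space is random search under an evaluation budget. The sketch below is a toy illustration, not tutorial material; the search space and the evaluate() proxy are invented stand-ins for real training and validation, and smarter methods (RL, evolution, surrogates) replace the sampling step in a loop of the same shape.

```python
import random

# A hypothetical discrete search space over a few architectural choices.
SEARCH_SPACE = {
    "depth": [2, 4, 6, 8],
    "width": [64, 128, 256],
    "activation": ["relu", "swish", "tanh"],
}

def sample_architecture():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    # Placeholder proxy score; in practice, train the candidate network and
    # return validation accuracy (possibly cut short by early termination).
    return random.random()

def random_search(budget=20):
    best_arch, best_score = None, float("-inf")
    for _ in range(budget):
        arch = sample_architecture()
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

print(random_search())
```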
System Fusion with Deep Ensembles
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390720
Authors: Liviu-Daniel Stefan, M. Constantin, B. Ionescu
Abstract: Deep neural networks (DNNs) are universal estimators that have achieved state-of-the-art performance in a broad spectrum of classification tasks, opening new perspectives for many applications. One such application is ensemble learning. In this paper, we introduce a set of deep learning techniques for ensemble learning with dense, attention, and convolutional neural network layers. Our approach automatically discovers patterns and correlations between the decisions of individual classifiers, thereby alleviating the difficulty of building such architectures. To assess its robustness, we evaluate our approach on two complex datasets that target different perspectives of predicting the user perception of multimedia data, i.e., interestingness and violence. The proposed approach outperforms existing state-of-the-art algorithms by a large margin.
Citations: 4
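The core idea, learning the fusion rule instead of fixing it, can be sketched in a few lines. The following is a minimal dense-layer variant under assumed dimensions, not the authors' full architecture (which also explores attention and convolutional fusion layers): the per-class scores of N base classifiers are stacked into one input vector, and a small network learns how to combine them.

```python
import torch
import torch.nn as nn

class DenseFusion(nn.Module):
    def __init__(self, n_classifiers=5, n_classes=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_classifiers * n_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, scores):      # scores: (batch, n_classifiers, n_classes)
        return self.net(scores.flatten(start_dim=1))

fusion = DenseFusion()
scores = torch.rand(8, 5, 2)        # decisions of 5 classifiers on 8 samples
logits = fusion(scores)             # learned fused prediction, shape (8, 2)
```

Unlike majority voting or score averaging, the learned fusion can exploit systematic correlations between classifiers, e.g., one classifier being reliable only when another is uncertain.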
Deep Semantic-Alignment Hashing for Unsupervised Cross-Modal Retrieval
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390673
Authors: Dejie Yang, Dayan Wu, Wanqian Zhang, Haisu Zhang, Bo Li, Weiping Wang
Abstract: Deep hashing methods have achieved tremendous success in cross-modal retrieval, owing to their low storage cost and fast retrieval speed. In real cross-modal retrieval applications, however, label information is hard to obtain, and increasing attention has therefore been paid to unsupervised cross-modal hashing. Existing methods fail to exploit the intrinsic connections between images and their corresponding descriptions or tags (the text modality). In this paper, we propose Deep Semantic-Alignment Hashing (DSAH) for unsupervised cross-modal retrieval, which fully exploits co-occurring image-text pairs. DSAH explores the similarity information of the different modalities through a carefully designed semantic-alignment loss function that aligns the similarities between features with those between hash codes. Moreover, to further bridge the modality gap, we propose to reconstruct the features of one modality from the hash codes of the other. Extensive experiments on three cross-modal retrieval datasets demonstrate that DSAH achieves state-of-the-art performance.
Citations: 37
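The semantic-alignment loss described above can be pictured as matching two similarity matrices, one computed from features and one from (relaxed) hash codes. The construction below is a hedged reading of the abstract, not the paper's exact formulation; the tanh relaxation and MSE penalty are common choices in deep hashing.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(features, codes):
    """features: (n, d) modality features; codes: (n, k) real-valued hash logits."""
    f = F.normalize(features, dim=1)
    b = F.normalize(torch.tanh(codes), dim=1)  # tanh relaxes the binary codes
    sim_f = f @ f.t()                          # feature similarity matrix (n, n)
    sim_b = b @ b.t()                          # hash-code similarity matrix (n, n)
    return F.mse_loss(sim_b, sim_f)            # align code sims with feature sims
```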
Sentence-based and Noise-robust Cross-modal Retrieval on Cooking Recipes and Food Images
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390681
Authors: Zichen Zan, Lin Li, Jianquan Liu, D. Zhou
Abstract: In recent years, people have been confronted with billions of food images, videos, and recipes on social media. Technology that retrieves accurate content across food images and cooking recipes, such as a cross-modal retrieval framework, is therefore highly desirable. We observe that the order of sentences in a recipe and the noise in food images both affect retrieval results. Taking into account the sentence-level order of instructions and ingredients in recipes, as well as the noisy portions of food images, we propose a new cross-modal retrieval framework with three new strategies to improve retrieval accuracy. (1) We encode recipe titles, ingredients, and instructions at the sentence level, and adopt three separate attention networks over multi-layer hidden-state features to capture more semantic information. (2) We apply an attention mechanism to select effective features from food images, incorporating recipe embeddings, and adopt an adversarial learning strategy to enhance modality alignment. (3) We design a new triplet loss scheme with an effective sampling strategy to reduce the impact of noise on retrieval results. Experimental results show that our framework clearly outperforms state-of-the-art methods in terms of median rank and recall at top k on the Recipe1M dataset.
Citations: 16
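Strategy (3) builds on the standard triplet loss for cross-modal retrieval: pull a matched recipe/image pair together and push a non-matching sample away by a margin. The sketch below shows only that standard loss with cosine distance; the paper's noise-robust scheme and its sampling strategy are its contribution and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    """All inputs: (batch, dim) embeddings, e.g. recipe vs. image encodings."""
    d_pos = 1 - F.cosine_similarity(anchor, positive)  # distance to the match
    d_neg = 1 - F.cosine_similarity(anchor, negative)  # distance to a non-match
    return F.relu(d_pos - d_neg + margin).mean()       # hinge on the margin
```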
Anomaly Detection in Traffic Surveillance Videos with GAN-based Future Frame Prediction
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390701
Authors: Khac-Tuan Nguyen, Dat-Thanh Dinh, M. Do, M. Tran
Abstract: Detecting abnormal events, such as car crashes or stalled vehicles, from surveillance cameras is essential for providing timely help. This motivates us to propose a novel method to detect traffic accidents in traffic videos. To tackle the problem that anomalies occupy only a small fraction of the data, we propose a semi-supervised method using a Generative Adversarial Network (GAN) trained on regular sequences to predict future frames. Our key idea is to model the ordinary world with a generative model and then compare a predicted frame with the real next frame to determine whether an abnormal event has occurred. We also propose encoding motion descriptors and a scaled intensity loss function to optimize the GAN for fast-moving objects. Experiments on the Traffic Anomaly Detection dataset of the AI City Challenge 2019 show that our method achieves top-3 results, with an F1 score of 0.9412, an RMSE of 4.8088, and an S3 score of 0.9261. Our method can be applied to related applications of anomaly and outlier detection in videos.
Citations: 28
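The detection step, comparing a predicted frame with the real next frame, is conventionally scored with PSNR in future-frame prediction methods: the regular-world generator predicts normal frames well, so a low PSNR flags an anomaly. A minimal sketch, with an assumed threshold rather than the paper's exact settings:

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two frames in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10 * torch.log10(max_val ** 2 / (mse + 1e-8))

def is_anomalous(pred_frame, real_frame, threshold=30.0):
    """Low PSNR means the regular-world model failed to predict the frame."""
    return psnr(pred_frame, real_frame).item() < threshold
```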
One Shot Logo Recognition Based on Siamese Neural Networks
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390734
Authors: Camilo Vargas, Qianni Zhang, E. Izquierdo
Abstract: This work presents an approach to one-shot logo recognition that relies on a Siamese neural network (SNN) embedding a pre-trained model fine-tuned on a challenging logo dataset. Although the model is fine-tuned on logo images, the training and testing datasets have no overlapping categories, meaning that all classes used for testing the one-shot recognition framework remain unseen during fine-tuning. The recognition process follows the standard SNN approach, in which a pair of input images is encoded by each sister network. The encoded outputs for each image are then compared using a trained metric and thresholded to define matches and mismatches. The proposed approach achieves an accuracy of 77.07% under one-shot constraints on the QMUL-OpenLogo dataset. Code is available at https://github.com/cjvargasc/oneshot_siamese/.
Citations: 11
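The standard SNN pipeline the abstract references looks roughly as follows. This is a skeletal sketch, not the released code: a shared encoder maps both logo crops to embeddings, a similarity is computed, and a threshold decides match vs. mismatch. The small convolutional encoder stands in for the fine-tuned pre-trained backbone, and the threshold value is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseMatcher(nn.Module):
    def __init__(self, emb_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(           # stand-in for the CNN backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb_dim),
        )

    def forward(self, a, b):
        ea = F.normalize(self.encoder(a), dim=1)  # shared weights: same encoder
        eb = F.normalize(self.encoder(b), dim=1)
        return (ea * eb).sum(dim=1)               # cosine similarity per pair

model = SiameseMatcher()
x1, x2 = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)
match = model(x1, x2) > 0.5                       # thresholded match decision
```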
Detection of Semantic Risk Situations in Lifelog Data for Improving Life of Frail People
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3391931
Authors: Thinhinane Yebda, J. Benois-Pineau, M. Pech, H. Amièva, C. Gurrin
Abstract: The automatic recognition of risk situations for frail people is an urgent research topic for the interdisciplinary artificial intelligence and multimedia community. Risk situations can be recognized from lifelog data recorded with wearable devices. In this paper, we present a new approach for detecting semantic risk situations for frail people in lifelog data. Concept matching between general lifelog and risk taxonomies was performed, and a fine-tuned AlexNet was deployed to detect two semantic risk situations, risk of domestic accident and risk of fraud, with promising results.
Citations: 3
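The classification component, a fine-tuned AlexNet over a small set of risk categories, amounts to replacing the network's final layer. A short sketch using torchvision; the class count (the two risk classes plus a no-risk class) is an assumption, not stated in the abstract.

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained AlexNet and swap the 1000-way head.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
n_classes = 3   # assumed: domestic accident, fraud, no risk
model.classifier[6] = nn.Linear(model.classifier[6].in_features, n_classes)
# Fine-tune on lifelog images labeled via taxonomy concept matching.
```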
MMArt-ACM'20: International Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia 2020
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3388042
Authors: W. Chu, I. Ide, Naoko Nitta, N. Tsumura, T. Yamasaki
Abstract: The International Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (MMArt-ACM) solicits contributions on methodological advances and novel applications in multimedia artwork analysis and attractiveness computing in the era of big data and deep learning. Despite the disruption of the COVID-19 pandemic, the workshop attracted submissions on diverse topics in these two fields, and the program consists of five presented papers. The topics cover image retrieval, image transformation and generation, recommendation systems, and image/video summarization. The MMArt-ACM'20 proceedings are available in the ACM DL at: https://dl.acm.org/citation.cfm?id=3379173
Citations: 2
A Lightweight Gated Global Module for Global Context Modeling in Neural Networks
Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390712
Authors: Li Hao, Liping Hou, Yuantao Song, K. Lu, Jian Xue
Abstract: Global context modeling has been used to achieve better performance in various computer-vision tasks, such as classification, detection, segmentation, and multimedia retrieval. However, most existing global mechanisms exhibit convergence problems during training. In this paper, we propose a novel gated global module (GGM) that is lightweight yet effective at integrating global information into feature representations. Treating the original network structure as a local block, our module infers global information in parallel with the local information; a gate function then generates global guidance, which is applied to the output of the local block to capture representative information. The proposed GGM integrates easily with common CNN architectures and is training-friendly. Using a classification task as an example, extensive experiments on ImageNet and CIFAR demonstrate that our method is widely applicable and effective at integrating global information into common networks.
Citations: 1
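One plausible reading of the gate mechanism, close in spirit to squeeze-and-excitation gating, is sketched below with assumed tensor shapes; the paper's exact gate design may differ. Global context is pooled in parallel with the local block, turned into per-channel sigmoid gates, and used to modulate the local output.

```python
import torch
import torch.nn as nn

class GatedGlobalModule(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # global context per channel
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, 1), # 1x1 conv: lightweight gate
            nn.Sigmoid(),
        )

    def forward(self, local_out):             # local_out: (b, c, h, w)
        g = self.gate(self.pool(local_out))   # (b, c, 1, 1) gate values
        return local_out * g                  # globally guided local features

x = torch.rand(2, 64, 32, 32)
y = GatedGlobalModule(64)(x)                  # same shape as x
```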