{"title":"At the Speed of Sound: Efficient Audio Scene Classification","authors":"B. Dong, C. Lumezanu, Yuncong Chen, Dongjin Song, Takehiko Mizoguchi, Haifeng Chen, L. Khan","doi":"10.1145/3372278.3390730","DOIUrl":"https://doi.org/10.1145/3372278.3390730","url":null,"abstract":"Efficient audio scene classification is essential for smart sensing platforms such as robots, medical monitoring, surveillance, or autonomous vehicles. We propose a retrieval-based scene classification architecture that combines recurrent neural networks and attention to compute embeddings for short audio segments. We train our framework using a custom audio loss function that captures both the relevance of audio segments within a scene and that of sound events within a segment. Using experiments on real audio scenes, we show that we can discriminate audio scenes with high accuracy after listening in for less than a second. This preserves 93% of the detection accuracy obtained after hearing the entire scene.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122197209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
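The retrieval-based classification described above can be sketched as a nearest-neighbor lookup of a segment embedding against stored scene embeddings. This is an illustration only: the paper's RNN-plus-attention encoder is replaced by a hypothetical stand-in projection, and all names and dimensions are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(segment: np.ndarray) -> np.ndarray:
    """Stand-in for the RNN+attention embedding network: a fixed random
    projection followed by L2 normalization (purely illustrative)."""
    proj = np.random.default_rng(42).standard_normal((segment.size, 16))
    v = segment @ proj
    return v / np.linalg.norm(v)

def classify(segment: np.ndarray, index: dict[str, np.ndarray]) -> str:
    """Retrieval-based classification: return the label of the stored
    scene embedding most similar (by cosine) to the query segment."""
    q = embed(segment)
    return max(index, key=lambda label: float(index[label] @ q))

# Build a toy index with one reference embedding per scene class.
scenes = {label: rng.standard_normal(128) for label in ["street", "office", "park"]}
index = {label: embed(sig) for label, sig in scenes.items()}

# A slightly perturbed excerpt of the "office" scene should retrieve "office".
noisy = scenes["office"] + 0.05 * rng.standard_normal(128)
print(classify(noisy, index))
```

Classifying after "less than a second" corresponds to querying the index with the embedding of only the first short segment of a scene.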
{"title":"Automation of Deep Learning - Theory and Practice","authors":"Martin Wistuba, Ambrish Rawat, Tejaswini Pedapati","doi":"10.1145/3372278.3390739","DOIUrl":"https://doi.org/10.1145/3372278.3390739","url":null,"abstract":"The growing interest in the automation of machine learning and deep learning has inevitably led to the development of a wide variety of methods for automating deep learning. The choice of network architecture has proven critical, and many improvements in deep learning stem from new ways of structuring networks. However, deep learning techniques are computationally intensive, and their use requires a high level of domain knowledge. Even a partial automation of this process therefore helps to make deep learning more accessible to everyone. In this tutorial, we present a uniform formalism that enables different methods to be categorized, and we compare the approaches in terms of their performance. We achieve this through a comprehensive discussion of the commonly used architecture search spaces and of architecture optimization algorithms based on reinforcement learning and evolutionary algorithms, as well as approaches that include surrogate and one-shot models. In addition, we discuss approaches that accelerate the search for neural architectures through early termination and transfer learning, and we address new research directions, including constrained and multi-objective architecture search as well as the automated search for data augmentation, optimizers, and activation functions.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128134438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
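The combination of architecture search and early termination that the tutorial covers can be sketched minimally as random search over a toy space, where weak candidates are dropped after a small budget. The search space and the `proxy_score` function are hypothetical stand-ins; a real search would train each candidate and measure validation accuracy.

```python
import random

# Toy search space; keys and values are illustrative assumptions.
SEARCH_SPACE = {
    "depth": [2, 4, 8],
    "width": [32, 64, 128],
    "activation": ["relu", "swish"],
}

def sample_architecture(rng: random.Random) -> dict:
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def proxy_score(arch: dict, budget: int) -> float:
    """Hypothetical cheap quality estimate; a real system would train the
    candidate network for `budget` steps and report validation accuracy."""
    base = 0.5 + 0.01 * arch["depth"] + 0.001 * arch["width"]
    return min(base + 0.01 * budget, 1.0)

def random_search(trials: int, seed: int = 0) -> dict:
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        arch = sample_architecture(rng)
        # Early termination: discard candidates that already look weak
        # after a small budget instead of paying for full evaluation.
        if proxy_score(arch, budget=1) < best_score - 0.05:
            continue
        score = proxy_score(arch, budget=10)
        if score > best_score:
            best, best_score = arch, score
    return best

print(random_search(trials=20))
```

Reinforcement-learning and evolutionary searches differ only in how the next candidate is proposed; the evaluate-and-keep-best loop stays the same.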
{"title":"System Fusion with Deep Ensembles","authors":"Liviu-Daniel Stefan, M. Constantin, B. Ionescu","doi":"10.1145/3372278.3390720","DOIUrl":"https://doi.org/10.1145/3372278.3390720","url":null,"abstract":"Deep neural networks (DNNs) are universal estimators that have achieved state-of-the-art performance in a broad spectrum of classification tasks, opening new perspectives for many applications. One such application is ensemble learning. In this paper, we introduce a set of deep learning techniques for ensemble learning with dense, attention, and convolutional neural network layers. Our approach automatically discovers patterns and correlations between the decisions of individual classifiers, thereby alleviating the difficulty of building such architectures. To assess its robustness, we evaluate our approach on two complex data sets that target different perspectives of predicting the user perception of multimedia data, i.e., interestingness and violence. The proposed approach outperforms the existing state-of-the-art algorithms by a large margin.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131727789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
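Decision-level fusion of the kind this paper studies can be sketched in its simplest dense form: stack the scores of individual classifiers and learn combination weights with a single logistic layer. The data below is synthetic, and the paper's ensembles also use attention and convolutional layers, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 200, 3                      # samples, base classifiers
y = rng.integers(0, 2, size=n)     # ground-truth binary labels
# Simulated base-classifier scores: two informative columns, one pure noise.
scores = np.stack([
    y + 0.3 * rng.standard_normal(n),
    y + 0.5 * rng.standard_normal(n),
    rng.standard_normal(n),
], axis=1)

# A single dense (logistic) fusion layer trained by plain gradient descent.
w, b = np.zeros(k), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(scores @ w + b)))
    grad = p - y
    w -= 0.1 * (scores.T @ grad) / n
    b -= 0.1 * grad.mean()

fused = (1 / (1 + np.exp(-(scores @ w + b))) > 0.5).astype(int)
print("fusion accuracy:", (fused == y).mean())
print("learned weights:", w.round(2))
```

The learned weights show what "discovering correlations between classifier decisions" means in the simplest case: the uninformative third classifier receives a weight near zero.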
{"title":"Deep Semantic-Alignment Hashing for Unsupervised Cross-Modal Retrieval","authors":"Dejie Yang, Dayan Wu, Wanqian Zhang, Haisu Zhang, Bo Li, Weiping Wang","doi":"10.1145/3372278.3390673","DOIUrl":"https://doi.org/10.1145/3372278.3390673","url":null,"abstract":"Deep hashing methods have achieved tremendous success in cross-modal retrieval, due to their low storage consumption and fast retrieval speed. In real cross-modal retrieval applications, label information is often hard to obtain. Recently, increasing attention has been paid to unsupervised cross-modal hashing. However, existing methods fail to exploit the intrinsic connections between images and their corresponding descriptions or tags (text modality). In this paper, we propose a novel Deep Semantic-Alignment Hashing (DSAH) method for unsupervised cross-modal retrieval, which fully exploits co-occurring image-text pairs. DSAH explores the similarity information of the different modalities through a carefully designed semantic-alignment loss function that aligns the similarities between features with those between hash codes. Moreover, to further bridge the modality gap, we propose to reconstruct the features of one modality from the hash codes of the other. Extensive experiments on three cross-modal retrieval datasets demonstrate that DSAH achieves state-of-the-art performance.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114776444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
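The semantic-alignment loss described above can be sketched as follows: the similarity matrix computed from continuous features should match the one computed from the (relaxed) hash codes. This is our illustration, not the authors' code; all names, shapes, and the synthetic data are assumptions.

```python
import numpy as np

def cosine_sim(x: np.ndarray) -> np.ndarray:
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

def alignment_loss(features: np.ndarray, codes: np.ndarray) -> float:
    """Mean squared error between feature similarities and the similarities
    of tanh-relaxed hash codes (a common continuous relaxation of +/-1 bits)."""
    s_feat = cosine_sim(features)
    s_code = cosine_sim(np.tanh(codes))
    return float(((s_feat - s_code) ** 2).mean())

rng = np.random.default_rng(0)
# Two clusters of features, e.g. four images of one concept and four of another.
u, v = rng.standard_normal(64), rng.standard_normal(64)
feats = np.vstack([u + 0.3 * rng.standard_normal((4, 64)),
                   v + 0.3 * rng.standard_normal((4, 64))])
# Codes derived from the features preserve the similarity structure...
good_codes = feats @ rng.standard_normal((64, 16))
# ...while random codes do not, so they should incur a higher loss.
bad_codes = rng.standard_normal((8, 16))

print(alignment_loss(feats, good_codes), alignment_loss(feats, bad_codes))
```

Minimizing such a loss pushes the hash codes to reproduce the neighborhood structure of the features, which is the alignment the abstract refers to.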
{"title":"Sentence-based and Noise-robust Cross-modal Retrieval on Cooking Recipes and Food Images","authors":"Zichen Zan, Lin Li, Jianquan Liu, D. Zhou","doi":"10.1145/3372278.3390681","DOIUrl":"https://doi.org/10.1145/3372278.3390681","url":null,"abstract":"In recent years, people are faced with billions of food images, videos, and recipes on social media. An appropriate technology, such as a cross-modal retrieval framework, is highly desirable for retrieving accurate content across food images and cooking recipes. Based on our observations, the order of sentences in recipes and the noise in food images affect retrieval results. We take into account the sentence-level sequential order of instructions and ingredients in recipes, and the noisy portions of food images, to propose a new framework for cross-modal retrieval. In our framework, we propose three new strategies to improve retrieval accuracy. (1) We encode recipe titles, ingredients, and instructions at the sentence level, and adopt three separate attention networks on multi-layer hidden state features to capture more semantic information. (2) We apply an attention mechanism that incorporates recipe embeddings to select effective features from food images, and adopt an adversarial learning strategy to enhance modality alignment. (3) We design a new triplet loss scheme with an effective sampling strategy to reduce the impact of noise on retrieval results. The experimental results show that our framework clearly outperforms state-of-the-art methods in terms of median rank and recall rate at top k on the Recipe 1M dataset.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125882036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
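The triplet loss at the heart of strategy (3) can be sketched with a simple in-batch hard-negative sampling rule. This is our illustration of the standard formulation, not the authors' exact scheme; embeddings and the margin value are toy assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negatives, margin=0.2):
    """Penalize the anchor-positive distance unless it beats the distance
    to the hardest (closest) negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    # Hard-negative sampling: the negative closest to the anchor.
    d_neg = min(np.linalg.norm(anchor - n) for n in negatives)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
recipe = rng.standard_normal(32)                             # recipe embedding
matching_image = recipe + 0.1 * rng.standard_normal(32)      # its food image
other_images = [rng.standard_normal(32) for _ in range(5)]   # unrelated images

print(triplet_loss(recipe, matching_image, other_images))    # → 0.0
```

The loss is zero here because the matching image is far closer to the recipe than any negative; a noisy or mismatched positive would yield a positive loss and thus a gradient.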
{"title":"Anomaly Detection in Traffic Surveillance Videos with GAN-based Future Frame Prediction","authors":"Khac-Tuan Nguyen, Dat-Thanh Dinh, M. Do, M. Tran","doi":"10.1145/3372278.3390701","DOIUrl":"https://doi.org/10.1145/3372278.3390701","url":null,"abstract":"It is essential to develop efficient methods to detect abnormal events, such as car crashes or stalled vehicles, from surveillance cameras so that timely help can be provided. This motivates us to propose a novel method to detect traffic accidents in traffic videos. To tackle the problem that anomalies occupy only a small amount of the data, we propose a semi-supervised method using a Generative Adversarial Network (GAN) trained on regular sequences to predict future frames. Our key idea is to model the ordinary world with a generative model, then compare a predicted frame with the real next frame to determine whether an abnormal event occurs. We also propose encoding motion descriptors and a scaled intensity loss function to optimize the GAN for fast-moving objects. Experiments on the Traffic Anomaly Detection dataset of the AI City Challenge 2019 show that our method achieves a top-3 result, with an F1 score of 0.9412, an RMSE of 4.8088, and an S3 score of 0.9261. Our method can be applied to related applications of anomaly and outlier detection in videos.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129505980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
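The compare-predicted-with-real step of the method above can be sketched with a PSNR-based score: when the GAN has only seen normal traffic, its predictions degrade on abnormal frames. The generator itself is out of scope here; frame sizes and the decibel threshold are illustrative assumptions.

```python
import numpy as np

def psnr(pred: np.ndarray, real: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between a predicted and a real frame."""
    mse = float(((pred - real) ** 2).mean())
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def is_anomalous(pred: np.ndarray, real: np.ndarray, threshold_db: float = 20.0) -> bool:
    """Flag an anomaly when prediction quality drops below the threshold."""
    return psnr(pred, real) < threshold_db

rng = np.random.default_rng(0)
real_frame = rng.random((64, 64))
good_pred = real_frame + 0.01 * rng.standard_normal((64, 64))  # normal scene: well predicted
bad_pred = rng.random((64, 64))                                # anomaly: prediction misses

print(is_anomalous(good_pred, real_frame), is_anomalous(bad_pred, real_frame))  # → False True
```

In practice the per-frame score is usually normalized over a sliding window before thresholding, so that lighting changes do not trigger false alarms.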
{"title":"One Shot Logo Recognition Based on Siamese Neural Networks","authors":"Camilo Vargas, Qianni Zhang, E. Izquierdo","doi":"10.1145/3372278.3390734","DOIUrl":"https://doi.org/10.1145/3372278.3390734","url":null,"abstract":"This work presents an approach for one-shot logo recognition that relies on a Siamese neural network (SNN) embedded with a pre-trained model that is fine-tuned on a challenging logo dataset. Although the model is fine-tuned using logo images, the training and testing datasets do not have overlapping categories, meaning that all the classes used for testing the one-shot recognition framework remain unseen during the fine-tuning process. The recognition process follows the standard SNN approach, in which a pair of input images is encoded by each sister network. The encoded outputs are then compared using a trained metric and thresholded to define matches and mismatches. The proposed approach achieves an accuracy of 77.07% under the one-shot constraints on the QMUL-OpenLogo dataset. Code is available at https://github.com/cjvargasc/oneshot_siamese/.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131072978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
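The standard SNN decision rule the abstract describes can be sketched in a few lines: both inputs pass through the same shared-weight encoder, the embeddings are compared with a metric, and the distance is thresholded into match or mismatch. The encoder and threshold below are toy stand-ins, not the fine-tuned network from the paper.

```python
import numpy as np

# Shared weights: both "sister" networks are the same function (hence Siamese).
SHARED_W = np.random.default_rng(7).standard_normal((64, 8))

def encode(x: np.ndarray) -> np.ndarray:
    """Toy sister network: linear projection plus L2 normalization."""
    v = x @ SHARED_W
    return v / np.linalg.norm(v)

def distance(a: np.ndarray, b: np.ndarray) -> float:
    """Metric on the embedding pair; a trained metric in the real system."""
    return float(np.linalg.norm(encode(a) - encode(b)))

def same_logo(a: np.ndarray, b: np.ndarray, threshold: float = 1.0) -> bool:
    return distance(a, b) < threshold

rng = np.random.default_rng(0)
logo = rng.standard_normal(64)
variant = logo + 0.05 * rng.standard_normal(64)  # same logo, slight distortion
other = rng.standard_normal(64)                  # a different logo
print(distance(logo, variant), distance(logo, other))
```

Because the decision depends only on pairwise distances, classes never seen during fine-tuning can still be recognized from a single reference image, which is what makes the setup one-shot.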
{"title":"Detection of Semantic Risk Situations in Lifelog Data for Improving Life of Frail People","authors":"Thinhinane Yebda, J. Benois-Pineau, M. Pech, H. Amièva, C. Gurrin","doi":"10.1145/3372278.3391931","DOIUrl":"https://doi.org/10.1145/3372278.3391931","url":null,"abstract":"The automatic recognition of risk situations for frail people is an urgent research topic for the interdisciplinary artificial intelligence and multimedia community. Risky situations can be recognized from lifelog data recorded with wearable devices. In this paper, we present a new approach for the detection of semantic risk situations for frail people in lifelog data. Concept matching between general lifelog and risk taxonomies was performed, and a fine-tuned AlexNet was deployed to detect two semantic risk situations, risk of domestic accident and risk of fraud, with promising results.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134451991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
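The concept-matching step can be sketched as a set overlap between concepts detected in a lifelog frame and a small risk taxonomy. The taxonomy contents and the overlap threshold below are invented for illustration; the paper's taxonomies are far richer.

```python
# Hypothetical risk taxonomy mapping risk categories to trigger concepts.
RISK_TAXONOMY = {
    "domestic_accident": {"knife", "stove", "stairs", "wet floor"},
    "fraud": {"stranger", "door", "phone", "bank card"},
}

def detect_risks(lifelog_concepts: set[str], min_overlap: int = 2) -> list[str]:
    """Return the risk categories whose taxonomy shares at least
    `min_overlap` concepts with those detected in the lifelog frame."""
    return [risk for risk, concepts in RISK_TAXONOMY.items()
            if len(concepts & lifelog_concepts) >= min_overlap]

print(detect_risks({"stove", "knife", "kitchen"}))   # → ['domestic_accident']
print(detect_risks({"phone", "sofa"}))               # → []
```

In the paper, the lifelog concepts themselves come from an image classifier (a fine-tuned AlexNet) applied to the wearable-camera frames; the matching step then lifts those visual concepts to semantic risk categories.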
{"title":"MMArt-ACM'20: International Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia 2020","authors":"W. Chu, I. Ide, Naoko Nitta, N. Tsumura, T. Yamasaki","doi":"10.1145/3372278.3388042","DOIUrl":"https://doi.org/10.1145/3372278.3388042","url":null,"abstract":"The International Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia (MMArt-ACM) solicits contributions on methodology advancement and novel applications of multimedia artworks analysis and attractiveness computing that emerge in the era of big data and deep learning. Despite the disruption caused by the Covid-19 pandemic, the workshop attracted submissions on diverse topics in these two fields, and the final workshop program consists of five presented papers. The topics cover image retrieval, image transformation and generation, recommendation systems, and image/video summarization. The actual MMArt-ACM'20 Proceedings are available in the ACM DL at: https://dl.acm.org/citation.cfm?id=3379173","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125532361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lightweight Gated Global Module for Global Context Modeling in Neural Networks","authors":"Li Hao, Liping Hou, Yuantao Song, K. Lu, Jian Xue","doi":"10.1145/3372278.3390712","DOIUrl":"https://doi.org/10.1145/3372278.3390712","url":null,"abstract":"Global context modeling has been used to achieve better performance in various computer-vision-related tasks, such as classification, detection, segmentation, and multimedia retrieval applications. However, most existing global mechanisms suffer from convergence problems during training. In this paper, we propose a novel gated global module (GGM) that is lightweight yet effective at integrating global information into feature representations. Treating the original structure of the network as a local block, our module infers global information in parallel with the local information; a gate function then generates global guidance, which is applied to the output of the local module to capture representative information. The proposed GGM can be easily integrated with common CNN architectures and is easy to train. We used a classification task as an example to verify the effectiveness of the proposed GGM, and extensive experiments on ImageNet and CIFAR demonstrated that our method can be widely applied and is conducive to integrating global information into common networks.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"117 16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126403908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
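The gating idea described above can be sketched as: pool a global descriptor from the feature map, pass it through a gate, and use the result to reweight the local branch's output channel-wise. Shapes and the exact gate form are our assumptions, since the paper's module details are not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gated_global_module(feature_map: np.ndarray, w_gate: np.ndarray) -> np.ndarray:
    """feature_map: (C, H, W) local-branch output; w_gate: (C, C) gate weights."""
    global_desc = feature_map.mean(axis=(1, 2))   # global average pooling: (C,)
    gate = sigmoid(w_gate @ global_desc)          # per-channel gate in (0, 1)
    # Global guidance applied to the local output, channel-wise.
    return feature_map * gate[:, None, None]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))     # toy feature map: 8 channels, 4x4 spatial
w = 0.1 * rng.standard_normal((8, 8))
out = gated_global_module(fmap, w)
print(out.shape)   # → (8, 4, 4)
```

Because the output shape matches the input shape, such a module can be dropped in parallel to any existing block without altering the rest of the architecture, which is what makes the design easy to integrate.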