Proceedings of the 2020 International Conference on Multimedia Retrieval: Latest Publications

Semantic Gated Network for Efficient News Representation
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390719
Xuxiao Bu, Bingfeng Li, Yaxiong Wang, Jihua Zhu, Xueming Qian, Marco Zhao
Learning an efficient news representation is a fundamental yet important problem for many tasks. Most existing news-related methods use only the textual information and discard the visual clues offered by the illustrations. We argue that the textual title and tags, together with the visual illustrations, form the core of a piece of news and express its content more efficiently. In this paper, we develop a novel framework, the Semantic Gated Network (SGN), which integrates the news title, tags, and visual illustrations into an efficient joint textual-visual feature, by which we can directly measure the relevance between two pieces of news. Specifically, we first harvest the tag embeddings with the proposed self-supervised classification model. In addition, the news title is fed into a sentence encoder, pretrained on pairs of semantically relevant news items, to learn efficient contextualized word vectors. The title feature is then extracted from the learned vectors and combined with the tag features to obtain the textual feature. Finally, we design a novel mechanism, the semantic gate, to adaptively fuse the textual feature and the image feature. Extensive experiments on a benchmark dataset demonstrate the effectiveness of our approach.
Citations: 1
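The abstract does not specify how the semantic gate fuses the two modalities; a common way to realize such a gate is a sigmoid-weighted, per-dimension convex combination of the textual and visual features. A minimal numpy sketch, with hypothetical names and randomly initialized weights standing in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def semantic_gate(text_feat, img_feat, W, b):
    """Gated fusion: a learned gate decides, per dimension, how much of
    the textual versus the visual feature enters the joint embedding."""
    z = np.concatenate([text_feat, img_feat])    # (2d,)
    g = sigmoid(W @ z + b)                       # gate values in (0, 1), shape (d,)
    return g * text_feat + (1.0 - g) * img_feat  # per-dimension convex mix

d = 8
text_feat = rng.standard_normal(d)
img_feat = rng.standard_normal(d)
W = rng.standard_normal((d, 2 * d)) * 0.1  # would be learned in practice
b = np.zeros(d)

fused = semantic_gate(text_feat, img_feat, W, b)
```

Because the gate output lies in (0, 1), each fused dimension stays between the corresponding textual and visual values, letting the network lean on whichever modality is more informative.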
Music Tower Blocks: Multi-Faceted Exploration Interface for Web-Scale Music Access
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3391928
M. Schedl, Michael Mayr, Peter Knees
We present Music Tower Blocks, a novel browsing interface for interactive music visualization, capable of handling the web-scale music collections now offered by major music streaming services. Based on a clustering created from fused metadata and acoustic features, a block-based skyline landscape is constructed, which the user can navigate in several ways (zooming, panning, changing the angle of slope). User-adjustable color coding highlights various facets, e.g., the distributions of genres and acoustic features. Several search and filtering capabilities are also provided (e.g., searching for artists and tracks, or filtering by track popularity to focus on top hits or discover unknown gems). In addition, Music Tower Blocks lets users connect their personal music streaming profiles and highlight their favorite or recently played music on the landscape, supporting exploration of regions of the landscape near to (or far from) their own taste.
Citations: 4
An Active Learning Framework for Duplicate Detection in SaaS Platforms
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3391933
Quy H. Nguyen, Dac H. Nguyen, Minh-Son Dao, Duc-Tien Dang-Nguyen, C. Gurrin, Binh T. Nguyen
With the rapid growth of user data in SaaS (Software-as-a-Service) platforms built on microservices, detecting duplicated entities has become essential for ensuring the integrity and consistency of data in many companies and businesses, primarily multinational corporations. Given the large volume of today's databases, a duplicate detection algorithm must be not only accurate but also practical, meaning it can return detection results as quickly as possible for a given request. Among existing approaches to the duplicate detection problem, Siamese neural networks trained with the triplet loss have become one of the most robust ways to measure the similarity of two entities (texts, paragraphs, or documents) and thereby identify all possible duplicated items. In this paper, we first propose a practical framework for building a duplicate detection system on a SaaS platform. Second, we present a new active learning scheme for training and updating duplicate detection algorithms, in which the crowd provides additional annotated data to enhance the chosen learning model, and Siamese neural networks with the triplet loss are used to construct an efficient model for the problem. Finally, we design a user interface for the proposed duplicate detection system, which can easily be applied in practice at different companies.
Citations: 0
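The triplet loss mentioned in the abstract above is a standard hinge on the distance gap between an anchor-positive pair (known duplicates) and an anchor-negative pair (known non-duplicates). A self-contained sketch on toy 2-D embeddings (the embeddings themselves would come from the Siamese encoder):

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the distance gap: push the duplicate (positive) closer to
    the anchor than the non-duplicate (negative) by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy embeddings: the positive sits near the anchor, the negative far away.
anchor = np.array([1.0, 0.0])
positive = np.array([1.1, 0.0])
negative = np.array([-3.0, 0.0])

print(triplet_loss(anchor, positive, negative))  # 0.0: margin already satisfied
```

When the negative drifts inside the margin (e.g., `negative = [1.2, 0.0]`), the loss becomes positive and gradients would push the pair apart during training.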
Multi-Graph Group Collaborative Filtering
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390715
Bo Jiang
The task of recommending an item or an event to a user group attracts wide attention. Most existing works obtain a group preference by aggregating the personalized preferences within the group. However, groups, users, and items are connected in a more complex structure; for example, users in the same group may have different preferences. It is therefore important to introduce the correlations among groups, users, and items into embedding learning. To address this problem, we propose Multi-Graph Group Collaborative Filtering (MGGCF), which refines the group, user, and item representations according to three bipartite graphs. Moreover, since MGGCF refines the group, user, and item embeddings simultaneously, it benefits both group recommendation and individual recommendation tasks. Extensive experiments are conducted on one real-world dataset and two synthetic datasets. Empirical results demonstrate that MGGCF significantly improves not only group recommendation but also item recommendation. Further analysis verifies the importance of embedding propagation for learning better user, group, and item representations, which reveals the rationality and effectiveness of MGGCF.
Citations: 3
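The embedding propagation the abstract credits for MGGCF's gains can be illustrated on a single user-item bipartite graph: each node's embedding is mixed with the mean embedding of its neighbors, so connected nodes drift toward each other. This is a generic one-step sketch with a hypothetical `propagate` helper and a fixed 0.5/0.5 mixing weight, not the paper's exact update rule:

```python
import numpy as np

def propagate(user_emb, item_emb, interactions):
    """One embedding-propagation step on a user-item bipartite graph:
    mix each node's embedding with the mean embedding of its neighbors."""
    new_user = user_emb.copy()
    new_item = item_emb.copy()
    item_neigh = {}
    for u, items in interactions.items():
        new_user[u] = 0.5 * user_emb[u] + 0.5 * item_emb[items].mean(axis=0)
        for i in items:
            item_neigh.setdefault(i, []).append(u)
    for i, users in item_neigh.items():
        new_item[i] = 0.5 * item_emb[i] + 0.5 * user_emb[users].mean(axis=0)
    return new_user, new_item

# Users 0 and 1 both interact with item 0; user 2 is isolated.
user_emb = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
item_emb = np.array([[0.0, 0.0]])
interactions = {0: [0], 1: [0]}

new_user, new_item = propagate(user_emb, item_emb, interactions)
```

After one step, the two users who share an item move closer together while the isolated user is untouched, which is exactly the collaborative signal the bipartite graphs are meant to inject.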
Attention Mechanisms, Signal Encodings and Fusion Strategies for Improved Ad-hoc Video Search with Dual Encoding Networks
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390737
Damianos Galanopoulos, V. Mezaris
In this paper, the problem of retrieving unlabeled videos with textual queries is addressed. We present an extended dual encoding network that uses more than one encoding of the visual and textual content, as well as two different attention mechanisms. The latter serve to highlight the temporal locations in each modality that contribute most to effective retrieval. The different encodings of the visual and textual inputs, along with early/late fusion strategies, are examined for further performance improvements. Experimental evaluations and comparisons with state-of-the-art methods document the merit of the proposed network.
Citations: 15
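"Highlighting temporal locations" with attention typically means scoring each time step, softmaxing the scores, and pooling with the resulting weights. A minimal sketch with a random vector standing in for the learned attention query (the paper's actual mechanisms are not specified here):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(frames, query):
    """Temporal attention: score each frame feature against a query vector,
    normalize the scores with softmax, and return the weighted sum, so
    frames that matter more for retrieval contribute more to the encoding."""
    scores = frames @ query           # (T,) one score per temporal location
    weights = softmax(scores)         # (T,) nonnegative, sums to 1
    return weights @ frames, weights  # pooled feature (d,), weights (T,)

T, d = 5, 4
rng = np.random.default_rng(1)
frames = rng.standard_normal((T, d))  # per-frame visual features
query = rng.standard_normal(d)        # stand-in for a learned attention vector

pooled, w = attention_pool(frames, query)
```

The same pooling applies verbatim to the textual side, with word vectors in place of frame features.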
Google Helps YouTube: Learning Few-Shot Video Classification from Historic Tasks and Cross-Domain Sample Transfer
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390687
Xinzhe Zhou, Yadong Mu
The labor-intensive nature of video annotation has inspired recent research on few-shot video classification. The core motivation of our work is to mitigate the supervision-scarcity issue in this few-shot setting via cross-domain meta-learning. In particular, we aim to harness large-scale, richly annotated image data (the source domain) for few-shot video classification (the target domain). The source data is heterogeneous (images vs. videos) and noisily labeled, so it is not directly usable in the target domain. This work proposes the meta-learning input-transformer (MLIT), a novel deep network that tames the noisy source data so that it is more amenable to use in the target domain. It has two key traits. First, to bridge the data-distribution gap between the source and target domains, MLIT includes learnable neural layers that reweigh and transform the source data, effectively suppressing corrupted or noisy source samples. Second, MLIT is designed to learn from historic video classification tasks in the target domain, which significantly elevates accuracy on unseen video categories. Comprehensive empirical evaluations on two large-scale video datasets, ActivityNet and Kinetics-400, demonstrate the superiority of the proposed method.
Citations: 3
Itinerary Planning via Deep Reinforcement Learning
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390727
Shengxin Chen, Bo-Hao Chen, Zhaojiong Chen, Yunbing Wu
Itinerary planning, which provides tailor-made tours for each traveler, is a fundamental yet inefficient task in route recommendation. In this paper, we propose an automatic route recommendation approach based on deep reinforcement learning to solve the itinerary planning problem. We formulate the automatic generation of route recommendations as a Markov Decision Process (MDP) and solve it with a variational agent optimized through the deep Q-learning algorithm. We train our agent on open data from various cities and show that it achieves notable improvements over other state-of-the-art methods.
Citations: 6
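The MDP framing can be made concrete with a toy itinerary: states are the current point of interest (POI), actions move to another POI, and rewards encode how good a transition is. The tabular Q-learning below is only a stand-in for the paper's deep Q-network (which replaces the table with a neural approximator); the POI graph and rewards are invented for illustration:

```python
import random

def q_learning_itinerary(rewards, n_pois, horizon=2, episodes=3000,
                         alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning on a toy itinerary MDP: each episode starts at
    POI 0 and makes `horizon` moves; Q(s, a) estimates the value of
    traveling from POI s to POI a."""
    random.seed(0)
    Q = {}
    q = lambda s, a: Q.get((s, a), 0.0)
    for _ in range(episodes):
        s = 0
        for step in range(horizon):
            acts = [a for a in range(n_pois) if a != s]
            # Epsilon-greedy action selection.
            a = random.choice(acts) if random.random() < eps \
                else max(acts, key=lambda x: q(s, x))
            r = rewards.get((s, a), 0.0)
            if step < horizon - 1:  # bootstrap from the best next move
                target = r + gamma * max(q(a, x) for x in range(n_pois) if x != a)
            else:                   # last move of the tour: no bootstrap
                target = r
            Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
            s = a
    return Q

# Hypothetical rewards: visiting POI 2 then POI 1 is the best two-stop tour.
rewards = {(0, 2): 1.0, (2, 1): 2.0, (0, 1): 0.5}
Q = q_learning_itinerary(rewards, n_pois=3)
```

After training, reading the greedy action out of `Q` from the start state recovers the high-reward tour 0 → 2 → 1.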
Emotion Recognition from Galvanic Skin Response Signal Based on Deep Hybrid Neural Networks
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-08 | DOI: 10.1145/3372278.3390738
Imam Yogie Susanto, Tse-Yu Pan, Chien-Wen Chen, Min-Chun Hu, Wen-Huang Cheng
Emotion reflects human beings' physiological and psychological state. Galvanic Skin Response (GSR) reveals the electrical characteristics of human skin and is widely used to recognize the presence of emotion. In this work, we propose an emotion recognition framework based on deep hybrid neural networks, in which a 1D CNN and a residual bidirectional GRU are employed for time-series analysis. Experimental results show that the proposed method outperforms other state-of-the-art methods. In addition, we port the proposed emotion recognition model to a Raspberry Pi and build a real-time emotion-interaction robot to verify the efficiency of this work.
Citations: 9
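The 1D CNN front end of such a hybrid network slides a kernel along the raw GSR time series. A dependency-free sketch of that core operation (valid-mode cross-correlation, the convention deep-learning libraries use), with toy skin-conductance samples:

```python
def conv1d(signal, kernel, stride=1):
    """Valid-mode 1D convolution (cross-correlation), as computed by the
    first layer of a 1D CNN over a raw time series: slide the kernel along
    the signal and take the dot product at each position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

# A 2-tap averaging kernel makes the sliding-window output easy to verify.
gsr = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # toy skin-conductance samples
kernel = [0.5, 0.5]
print(conv1d(gsr, kernel))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

In the full model, many such kernels are learned rather than fixed, and their outputs feed the bidirectional GRU, which captures longer-range temporal context.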
YOLO-mini-tiger: Amur Tiger Detection
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-01 | DOI: 10.1145/3372278.3390710
Runchen Wei, Ning He, K. Lu
In this paper, we present our solution for tiger detection in the 2019 Computer Vision for Wildlife Conservation Challenge (CVWC2019). We introduce an efficient deep tiger detector consisting of a convnet channel-adaptation method and an improved tiger detection method based on You Only Look Once version 3 (YOLOv3). Considering the limited memory and computing power of small embedded devices, we use EfficientNet-B0 and Darknet-53 as backbone detection networks and adapt them, inspired by channel pruning and knowledge distillation, to balance their depth and width. Our results show that after the architecture adjustment of Darknet-53, floating-point computation decreases by 93% and model size by 97%, while accuracy drops by only 1%; after the adjustment of EfficientNet-B0, floating-point computation decreases by 66% and model size by 70%, again with only a 1% drop in accuracy. We also compare the GIoU loss and the MSE loss in the training stage. The GIoU loss increases the average AP over IoU thresholds from 0.5 to 0.95 without affecting training or inference speed, making it experimentally well suited to tiger detection in the wild. The proposed method outperforms previous Amur tiger detection methods presented at CVWC2019.
Citations: 1
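The GIoU metric the abstract compares against MSE is IoU minus the fraction of the smallest enclosing box not covered by the union, so it stays informative even for disjoint boxes where plain IoU is flat at zero. A standalone sketch of the metric itself (the training loss is typically 1 - GIoU):

```python
def giou(box_a, box_b):
    """Generalized IoU for two axis-aligned boxes (x1, y1, x2, y2):
    IoU minus the fraction of the smallest enclosing box that lies
    outside the union of the two boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclose - union) / enclose

print(giou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlapping: 1/7 - 2/9
print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint: negative, still a gradient signal
```

For identical boxes GIoU equals 1 (like IoU); as boxes separate it decreases toward -1, which is why it provides a useful training signal where MSE on raw coordinates does not reflect overlap quality.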
Super-Resolution Coding Defense Against Adversarial Examples
Proceedings of the 2020 International Conference on Multimedia Retrieval | Pub Date: 2020-06-01 | DOI: 10.1145/3372278.3390689
Yanjie Chen, Likun Cai, Wei Cheng, Hongya Wang
Deep neural networks have achieved state-of-the-art performance in many fields, including image classification. However, recent studies show that these models are vulnerable to adversarial examples formed by adding small but intentional perturbations to clean examples. In this paper, we introduce an effective defense against adversarial examples. The key idea is to leverage a super-resolution coding (SR-coding) network to remove the noise from adversarial examples. Furthermore, to strengthen the defense, we propose a novel hybrid approach that combines SR-coding with adversarial training to train robust neural networks. Experiments on benchmark datasets demonstrate the effectiveness of our method against both state-of-the-art white-box and black-box attacks. The proposed approach significantly improves defense performance, achieving up to a 41.26% accuracy improvement with ResNet18 under the PGD white-box attack.
Citations: 3