{"title":"Family Photo Recognition via Multiple Instance Learning","authors":"Junkang Zhang, Siyu Xia, Ming Shao, Y. Fu","doi":"10.1145/3078971.3079036","DOIUrl":"https://doi.org/10.1145/3078971.3079036","url":null,"abstract":"Family photo recognition is an important task in social media analytics. Previous methods use singleton global features and conventional binary classifiers to distinguish family group photos from non-family ones. Different from them, we propose a novel family recognition approach with three dedicated local representations under Multiple Instance Learning framework, where geometry, kinship and semantic features are integrated to overcome issues in the previous work. Experimental results show that our method achieves the state-of-the-art result among global-feature models.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"320 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122707197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visually Browsing Millions of Images Using Image Graphs","authors":"K. U. Barthel, N. Hezel, K. Jung","doi":"10.1145/3078971.3079016","DOIUrl":"https://doi.org/10.1145/3078971.3079016","url":null,"abstract":"We present a new approach to visually browse very large sets of untagged images. High quality image features are generated using transformed activations of a convolutional neural network. These features are used to model image similarities, from which a hierarchical image graph is build. We show how such a graph can be constructed efficiently. In our experiments we found best user experience for navigating the graph is achieved by projecting sub-graphs onto a regular 2D image map. This allows users to explore the image collection like an interactive map.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124828419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Sentiment Features of Context and Faces for Affective Video Analysis","authors":"C. Baecchi, Tiberio Uricchio, M. Bertini, A. Bimbo","doi":"10.1145/3078971.3079027","DOIUrl":"https://doi.org/10.1145/3078971.3079027","url":null,"abstract":"Given the huge quantity of hours of video available on video sharing platforms such as YouTube, Vimeo, etc. development of automatic tools that help users find videos that fit their interests has attracted the attention of both scientific and industrial communities. So far the majority of the works have addressed semantic analysis, to identify objects, scenes and events depicted in videos, but more recently affective analysis of videos has started to gain more attention. In this work we investigate the use of sentiment driven features to classify the induced sentiment of a video, i.e. the sentiment reaction of the user. Instead of using standard computer vision features such as CNN features or SIFT features trained to recognize objects and scenes, we exploit sentiment related features such as the ones provided by Deep-SentiBank, and features extracted from models that exploit deep networks trained on face expressions. We experiment on two recently introduced datasets: LIRIS-ACCEDE and MEDIAEVAL-2015, that provide sentiment annotations of a large set of short videos. We show that our approach not only outperforms the current state-of-the-art in terms of valence and arousal classification accuracy, but it also uses a smaller number of features, requiring thus less video processing.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127792744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finger Vein Image Retrieval via Coding Scale-varied Superpixel Feature","authors":"Kuikui Wang, Lu Yang, Gongping Yang, Xin Luo, Kun Su, Yilong Yin","doi":"10.1145/3078971.3078975","DOIUrl":"https://doi.org/10.1145/3078971.3078975","url":null,"abstract":"Finger vein image retrieval is one significant technique for performing fast identification especially in large-scale applications. However, most existing retrieval methods were based on fixed-scale feature of non-overlapped rectangular image block, in which the representation ability of feature and the local consistency of vein pattern were both overlooked. And the weak encoding (e.g., predefined threshold based binarization) was also limited the retrieval performance. Focusing on these problems, this paper proposes a novel finger vein image retrieval framework based on similarity-preserving encoding of scale-varied superpixel feature. In the framework, locally consistent pixels in one superpixel are used as a unit of feature representation, and the feature length is varied with the category of the superpixel classified by the variance of lowest dimensional feature. Additionally, the feature compaction and feature rotation based encoding can minimize the quantization loss and preserve the similarity between the scale-varied feature and the encoded binary codes. Experimental results on six public finger vein databases demonstrate that the superiority of the proposed coding scale-varied superpixel feature based retrieval approach over the state-of-the-arts.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124510042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligently Connecting People with Information","authors":"Changhu Wang","doi":"10.1145/3078971.3081371","DOIUrl":"https://doi.org/10.1145/3078971.3081371","url":null,"abstract":"How to effectively connect people with information is a fundamental problem in human society. We are now in the era of mobile first, and everything is digitally connected. With the advent of diverse social contents, information feeds have become a new way to connect people with information. Thus, there is a pretty good opportunity for artificial intelligence (AI) to make innovations in this direction. AI can make more efficient and intelligent the creation, moderation, dissemination, searching, consumption, and interaction of information and contents. As an industry leader in the product platform and service of information feeds, Toutiao takes the lead to develop and leverage diverse machine learning techniques to efficiently process, analyze, mine, understand, and organize a large amount of multimedia data. Meanwhile, owning to its rich application scenarios and active users all over the world, we have accumulated huge amount of training data, which makes the machine learning system form a closed feedback loop and thus can continually improve and evolve itself. This closed-loop system enables Toutiao to develop core AI technologies in large-scale machine learning, text analysis, natural language processing, computer vision, and data mining. In this talk, I will share some personal opinions to the development prospects of AI in this fundamental area, including my understanding to AI, important research progress in recent years, the influence of AI to the software industry, and how to build the core competence strategy of AI in a company. Moreover, I will also introduce some research progress of Toutiao AI Lab.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116702096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Unsupervised Distance Learning Framework for Multimedia Retrieval","authors":"Lucas Pascotti Valem, D. C. G. Pedronette","doi":"10.1145/3078971.3079017","DOIUrl":"https://doi.org/10.1145/3078971.3079017","url":null,"abstract":"Due to the increasing availability of image and multimedia collections, unsupervised post-processing methods, which are capable of improving the effectiveness of retrieval results without the need of user intervention, have become indispensable. This paper presents the Unsupervised Distance Learning Framework (UDLF), a software which enables an easy use and evaluation of unsupervised learning methods. The framework defines a broad model, allowing the implementation of different unsupervised methods and supporting diverse file formats for input and output. Seven different unsupervised methods are initially available in the framework. Executions and experiments can be easily defined by setting a configuration file. The framework also includes the evaluation of the retrieval results exporting visual output results, computing effectiveness and efficiency measures. The source-code is public available, such that anyone can freely access, use, change, and share the software under the terms of the GPLv2 license.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132559220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information Retrieval from Multi-Sensor Data for Enriching Location Services at HERE Technologies","authors":"Matei Stroila","doi":"10.1145/3078971.3081370","DOIUrl":"https://doi.org/10.1145/3078971.3081370","url":null,"abstract":"HERE Technologies provides real-time location services that enable people, enterprises, and cities around the world to harness the power of location and create innovative solutions for a safer and more efficient living. Multimedia retrieval techniques and sensor fusion approaches are essential for enriching location services and for keeping the underlying map up to date. In this talk, I will give an overview of some of the work we do in the CTO Research group to support existing location services and enable future ones. We aim to automatically extract useful information from massive collections of images, LiDAR point clouds, car sensor data and open web data. I will present work related to image recognition for map making purposes, information retrieval for points of interest enrichment, and work related to creating a highly accurate map of the roads and cities for the future autonomous navigation services.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133245304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Tutorials","authors":"G. Awad","doi":"10.1145/3254614","DOIUrl":"https://doi.org/10.1145/3254614","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115663355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Image Classification using Coarse and Fine Labels","authors":"Anuvabh Dutt, D. Pellerin, G. Quénot","doi":"10.1145/3078971.3079042","DOIUrl":"https://doi.org/10.1145/3078971.3079042","url":null,"abstract":"The performance of classifiers is in general improved by designing models with a large number of parameters or by ensembles. We tackle the problem of classification of coarse and fine grained categories, which share a semantic relationship. On being given the predictions that a classifier has for a given test sample, we adjust the probabilities according to the semantics of the categories, on which the classifier was trained. We present an algorithm for doing such an adjustment and we demonstrate improvement for both coarse and fine grained classification. We evaluate our method using convolutional neural networks. However, the algorithm can be applied to any classifier which outputs category wise probabilities.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116589690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Musical Instrument Recognition in User-generated Videos using a Multimodal Convolutional Neural Network Architecture","authors":"Olga Slizovskaia, E. Gómez, G. Haro","doi":"10.1145/3078971.3079002","DOIUrl":"https://doi.org/10.1145/3078971.3079002","url":null,"abstract":"This paper presents a method for recognizing musical instruments in user-generated videos. Musical instrument recognition from music signals is a well-known task in the music information retrieval (MIR) field, where current approaches rely on the analysis of the good-quality audio material. This work addresses a real-world scenario with several research challenges, i.e. the analysis of user-generated videos that are varied in terms of recording conditions and quality and may contain multiple instruments sounding simultaneously and background noise. Our approach does not only focus on the analysis of audio information, but we exploit the multimodal information embedded in the audio and visual domains. In order to do so, we develop a Convolutional Neural Network (CNN) architecture which combines learned representations from both modalities at a late fusion stage. Our approach is trained and evaluated on two large-scale video datasets: YouTube-8M and FCVID. The proposed architectures demonstrate state-of-the-art results in audio and video object recognition, provide additional robustness to missing modalities, and remains computationally cheap to train.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126228058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}