{"title":"Modular Parallelization Framework for Multi-Stream Video Processing","authors":"Tim Lenertz, G. Lafruit","doi":"10.1145/2964284.2973799","DOIUrl":"https://doi.org/10.1145/2964284.2973799","url":null,"abstract":"A flow-based software framework specialized for 3D and video is presented, which in particular handles automatic parallelization of multi-stream video processing. The workflow is decomposed into a set of filter units which become nodes in a graph. Execution with support for time windows on node inputs is automatically handled by the framework, as well as low-level manipulation of multi-dimensional data.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117073759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"vitrivr: A Flexible Retrieval Stack Supporting Multiple Query Modes for Searching in Multimedia Collections","authors":"Luca Rossetto, Ivan Giangreco, Claudiu Tanase, H. Schuldt","doi":"10.1145/2964284.2973797","DOIUrl":"https://doi.org/10.1145/2964284.2973797","url":null,"abstract":"vitrivr is an open source full-stack content-based multimedia retrieval system with focus on video. Unlike the majority of the existing multimedia search solutions, vitrivr is not limited to searching in metadata, but also provides content-based search and thus offers a large variety of different query modes which can be seamlessly combined: Query by sketch, which allows the user to draw a sketch of a query image and/or sketch motion paths, Query by example, keyword search, and relevance feedback. The vitrivr architecture is self-contained and addresses all aspects of multimedia search, from offline feature extraction, database management to frontend user interaction. The system is composed of three modules: a web-based frontend which allows the user to input the query (e.g., add a sketch) and browse the retrieved results (vitrivr-ui), a database system designed for interactive search in large-scale multimedia collections (ADAM), and a retrieval engine that handles feature extraction and feature-based retrieval (Cineast). The vitrivr source is available on GitHub under the MIT open source (and similar) licenses and is currently undergoing several upgrades as part of the Google Summer of Code 2016.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115265089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted Linear Fusion of Multimodal Data: A Reasonable Baseline?","authors":"Ognjen Arandjelovic","doi":"10.1145/2964284.2964304","DOIUrl":"https://doi.org/10.1145/2964284.2964304","url":null,"abstract":"The ever-increasing demand for reliable inference capable of handling unpredictable challenges of practical application in the real world, has made research on information fusion of major importance. There are few fields of application and research where this is more evident than in the sphere of multimedia which by its very nature inherently involves the use of multiple modalities, be it for learning, prediction, or human-computer interaction, say. In the development of the most common type, score-level fusion algorithms, it is virtually without an exception desirable to have as a reference starting point a simple and universally sound baseline benchmark which newly developed approaches can be compared to. One of the most pervasively used methods is that of weighted linear fusion. It has cemented itself as the default off-the-shelf baseline owing to its simplicity of implementation, interpretability, and surprisingly competitive performance across a wide range of application domains and information source types. In this paper I argue that despite this track record, weighted linear fusion is not a good baseline on the grounds that there is an equally simple and interpretable alternative - namely quadratic mean-based fusion - which is theoretically more principled and which is more successful in practice. I argue the former from first principles and demonstrate the latter using a series of experiments on a diverse set of fusion problems: computer vision-based object recognition, arrhythmia detection, and fatality prediction in motor vehicle accidents.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"347 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115411172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Digital Holographic Image Reconstruction on Mobile Devices","authors":"Chung-Hua Chu","doi":"10.1145/2964284.2967192","DOIUrl":"https://doi.org/10.1145/2964284.2967192","url":null,"abstract":"Advanced digital holography attracts a lot of attentions for 3D visualization nowadays. The representation of digital holographic images suffers from computational inefficiency on the mobile devices due to the limited hardware for digital holographic processing. In this paper, we point out that the above critical issue deteriorates the digital holographic image representation on the mobile devices. To reduce computational complexity in digital holographic image reconstruction, we propose an efficient and effective algorithm to simplify Fresnel transforms for the mobile devices. Our algorithm outperforms previous approaches in not only smaller running time but also the better quality of the digital holographic image representation for the mobile devices.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115451444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SenseCap: Synchronized Data Collection with Microsoft Kinect2 and LeapMotion","authors":"Julian F. P. Kooij","doi":"10.1145/2964284.2973805","DOIUrl":"https://doi.org/10.1145/2964284.2973805","url":null,"abstract":"We present a new recording tool to capture synchronized video and skeletal data streams from cheap sensors such as the Microsoft Kinect2, and LeapMotion. While other recording tools act as virtual playback devices for testing on-line real-time applications, we target multi-media data collection for off-line processing. Images are encoded in common video formats, and skeletal data as flat text tables. This approach enables long duration recordings (e.g. over 30 minutes), and supports post-hoc mapping of the Kinect2 depth video to the color space if needed. By using common file formats, the data can be played back and analyzed on any other computer, without requiring sensor specific SDKs to be installed. The project is released under a 3-clause BSD license, and consists of an extensible C++11 framework, with support for the official Microsoft Kinect 2 and LeapMotion APIs to record, a command-line interface, and a Matlab GUI to initiate, inspect, and load Kinect2 recordings.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116137609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tamp: A Library for Compact Deep Neural Networks with Structured Matrices","authors":"Bingchen Gong, Brendan Jou, Felix X. Yu, Shih-Fu Chang","doi":"10.1145/2964284.2973802","DOIUrl":"https://doi.org/10.1145/2964284.2973802","url":null,"abstract":"We introduce Tamp, an open source C++ library for reducing the space and time costs of deep neural network models. In particular, Tamp implements several recent works which use structured matrices to replace unstructured matrices which are often bottlenecks in neural networks. Tamp is also designed to serve as a unified development platform with several supported optimization back-ends and abstracted data types. This paper introduces the design and API and also demonstrates the effectiveness with experiments on public datasets.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127448642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Emotion Computing","authors":"Sicheng Zhao","doi":"10.1145/2964284.2971473","DOIUrl":"https://doi.org/10.1145/2964284.2971473","url":null,"abstract":"Images can convey rich semantics and induce strong emotions in viewers. My research aims to predict image emotions from different aspects with respect to two main challenges: affective gap and subjective evaluation. To bridge the affective gap, we extract emotion features based on principles-of-art to recognize image-centric dominant emotions. As the emotions that are induced in viewers by an image are highly subjective and different, we propose to predict user-centric personalized emotion perceptions for each viewer and image-centric emotion probability distribution for each image. To tackle the subjective evaluation issue, we set up a large scale image emotion dataset from Flickr, named Image-Emotion-Social-Net, on both dimensional and categorical emotion representations with over 1 million images and about 8,000 users. Different types of factors may influence personalized image emotion perceptions, including visual content, social context, temporal evolution and location influence. We make an initial attempt to jointly combine them by the proposed rolling multi-task hypergraph learning. Both discrete and continuous emotion distributions are modelled via shared sparse learning. Further, several potential applications based on image emotions are designed and implemented.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125730779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frustratingly Easy Cross-Modal Hashing","authors":"Dekui Ma, Jian Liang, Xiangwei Kong, R. He","doi":"10.1145/2964284.2967218","DOIUrl":"https://doi.org/10.1145/2964284.2967218","url":null,"abstract":"Cross-modal hashing has attracted considerable attention due to its low storage cost and fast retrieval speed. Recently, more and more sophisticated researches related to this topic are proposed. However, they seem to be inefficient computationally for several reasons. On one hand, learning coupled hash projections makes the iterative optimization problem challenging. On the other hand, individual collective binary codes for each content are also learned with a high computation complexity. In this paper we describe a simple yet effective cross-modal hashing approach that can be implemented in just three lines of code. This approach first obtains the binary codes for one modality via unimodal hashing methods (e.g., iterative quantization (ITQ)), then applies simple linear regression to project the other modalities into the obtained binary subspace. Obviously, it is non-iterative and parameter-free, which makes it more attractive for many real-world applications. We further compare our approach with other state-of-the-art methods on four benchmark datasets (i.e., the Wiki, VOC, LabelMe and NUS-WIDE datasets). Despite its extraordinary simplicity, our approach performs remarkably and generally well for these datasets under different experimental settings (i.e., large-scale, high-dimensional and multi-label datasets).","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128071934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Music Video Generation Based on Emotion-Oriented Pseudo Song Prediction and Matching","authors":"Jen-Chun Lin, Wen-Li Wei, H. Wang","doi":"10.1145/2964284.2967245","DOIUrl":"https://doi.org/10.1145/2964284.2967245","url":null,"abstract":"The main difficulty in automatic music video (MV) generation lies in how to match two different media (i.e., video and music). This paper proposes a novel content-based MV generation system based on emotion-oriented pseudo song prediction and matching. We use a multi-task deep neural network (MDNN) to jointly learn the relationship among music, video, and emotion from an emotion-annotated MV corpus. Given a queried video, the MDNN is applied to predict the acoustic (music) features from the visual (video) features, i.e., the pseudo song corresponding to the video. Then, the pseudo acoustic (music) features are matched with the acoustic (music) features of each music track in the music collection according to a pseudo-song-based deep similarity matching (PDSM) metric given by another deep neural network (DNN) trained on the acoustic and pseudo acoustic features of the positive (official), less-positive (artificial), and negative (artificial) MV examples. The results of objective and subjective experiments demonstrate that the proposed pseudo-song-based framework performs well and can generate appealing MVs with better viewing and listening experiences.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124878298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentiment and Emotion Analysis for Social Multimedia: Methodologies and Applications","authors":"Quanzeng You","doi":"10.1145/2964284.2971475","DOIUrl":"https://doi.org/10.1145/2964284.2971475","url":null,"abstract":"Online social networks have attracted the attention from both the academia and real world. In particular, the rich multimedia information accumulated in recent years provides an easy and convenient way for more active communication between people. This offers an opportunity to research people's behaviors and activities based on those multimedia content. One emerging area is driven by the fact that these massive multimedia data contain people's daily sentiments and opinions. However, existing sentiment analysis typically focuses on textual information regardless of the visual content, which may be as informative in expressing people's sentiments and opinions. In this research, we attempt to analyze the online sentiment changes of social media users using both the textual and visual content. Nowadays, social media networks such as Twitter have become major platforms of information exchange and communication between users, with tweets as the common information carrier. As an old saying has it, an image is worth a thousand words. The image tweet is a great example of multimodal sentiment. In this research, we focus on sentiment analysis based on visual and multimedia information analysis. We will review the state-of-the-art in this topic. Several of our projects related to this research area will also be discussed. Experimental results are included to demonstrate and summarize our contributions.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124317652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}