Proceedings of the 24th ACM international conference on Multimedia: Latest Publications

Partial Multi-Modal Sparse Coding via Adaptive Similarity Structure Regularization
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967201
Zhou Zhao, Hanqing Lu, Deng Cai, Xiaofei He, Yueting Zhuang
{"title":"Partial Multi-Modal Sparse Coding via Adaptive Similarity Structure Regularization","authors":"Zhou Zhao, Hanqing Lu, Deng Cai, Xiaofei He, Yueting Zhuang","doi":"10.1145/2964284.2967201","DOIUrl":"https://doi.org/10.1145/2964284.2967201","url":null,"abstract":"Multi-modal sparse coding has played an important role in many multimedia applications, where data are usually with multiple modalities. Recently, various multi-modal sparse coding approaches have been proposed to learn sparse codes of multi-modal data, which assume that data appear in all modalities, or at least there is one modality containing all data. However, in real applications, it is often the case that some modalities of the data may suffer from missing information and thus result in partial multi-modality data. In this paper, we propose to solve the partial multi-modal sparse coding problem via multi-modal similarity structure regularization. Specifically, we propose a partial multi-modal sparse coding framework termed Adaptive Partial Multi-Modal Similarity Structure Regularization for Sparse Coding (AdaPM2SC), which preserves the similarity structure within the same modality and between different modalities. Experimental results conducted on two real-world datasets demonstrate that AdaPM2SC significantly outperforms the state-of-the-art methods under partial multi-modality scenario.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134311238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
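As a rough illustration of the general idea behind similarity-structure regularization (not the AdaPM2SC algorithm itself), the sketch below evaluates a graph-regularized sparse coding objective: a reconstruction term, an L1 sparsity penalty, and a Laplacian smoothness term trace(Z L Z^T) that pushes similar samples toward similar codes. All variable names, sizes, and the single-modality simplification are illustrative assumptions.

```python
import numpy as np

def graph_regularized_sparse_coding_objective(X, D, Z, W, lam=0.1, gamma=0.5):
    """Evaluate a graph-regularized sparse coding objective (illustrative).

    X : (d, n) data matrix, one sample per column
    D : (d, k) dictionary
    Z : (k, n) sparse codes, one code per column
    W : (n, n) pairwise similarity matrix between samples
    lam, gamma : weights of the sparsity and graph-smoothness terms
    """
    # Reconstruction error ||X - DZ||_F^2
    recon = np.linalg.norm(X - D @ Z, ord="fro") ** 2
    # L1 sparsity penalty on the codes
    sparsity = np.abs(Z).sum()
    # Graph Laplacian L = Deg - W; trace(Z L Z^T) is small when
    # similar samples (large W_ij) receive similar codes
    L = np.diag(W.sum(axis=1)) - W
    smooth = np.trace(Z @ L @ Z.T)
    return recon + lam * sparsity + gamma * smooth

# Toy usage with random data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 20))
D = rng.normal(size=(16, 8))
Z = rng.normal(size=(8, 20)) * (rng.random((8, 20)) < 0.3)  # sparse codes
dists = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
W = np.exp(-np.square(dists))
np.fill_diagonal(W, 0.0)  # no self-similarity edges
print(graph_regularized_sparse_coding_objective(X, D, Z, W))
```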
Semantic Description of Timbral Transformations in Music Production
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967238
R. Stables, B. D. Man, Sean Enderby, J. Reiss, György Fazekas, Thomas Wilmering
{"title":"Semantic Description of Timbral Transformations in Music Production","authors":"R. Stables, B. D. Man, Sean Enderby, J. Reiss, György Fazekas, Thomas Wilmering","doi":"10.1145/2964284.2967238","DOIUrl":"https://doi.org/10.1145/2964284.2967238","url":null,"abstract":"In music production, descriptive terminology is used to define perceived sound transformations. By understanding the underlying statistical features associated with these descriptions, we can aid the retrieval of contextually relevant processing parameters using natural language, and create intelligent systems capable of assisting in audio engineering. In this study, we present an analysis of a dataset containing descriptive terms gathered using a series of processing modules, embedded within a Digital Audio Workstation. By applying hierarchical clustering to the audio feature space, we show that similarity in term representations exists within and between transformation classes. Furthermore, the organisation of terms in low-dimensional timbre space can be explained using perceptual concepts such as size and dissonance. We conclude by performing Latent Semantic Indexing to show that similar groupings exist based on term frequency.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134630900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
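The two analysis tools named in the abstract above, hierarchical clustering of a feature space and Latent Semantic Indexing of a term-frequency matrix, can be sketched as follows. This is a minimal, generic illustration on random toy data, not the paper's pipeline; all array shapes are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(1)

# Hypothetical audio-feature vectors for descriptive terms (one row per term)
term_features = rng.normal(size=(12, 40))

# Agglomerative (hierarchical) clustering of the term feature space
Z = linkage(term_features, method="ward")
clusters = fcluster(Z, t=3, criterion="maxclust")
print("cluster assignments:", clusters)

# Latent Semantic Indexing: truncated SVD of a term-frequency matrix
term_doc = rng.integers(0, 5, size=(12, 30)).astype(float)
lsi_coords = TruncatedSVD(n_components=2).fit_transform(term_doc)
print("2-D LSI coordinates:\n", lsi_coords)
```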
Weakly-Supervised Recognition, Localization, and Explanation of Visual Entities
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2971479
P. Mettes
{"title":"Weakly-Supervised Recognition, Localization, and Explanation of Visual Entities","authors":"P. Mettes","doi":"10.1145/2964284.2971479","DOIUrl":"https://doi.org/10.1145/2964284.2971479","url":null,"abstract":"To learn from visual collections, manual annotations are required. Humans however can no longer keep up with providing strong and time consuming annotations on the ever increasing wealth of visual data. As a result, approaches are required that can learn from fast and weak forms of annotations in visual data. This doctorial symposium summarizes my ongoing PhD dissertation on how to utilize weakly-supervised annotations to recognize, localize, and explain visual entities in images and videos. In this context, visual entities denote objects, scenes, and actions (in images), and actions and events (in videos). The summary is performed through four publications. For each publication, we discuss the current state-of-the-art, as well as our proposed novelties and performed experiments. The end of the summary discusses several possibilities to extend the dissertation.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133168700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Multi-Modal Learning: Study on A Large-Scale Micro-Video Data Collection
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2971477
Jingyuan Chen
{"title":"Multi-Modal Learning: Study on A Large-Scale Micro-Video Data Collection","authors":"Jingyuan Chen","doi":"10.1145/2964284.2971477","DOIUrl":"https://doi.org/10.1145/2964284.2971477","url":null,"abstract":"Micro-video sharing social services, as a new phenomenon in social media, enable users to share micro-videos and thus gain increasing enthusiasm among people. One distinct characteristic of micro-videos is the multi-modality, as these videos always have visual signals, audio tracks, textual descriptions as well as social clues. Such multi-modality data makes it possible to obtain a comprehensive understanding of videos and hence provides new opportunities for researchers. However, limited efforts thus far have been dedicated to this new emerging user-generated contents (UGCs) due to the lack of large-scale benchmark dataset. Towards this end, in this paper, we construct a large-scale micro-video dataset, which can support many research domains, such as popularity prediction and venue estimation. Based upon this dataset, we conduct an initial study in popularity prediction of micro-videos. Finally, we identify our future work.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133295259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
ThePlantGame: Actively Training Human Annotators for Domain-specific Crowdsourcing
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2973820
Maximilien Servajean, A. Joly, D. Shasha, Julien Champ, Esther Pacitti
{"title":"ThePlantGame: Actively Training Human Annotators for Domain-specific Crowdsourcing","authors":"Maximilien Servajean, A. Joly, D. Shasha, Julien Champ, Esther Pacitti","doi":"10.1145/2964284.2973820","DOIUrl":"https://doi.org/10.1145/2964284.2973820","url":null,"abstract":"In a typical citizen science/crowdsourcing environment, the contributors label items. When there are few labels, it is straightforward to train contributors and judge the quality of their labels by giving a few examples with known answers. Neither is true when there are thousands of domain-specific labels and annotators with heterogeneous skills. This demo paper presents an Active User Training framework implemented as a serious game called ThePlantGame. It is based on a set of data-driven algorithms allowing to (i) actively train annotators, and (ii) evaluate the quality of contributors' answers on new test items to optimize predictions.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132065000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
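The paper above does not publish its aggregation algorithm here, but a common baseline pattern for the problem it describes, judging annotator quality on items with known answers and weighting votes accordingly, can be sketched as below. All names and labels are hypothetical toy data.

```python
from collections import defaultdict

def annotator_accuracy(answers, gold):
    """Estimate each annotator's accuracy on items with known (gold) labels.

    answers : dict annotator -> dict item -> label
    gold    : dict item -> true label (test items only)
    """
    acc = {}
    for annot, labels in answers.items():
        scored = [labels[i] == g for i, g in gold.items() if i in labels]
        acc[annot] = sum(scored) / len(scored) if scored else 0.5
    return acc

def weighted_vote(answers, acc, item):
    """Aggregate labels for one item, weighting each vote by annotator accuracy."""
    votes = defaultdict(float)
    for annot, labels in answers.items():
        if item in labels:
            votes[labels[item]] += acc[annot]
    return max(votes, key=votes.get) if votes else None

# Toy example with hypothetical annotators and plant species labels
answers = {
    "alice": {"img1": "Quercus robur", "img2": "Acer campestre", "img3": "Quercus robur"},
    "bob":   {"img1": "Quercus robur", "img2": "Quercus robur",  "img3": "Acer campestre"},
}
gold = {"img1": "Quercus robur", "img2": "Acer campestre"}
acc = annotator_accuracy(answers, gold)
print(acc)
print(weighted_vote(answers, acc, "img3"))
```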
Performance Measurements of Virtual Reality Systems: Quantifying the Timing and Positioning Accuracy
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967303
Chun-Ming Chang, Cheng-Hsin Hsu, Chih-Fan Hsu, Kuan-Ta Chen
{"title":"Performance Measurements of Virtual Reality Systems: Quantifying the Timing and Positioning Accuracy","authors":"Chun-Ming Chang, Cheng-Hsin Hsu, Chih-Fan Hsu, Kuan-Ta Chen","doi":"10.1145/2964284.2967303","DOIUrl":"https://doi.org/10.1145/2964284.2967303","url":null,"abstract":"We propose the very first non-intrusive measurement methodology for quantifying the performance of commodity Virtual Reality (VR) systems. Our methodology considers the VR system under test as a black-box and works with any VR applications. Multiple performance metrics on timing and positioning accuracy are considered, and detailed testbed setup and measurement steps are presented. We also apply our methodology to several VR systems in the market, and carefully analyze the experiment results. We make several observations: (i) 3D scene complexity affects the timing accuracy the most, (ii) most VR systems implement the dead reckoning algorithm, which incurs a non-trivial correction latency after incorrect predictions, and (iii) there exists an inherent trade-off between two positioning accuracy metrics: precision and sensitivity.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127633958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
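Observation (ii) in the abstract refers to dead reckoning, i.e., extrapolating the next pose from the last tracked state. A minimal constant-velocity sketch is shown below; the numbers, frame rate, and error computation are illustrative assumptions, not the paper's measurement procedure.

```python
import numpy as np

def dead_reckoning(position, velocity, dt):
    """Constant-velocity dead reckoning: extrapolate the next pose
    from the last known position and velocity."""
    return position + velocity * dt

# Last tracked state of a headset or controller (illustrative numbers)
pos = np.array([0.10, 1.50, 0.30])    # metres
vel = np.array([0.40, 0.00, -0.20])   # metres per second
dt = 0.011                            # roughly one 90 Hz frame

predicted = dead_reckoning(pos, vel, dt)

# If the user changes direction, the true pose differs from the prediction
# and the system must correct it on a later frame; that correction latency
# is one of the quantities a black-box measurement can try to capture.
true_pos = np.array([0.102, 1.50, 0.299])
print("predicted:", predicted)
print("correction error (m):", np.linalg.norm(true_pos - predicted))
```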
Scene Image Synthesis from Natural Sentences Using Hierarchical Syntactic Analysis
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967193
Tetsuaki Mano, Hiroaki Yamane, T. Harada
{"title":"Scene Image Synthesis from Natural Sentences Using Hierarchical Syntactic Analysis","authors":"Tetsuaki Mano, Hiroaki Yamane, T. Harada","doi":"10.1145/2964284.2967193","DOIUrl":"https://doi.org/10.1145/2964284.2967193","url":null,"abstract":"Synthesizing a new image from verbal information is a challenging task that has a number of applications. Most research on the issue has attempted to address this question by providing external clues, such as sketches. However, no study has been able to successfully handle various sentences for this purpose without any other information. We propose a system to synthesize scene images solely from sentences. Input sentences are expected to be complete sentences with visualizable objects. Our priorities are the analysis of sentences and the correlation of information between input sentences and visible image patches. A hierarchical syntactic parser is developed for sentence analysis, and a combination of lexical knowledge and corpus statistics is designed for word correlation. The entire system was applied to both a clip-art dataset and an actual image dataset. This application highlighted the capability of the proposed system to generate novel images as well as its ability to succinctly convey ideas.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123950873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
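To give a feel for the kind of syntactic analysis the abstract relies on, the sketch below uses an off-the-shelf dependency parser (spaCy with its small English model, an assumption external to the paper) to pull out visualizable nouns, their modifiers, and spatial prepositions from a sentence. It is a generic illustration, not the paper's hierarchical parser.

```python
import spacy  # assumes the small English model is installed:
              #   python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
doc = nlp("A brown dog sits on a red sofa next to a small table.")

# Visualizable entities: noun chunks with their adjectival modifiers
for chunk in doc.noun_chunks:
    mods = [t.text for t in chunk.root.children if t.dep_ == "amod"]
    print(chunk.root.text, "modifiers:", mods)

# Spatial relations: prepositions linking a verb or noun to another noun
for token in doc:
    if token.dep_ == "prep" and token.head.pos_ in ("NOUN", "VERB"):
        objs = [c.text for c in token.children if c.dep_ == "pobj"]
        print(f"{token.head.text} --{token.text}--> {objs}")
```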
Query Adaptive Instance Search using Object Sketches
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2964317
S. Bhattacharjee, Junsong Yuan, Weixiang Hong, Xiang Ruan
{"title":"Query Adaptive Instance Search using Object Sketches","authors":"S. Bhattacharjee, Junsong Yuan, Weixiang Hong, Xiang Ruan","doi":"10.1145/2964284.2964317","DOIUrl":"https://doi.org/10.1145/2964284.2964317","url":null,"abstract":"Sketch-based object search is a challenging problem mainly due to two difficulties: (1) how to match the binary sketch query with the colorful image, and (2) how to locate the small object in a big image with the sketch query. To address the above challenges, we propose to leverage object proposals for object search and localization. However, instead of purely relying on sketch features, e.g., Sketch-a-Net, to locate the candidate object proposals, we propose to fully utilize the appearance information to resolve the ambiguities among object proposals and refine the search results. Our proposed query adaptive search is formulated as a sub-graph selection problem, which can be solved by maximum flow algorithm. By performing query expansion using a smaller set of more salient matches as the query representatives, it can accurately locate the small target objects in cluttered background or densely drawn deformation intensive cartoon (Manga like) images. Our query adaptive sketch based object search on benchmark datasets exhibits superior performance when compared with existing methods, which validates the advantages of utilizing both the shape and appearance features for sketch-based search.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121052958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
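The abstract formulates search as sub-graph selection solved by maximum flow. The sketch below only shows how such a max-flow instance over a tiny, made-up proposal graph could be solved with networkx; the node names, edge capacities, and graph layout are invented for illustration and do not reproduce the paper's formulation.

```python
import networkx as nx

# Hypothetical graph: a source "query" connects to object proposals, proposals
# connect to a sink "select"; capacities stand in for matching scores.
G = nx.DiGraph()
G.add_edge("query", "prop1", capacity=0.9)
G.add_edge("query", "prop2", capacity=0.4)
G.add_edge("prop1", "prop2", capacity=0.3)   # consistency between proposals
G.add_edge("prop1", "select", capacity=0.8)
G.add_edge("prop2", "select", capacity=0.5)

flow_value, flow_dict = nx.maximum_flow(G, "query", "select")
print("max flow value:", flow_value)
print("flow on each edge:", flow_dict)
```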
Joint Graph Learning and Video Segmentation via Multiple Cues and Topology Calibration
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2964295
Jingkuan Song, Lianli Gao, M. Puscas, F. Nie, Fumin Shen, N. Sebe
{"title":"Joint Graph Learning and Video Segmentation via Multiple Cues and Topology Calibration","authors":"Jingkuan Song, Lianli Gao, M. Puscas, F. Nie, Fumin Shen, N. Sebe","doi":"10.1145/2964284.2964295","DOIUrl":"https://doi.org/10.1145/2964284.2964295","url":null,"abstract":"Video segmentation has become an important and active research area with a large diversity of proposed approaches. Graph-based methods, enabling top performance on recent benchmarks, usually focus on either obtaining a precise similarity graph or designing efficient graph cutting strategies. However, these two components are often conducted in two separated steps, and thus the obtained similarity graph may not be the optimal one for segmentation and this may lead to suboptimal results. In this paper, we propose a novel framework, joint graph learning and video segmentation (JGLVS)}, which learns the similarity graph and video segmentation simultaneously. JGLVS learns the similarity graph by assigning adaptive neighbors for each vertex based on multiple cues (appearance, motion, boundary and spatial information). Meanwhile, the new rank constraint is imposed to the Laplacian matrix of the similarity graph, such that the connected components in the resulted similarity graph are exactly equal to the number of segmentations. Furthermore, JGLVS can automatically weigh multiple cues and calibrate the pairwise distance of superpixels based on their topology structures. Most noticeably, empirical results on the challenging dataset VSB100 show that JGLVS achieves promising performance on the benchmark dataset which outperforms the state-of-the-art by up to 11% for the BPR metric.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129192433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
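The rank constraint mentioned in the abstract rests on a standard spectral-graph fact: the multiplicity of the zero eigenvalue of a graph Laplacian equals the number of connected components. The sketch below only verifies that fact on a tiny made-up similarity matrix; it is not the JGLVS optimization.

```python
import numpy as np

def n_connected_components(W, tol=1e-8):
    """Count connected components of a similarity graph from its Laplacian:
    the multiplicity of eigenvalue 0 of L = D - W equals the number of
    connected components."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)
    return int(np.sum(eigvals < tol))

# Two obvious components: {0, 1} and {2, 3}, with no edges between them
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(n_connected_components(W))  # -> 2
```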
Improving Speaker Diarization of TV Series using Talking-Face Detection and Clustering
Pub Date: 2016-10-01 | DOI: 10.1145/2964284.2967202
H. Bredin, G. Gelly
{"title":"Improving Speaker Diarization of TV Series using Talking-Face Detection and Clustering","authors":"H. Bredin, G. Gelly","doi":"10.1145/2964284.2967202","DOIUrl":"https://doi.org/10.1145/2964284.2967202","url":null,"abstract":"While successful on broadcast news, meetings or telephone conversation, state-of-the-art speaker diarization techniques tend to perform poorly on TV series or movies. In this paper, we propose to rely on state-of-the-art face clustering techniques to guide acoustic speaker diarization. Two approaches are tested and evaluated on the first season of Game Of Thrones TV series. The second (better) approach relies on a novel talking-face detection module based on bi-directional long short-term memory recurrent neural network. Both audio-visual approaches outperform the audio-only baseline. A detailed study of the behavior of these approaches is also provided and paves the way to future improvements.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128851409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
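The talking-face detector described above is built on a bidirectional LSTM. A minimal PyTorch sketch of such a per-frame sequence classifier is shown below; the feature dimension (2D facial landmarks), hidden size, and the absence of any training loop are all illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class TalkingFaceBiLSTM(nn.Module):
    """Minimal bidirectional LSTM that scores each frame of a face track
    as talking / not talking (feature dimensions are illustrative)."""
    def __init__(self, feat_dim=68 * 2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # per-frame talking score

    def forward(self, x):                      # x: (batch, frames, feat_dim)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)  # (batch, frames)

# Toy forward pass on random facial-landmark features
model = TalkingFaceBiLSTM()
tracks = torch.randn(4, 25, 68 * 2)            # 4 face tracks, 25 frames each
print(model(tracks).shape)                     # torch.Size([4, 25])
```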