Proceedings of the 21st ACM international conference on Multimedia: Latest Publications

Robust evaluation for quality of experience in crowdsourcing
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502083
Qianqian Xu, Jiechao Xiong, Qingming Huang, Y. Yao
Abstract: Strategies exploiting crowdsourcing are increasingly being applied in the area of Quality of Experience (QoE) for multimedia. They enable researchers to conduct experiments with a more diverse set of participants and at a lower economic cost than conventional laboratory studies. However, a major challenge for crowdsourcing tests is the detection and control of outliers, which may arise due to different test conditions, human errors, or abnormal variations in context. It is therefore desirable to develop a robust evaluation methodology for crowdsourced data, which may be incomplete, imbalanced, and distributed on a graph. In this paper, we propose a robust rating scheme, based on robust regression and Hodge decomposition on graphs, to assess QoE using crowdsourcing. The scheme shows that removing outliers in crowdsourcing experiments helps purify the data and yields more reliable results. The effectiveness of the proposed scheme is further confirmed by experimental studies on both simulated examples and real-world data.
Citations: 25
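To make the flavor of the approach concrete, the sketch below fits global quality scores to pairwise crowdsourced comparisons by least squares (the gradient part of a Hodge decomposition on the comparison graph) and then refits after discarding the comparisons with the largest residuals. The trimming rule, data layout, and function names are illustrative assumptions added for this listing, not the paper's actual robust-regression formulation.

```python
import numpy as np

def hodgerank_scores(n_items, comparisons):
    """Least-squares global scores from pairwise differences y_ij
    (the gradient component of the Hodge decomposition on the comparison graph).
    comparisons: list of (i, j, diff) with diff ~ quality(j) - quality(i)."""
    rows, y = [], []
    for i, j, diff in comparisons:
        r = np.zeros(n_items)
        r[j], r[i] = 1.0, -1.0
        rows.append(r)
        y.append(diff)
    A, y = np.array(rows), np.array(y)
    s, *_ = np.linalg.lstsq(A, y, rcond=None)
    return s - s.mean(), A, y

def robust_scores(n_items, comparisons, trim_frac=0.2, iters=2):
    """Crude outlier control: refit after discarding the comparisons with the
    largest residuals (a simple stand-in for the paper's robust regression)."""
    data = list(comparisons)
    for _ in range(iters):
        s, A, y = hodgerank_scores(n_items, data)
        resid = np.abs(A @ s - y)
        keep = resid.argsort()[: int(len(data) * (1 - trim_frac))]
        data = [data[k] for k in keep]
    return hodgerank_scores(n_items, data)[0]

# toy example: 4 stimuli, one grossly corrupted comparison (0 vs 2)
cmp_data = [(0, 1, 1.0), (1, 2, 1.1), (2, 3, 0.9), (0, 3, 3.0), (0, 2, -5.0)]
print(robust_scores(4, cmp_data))
```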
ImproveMyCity: an open source platform for direct citizen-government communication
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502225
I. Tsampoulatidis, D. Ververidis, P. Tsarchopoulos, S. Nikolopoulos, Y. Kompatsiaris, N. Komninos
Abstract: ImproveMyCity is an open source platform that enables residents to report local issues in their neighborhood, such as discarded trash bins, faulty street lights, broken tiles on sidewalks, and illegal advertising boards, directly to their public administration. The reported issues are automatically transmitted to the appropriate office in the public administration so that their settlement can be scheduled. Reporting is possible through both a web-based and a smartphone-based front-end, each adopting a map-based visualization that makes reporting a user-friendly and engaging process. The management and routing of incoming issues is performed through a back-end infrastructure that serves as an integrated management system with easy-to-use interfaces. Apart from reporting a new issue, both front-ends allow citizens to add comments or vote on existing issues, which adds a social dimension to the collected content. Finally, the platform also informs citizens about the progress status of reported issues, facilitating a two-way dialogue between the citizen and the public administration.
Citations: 35
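The abstract describes a reporting and routing architecture rather than an algorithm; the minimal sketch below only illustrates the general idea of dispatching a categorized, geolocated issue report to a responsible office. The Issue fields, routing table, and office names are hypothetical and are not taken from the ImproveMyCity code base.

```python
from dataclasses import dataclass, field

# Hypothetical category-to-office routing table (illustrative only).
ROUTING = {
    "garbage": "Cleansing Department",
    "lighting": "Electromechanical Works",
    "sidewalk": "Road Maintenance",
}

@dataclass
class Issue:
    title: str
    category: str
    location: tuple                  # (latitude, longitude) picked on the map front-end
    status: str = "reported"         # later updated to "in progress" or "resolved"
    votes: int = 0
    comments: list = field(default_factory=list)

def route(issue: Issue) -> str:
    """Dispatch an issue to the office responsible for its category."""
    return ROUTING.get(issue.category, "General Administration")

issue = Issue("Broken street light", "lighting", (40.64, 22.94))
print(route(issue))                  # -> Electromechanical Works
```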
Understanding and classifying image tweets
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502203
Tao Chen, Dongyuan Lu, Min-Yen Kan, Peng Cui
Abstract: Social media platforms now allow users to share images alongside their textual posts. These image tweets make up a fast-growing percentage of tweets but, unlike their text-only counterparts, have not been studied in depth. We study a large corpus of image tweets in order to uncover what people post about and the correlation between a tweet's image and its text. We show that an important functional distinction is between visually-relevant and visually-irrelevant tweets, and that we can successfully build an automated classifier utilizing text, image, and social context features to distinguish these two classes, obtaining a macro F1 of 70.5%.
Citations: 81
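A minimal sketch of the kind of classifier the abstract describes: early fusion of text, image, and social-context feature blocks, a linear classifier, and macro-averaged F1 for evaluation. The feature blocks here are random placeholders, and the paper's actual features and learning algorithm may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder feature blocks standing in for text, image, and social-context
# features (the real pipeline would extract these from tweets).
n = 1000
text_feat = rng.normal(size=(n, 50))
image_feat = rng.normal(size=(n, 30))
context_feat = rng.normal(size=(n, 10))
X = np.hstack([text_feat, image_feat, context_feat])   # simple early fusion
y = rng.integers(0, 2, size=n)                          # 1 = visually relevant

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```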
Object coding on the semantic graph for scene classification
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502131
Jingjing Chen, Yahong Han, Xiaochun Cao, Q. Tian
Abstract: In scene classification, a scene can be considered a set of object cliques. Objects inside each clique have semantic correlations with each other, while two objects from different cliques are relatively independent. To utilize these correlations for better recognition performance, we propose a new method, Object Coding on the Semantic Graph, to address the scene classification problem. We first exploit prior knowledge by collecting statistics over a large number of labeled images and calculating the dependency degree between objects. Then, a graph is built to model the semantic correlations between objects. This semantic graph captures semantics by treating the objects as vertices and the object affinities as edge weights. By encoding this semantic knowledge into the graph, object coding is conducted to automatically select a set of object cliques with strong semantic correlations to represent a specific scene. The experimental results show that object coding on the semantic graph improves classification accuracy.
Citations: 3
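The first two steps of the abstract (estimating object dependencies from labeled images and turning them into a weighted semantic graph) can be sketched as follows. The specific affinity formula is one plausible choice and is not claimed to be the paper's exact dependency measure.

```python
import numpy as np

def dependency_graph(label_matrix):
    """Build a weighted semantic graph from binary image-object labels.
    label_matrix[k, i] = 1 if object i appears in image k.
    Edge weight is a symmetric co-occurrence affinity (illustrative choice)."""
    co = label_matrix.T @ label_matrix            # object-object co-occurrence counts
    counts = np.diag(co).astype(float)            # per-object occurrence counts
    n = len(counts)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and counts[i] > 0 and counts[j] > 0:
                # joint count normalised by the rarer object
                W[i, j] = co[i, j] / min(counts[i], counts[j])
    return W

# toy example: 5 labeled images, 3 object classes
labels = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 1]])
print(dependency_graph(labels))
```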
Segmental multi-way local pooling for video recognition
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502167
Ilseo Kim, Sangmin Oh, Arash Vahdat, Kevin J. Cannons, A. Perera, Greg Mori
Abstract: In this work, we address the problem of complex event detection in unconstrained videos. We introduce a novel multi-way feature pooling approach that leverages segment-level information. The approach is simple and widely applicable to diverse audio-visual features. Our approach uses a set of clusters discovered via unsupervised clustering of segment-level features. Depending on feature characteristics, not only scene-based clusters but also motion/audio-based clusters can be incorporated. Then, every video is represented with multiple descriptors, where each descriptor is designed to relate to one of the pre-built clusters. For classification, intersection kernel SVMs are used, where the kernel is obtained by combining multiple kernels computed from corresponding per-cluster descriptor pairs. Evaluation on the TRECVID'11 MED dataset shows that the proposed approach significantly improves on the state of the art.
Citations: 7
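A compact sketch of the described pipeline under stated assumptions: segment-level features are clustered with k-means, each video is average-pooled per cluster into multiple descriptors, and per-cluster histogram-intersection kernels are summed for a precomputed-kernel SVM. The data and feature dimensions are synthetic, and the clustering and kernel-combination details are simplifications of the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def pool_by_cluster(segment_feats, kmeans):
    """Average-pool a video's segment features separately per cluster,
    giving one descriptor per pre-built cluster (zeros if a cluster is empty)."""
    k, d = kmeans.n_clusters, segment_feats.shape[1]
    assign = kmeans.predict(segment_feats)
    pooled = np.zeros((k, d))
    for c in range(k):
        if np.any(assign == c):
            pooled[c] = segment_feats[assign == c].mean(axis=0)
    return pooled

def intersection_kernel(A, B):
    """Histogram-intersection kernel between rows of A and B (non-negative features)."""
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

# toy data: 40 videos, each with 20 segments of 16-dim non-negative features
rng = np.random.default_rng(0)
videos = [rng.random((20, 16)) for _ in range(40)]
labels = rng.integers(0, 2, size=40)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(np.vstack(videos))
descs = [pool_by_cluster(v, kmeans) for v in videos]       # (3, 16) per video

# combine per-cluster intersection kernels by summation, then train a precomputed-kernel SVM
K = sum(intersection_kernel(np.array([d[c] for d in descs]),
                            np.array([d[c] for d in descs])) for c in range(3))
clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.predict(K)[:5])
```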
Learning latent spatio-temporal compositional model for human action recognition
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502089
Xiaodan Liang, Liang Lin, Liangliang Cao
Abstract: Action recognition is an important problem in multimedia understanding. This paper addresses the problem by building an expressive compositional action model. We model an action instance in video with an ensemble of spatio-temporal compositions: a number of discrete temporal anchor frames, each of which is further decomposed into a layout of deformable parts. In this way, our model identifies a Spatio-Temporal And-Or Graph (STAOG) that represents the latent structure of actions such as triple jumping, swinging, and high jumping. The STAOG model comprises four layers: (i) a batch of leaf nodes at the bottom for detecting various action parts within video patches; (ii) or-nodes above them, i.e., switch variables that activate their child leaf nodes to capture structural variability; (iii) and-nodes within an anchor frame for verifying spatial composition; and (iv) the root node at the top for aggregating scores over temporal anchor frames. Moreover, contextual interactions are defined between leaf nodes in both the spatial and temporal domains. For model training, we develop a novel weakly supervised learning algorithm that iteratively determines the structural configuration (e.g., the production of leaf nodes associated with the or-nodes) along with the optimization of multi-layer parameters. By fully exploiting spatio-temporal compositions and interactions, our approach handles large intra-class action variance well (e.g., different views, individual appearances, and spatio-temporal structures). Experimental results on challenging databases demonstrate the superior performance of our approach over other methods.
Citations: 40
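The full STAOG model with deformable parts, contextual interactions, and weakly supervised training is beyond a short sketch, but the layered scoring idea (or-nodes take the best child detector, and-nodes sum within an anchor frame, and the root sums over anchor frames) can be illustrated with a toy forward pass. All scores and node layouts below are made up for illustration and omit deformation and context terms.

```python
import numpy as np

def score_action(video_part_scores, or_children):
    """Toy forward pass over an And-Or layer structure:
    each or-node picks its best child part detector (max),
    each anchor frame (and-node) sums its or-node scores,
    and the root sums over temporal anchor frames.
    video_part_scores[t][p] = detector score for part p at anchor frame t."""
    total = 0.0
    for frame_scores in video_part_scores:                       # root: sum over frames
        frame_total = 0.0
        for children in or_children:                             # and-node: sum over slots
            frame_total += max(frame_scores[p] for p in children)  # or-node: best child
        total += frame_total
    return total

# toy example: 2 anchor frames, 4 part detectors, 2 or-nodes with 2 children each
part_scores = [np.array([0.2, 0.9, 0.1, 0.4]),
               np.array([0.7, 0.3, 0.5, 0.6])]
print(score_action(part_scores, or_children=[[0, 1], [2, 3]]))   # -> 2.6
```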
Semantic technologies for multimedia content: foundations and applications
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502232
A. Scherp
Abstract: Higher-level semantics for multimedia content is essential to answer questions like "Give me all presentations of German physicists of the 20th century". The tutorial provides an introduction and overview of such semantics and of developments in multimedia metadata. It introduces current advancements for describing media on the web using Linked Open Data and other, more expressive semantic technologies. The application of such technologies is shown with concrete examples.
Citations: 1
Time matters!: capturing variation in time in video using Fisher kernels
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502183
Ionut Mironica, J. Uijlings, Negar Rostamzadeh, B. Ionescu, N. Sebe
Abstract: In video, global features are often used for reasons of computational efficiency, where each global feature captures information from a single video frame. But frames in video change over time, so an important question is: how can we meaningfully aggregate frame-based features in order to preserve the variation in time? In this paper we propose to use the Fisher kernel to capture variation in time in video. While the temporal order is lost in this approach, it captures both subtle variations in time, such as those caused by a moving bicycle, and drastic variations, such as shot changes in a documentary. Our work should not be confused with a bag-of-local-visual-features approach, where one captures the visual variation of local features in both time and space indiscriminately. Instead, each feature measures a complete frame, so we capture variation in time only. We show that our framework is highly general, reporting improvements using frame-based visual features, body-part features, and audio features on three diverse datasets: we obtain state-of-the-art results on the UCF50 human action dataset and improve the state of the art on the MediaEval 2012 video-genre benchmark and on the ADL daily activity recognition dataset.
Citations: 24
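Since the Fisher kernel over frame-based features is the core of the method, here is a simplified Fisher-vector encoder that keeps only the gradients with respect to the GMM means (a common reduction; the paper's exact variant may also use weight and covariance gradients). The data and dimensions are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(frame_feats, gmm):
    """Simplified Fisher vector over a video's frame features:
    gradients w.r.t. the GMM means only."""
    X = np.atleast_2d(frame_feats)
    N = X.shape[0]
    gamma = gmm.predict_proba(X)                      # (N, K) posteriors
    mu = gmm.means_                                   # (K, D)
    sigma = np.sqrt(gmm.covariances_)                 # (K, D), diagonal covariance
    w = gmm.weights_                                  # (K,)
    fv = []
    for k in range(gmm.n_components):
        diff = (X - mu[k]) / sigma[k]                 # normalised deviations, (N, D)
        g = (gamma[:, k:k + 1] * diff).sum(axis=0) / (N * np.sqrt(w[k]))
        fv.append(g)
    return np.concatenate(fv)

# toy usage: frame descriptors from many videos train the GMM,
# then each video is encoded by aggregating its own frames
rng = np.random.default_rng(0)
train_frames = rng.normal(size=(500, 8))
gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0).fit(train_frames)

video_frames = rng.normal(size=(60, 8))               # one video, 60 frames
print(fisher_vector(video_frames, gmm).shape)          # (4 * 8,) = (32,)
```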
The space between the images
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2509857
L. Guibas
Abstract: Multimedia content has become a ubiquitous presence on all our computing devices, spanning the gamut from live content captured by device sensors such as smartphone cameras to immense databases of images, audio, and video stored in the cloud. As we try to maximize the utility and value of all these petabytes of content, we often do so by analyzing each piece of data individually and foregoing a deeper analysis of the relationships between the media. Yet with more and more data there will be more and more connections and correlations, because the data captured comes from the same or similar objects, or because of particular repetitions, symmetries, or other relations and self-relations that the data sources satisfy. This is particularly true for media of a geometric character, such as GPS traces, images, videos, 3D scans, 3D models, etc. In this talk we focus on the "space between the images", that is, on expressing the relationships between different multimedia data items. We aim to make such relationships explicit, tangible, first-class objects that can themselves be analyzed, stored, and queried, irrespective of the media they originate from. We discuss mathematical and algorithmic issues in representing and computing relationships or mappings between media data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps and relationships, small and large, between inter-related data. The network can act as a regularizer, allowing us to benefit from the "wisdom of the collection" in performing operations on individual data sets or in map inference between them. We illustrate these ideas using examples from the realm of 2D images and 3D scans/shapes, but these notions are more generally applicable to the analysis of videos, graphs, acoustic data, biological data such as microarrays, homework in MOOCs, etc. This is an overview of joint work with multiple collaborators, as discussed in the talk.
Citations: 0
Semantic pooling for complex event detection
Proceedings of the 21st ACM international conference on Multimedia. Pub Date: 2013-10-21. DOI: 10.1145/2502081.2502191
Qian Yu, Jingen Liu, Hui Cheng, Ajay Divakaran, H. Sawhney
Abstract: Complex event detection is very challenging in open-source video such as YouTube videos, which usually comprise very diverse visual content involving various object, scene, and action concepts. Not all of them, however, are relevant to the event. In other words, a video may contain a lot of "junk" information that is harmful to recognition. Hence, we propose a semantic pooling approach to tackle this issue. Unlike conventional pooling over the entire video or over specific spatial regions of a video, we employ a discriminative approach to acquire abstract semantic "regions" for pooling. For this purpose, we first associate low-level visual words with semantic concepts via their co-occurrence relationships. We then pool the low-level features separately according to their semantic information. The proposed semantic pooling strategy also provides a new mechanism for incorporating semantic concepts into low-level feature-based event recognition. We evaluate our approach on the TRECVID MED [1] dataset, and the results show that semantic pooling consistently improves performance compared with conventional pooling strategies.
Citations: 8
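A rough sketch of the two steps named in the abstract: associating visual words with semantic concepts through co-occurrence statistics, then pooling a clip's word counts separately per concept. The association rule (arg-max co-occurrence) and the data are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def assign_words_to_concepts(word_hists, concept_labels):
    """Associate each low-level visual word with the concept it co-occurs with most,
    estimated from training clips annotated with concepts.
    word_hists[n, w]     = count of visual word w in clip n
    concept_labels[n, c] = 1 if concept c is present in clip n"""
    cooc = word_hists.T @ concept_labels          # (W, C) word-concept co-occurrence
    return cooc.argmax(axis=1)                    # concept index per word

def semantic_pool(clip_hist, word2concept, n_concepts):
    """Pool one clip's visual-word counts separately per semantic concept and
    concatenate, instead of a single histogram over the whole video."""
    pools = []
    for c in range(n_concepts):
        pools.append(np.where(word2concept == c, clip_hist, 0.0))
    return np.concatenate(pools)

# toy example: 100 visual words, 5 concepts, 200 training clips
rng = np.random.default_rng(0)
word_hists = rng.poisson(1.0, size=(200, 100)).astype(float)
concept_labels = rng.integers(0, 2, size=(200, 5)).astype(float)

word2concept = assign_words_to_concepts(word_hists, concept_labels)
clip = rng.poisson(1.0, size=100).astype(float)
print(semantic_pool(clip, word2concept, 5).shape)    # (5 * 100,) = (500,)
```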