Proceedings of the 19th ACM international conference on Multimedia: Latest Articles

Colorizing tags in tag cloud: a novel query-by-tag music search system
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072337
Ju-Chiang Wang, Yu-Chin Shih, Meng-Sung Wu, H. Wang, Shyh-Kang Jeng
{"title":"Colorizing tags in tag cloud: a novel query-by-tag music search system","authors":"Ju-Chiang Wang, Yu-Chin Shih, Meng-Sung Wu, H. Wang, Shyh-Kang Jeng","doi":"10.1145/2072298.2072337","DOIUrl":"https://doi.org/10.1145/2072298.2072337","url":null,"abstract":"This paper presents a novel content-based query-by-tag music search system for an untagged music database. We design a new tag query interface that allows users to input multiple tags with multiple levels of preference (denoted as an MTML query) by colorizing desired tags in a web-based tag cloud interface. When a user clicks and holds the left mouse button (or presses and holds his/her finger on a touch screen) on a desired tag, the color of the tag will change cyclically according to a color map (from dark blue to bright red), which represents the level of preference (from 0 to 1). In this way, the user can easily organize and check the query of multiple tags with multiple levels of preference through the colored tags. To effect the MTML content-based music retrieval, we introduce a probabilistic fusion model (denoted as GMFM), which consists of two mixture models, namely a Gaussian mixture model and a multinomial mixture model. GMFM can jointly model the auditory features and tag labels of a song. Two indexing methods and their corresponding matching methods, namely pseudo song-based matching and tag affinity-based matching, are incorporated into the pre-learned GMFM. We evaluate the proposed system on the MajorMiner and CAL-500 datasets. The experimental results demonstrate the effectiveness of GMFM and the potential of using MTML queries to search music from an untagged music database.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"208 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131546034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
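The color-coded preference mechanism described above maps a level between 0 and 1 onto a ramp from dark blue to bright red. As a rough illustration only, not the authors' implementation, the Python sketch below assumes a simple linear interpolation between those two endpoint colors and represents an MTML query as a tag-to-preference dictionary; the tag names are invented.

```python
# Hypothetical sketch: map a tag's preference level in [0, 1] to a color on a
# dark-blue -> bright-red ramp and collect an MTML query as {tag: preference}.
# The paper only specifies the ramp's endpoints, so linear interpolation is assumed.

def preference_to_rgb(level: float) -> tuple:
    """Interpolate linearly from dark blue (level 0.0) to bright red (level 1.0)."""
    level = max(0.0, min(1.0, level))
    dark_blue, bright_red = (0, 0, 139), (255, 0, 0)
    return tuple(round(b + (r - b) * level) for b, r in zip(dark_blue, bright_red))

# An MTML (multiple tags, multiple levels of preference) query built from colorized tags.
mtml_query = {"jazz": 0.9, "piano": 0.6, "mellow": 0.3}
tag_colors = {tag: preference_to_rgb(level) for tag, level in mtml_query.items()}
print(tag_colors)  # {'jazz': (230, 0, 14), 'piano': (153, 0, 56), 'mellow': (76, 0, 97)}
```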
Bilinear deep learning for image classification
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072505
S. Zhong, Y. Liu, Yang Liu
{"title":"Bilinear deep learning for image classification","authors":"S. Zhong, Y. Liu, Yang Liu","doi":"10.1145/2072298.2072505","DOIUrl":"https://doi.org/10.1145/2072298.2072505","url":null,"abstract":"","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131704598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Random partial paired comparison for subjective video quality assessment via HodgeRank
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072350
Qianqian Xu, Tingting Jiang, Y. Yao, Qingming Huang, Bowei Yan, Weisi Lin
{"title":"Random partial paired comparison for subjective video quality assessment via hodgerank","authors":"Qianqian Xu, Tingting Jiang, Y. Yao, Qingming Huang, Bowei Yan, Weisi Lin","doi":"10.1145/2072298.2072350","DOIUrl":"https://doi.org/10.1145/2072298.2072350","url":null,"abstract":"Subjective visual quality evaluation provides the groundtruth and source of inspiration in building objective visual quality metrics. Paired comparison is expected to yield more reliable results; however, this is an expensive and timeconsuming process. In this paper, we propose a novel framework of HodgeRank on Random Graphs (HRRG) to achieve efficient and reliable subjective Video Quality Assessment (VQA). To address the challenge of a potentially large number of combinations of videos to be assessed, the proposed methodology does not require the participants to perform the complete comparison of all the paired videos. Instead, participants only need to perform a random sample of all possible paired comparisons, which saves a great amount of time and labor. In contrast to the traditional deterministic incomplete block designs, our random design is not only suitable for traditional laboratory and focus-group studies, but also fit for crowdsourcing experiments on Internet where the raters are distributive over Internet and it is hard to control with precise experimental designs. Our contribution in this work is three-fold: 1) a HRRG framework is proposed to quantify the quality of video; 2) a new random design principle is investigated to conduct paired comparison based on Erdos-Renyi random graph theory; 3) Hodge decomposition is introduced to derive, from incomplete and imbalanced data, quality scores of videos and inconsistency of participants'judgments. We demonstrate the effectiveness of the proposed framework on LIVE Database. Equipped with random graph theory and HodgeRank, our scheme has the following advantages over the traditional ones: 1) data collection is simple and easy to handle, and thus is more suitable for crowdsourcing on Internet; 2) workload on participants is lower and more flexible; 3) the rating procedure is efficient, labor-saving, and more importantly, without jeopardizing the accuracy of the results.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130288224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 44
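At its core, the gradient (global score) part of HodgeRank reduces to a least-squares problem over the observed pairwise differences, with the pairs drawn by an Erdős-Rényi random design. The sketch below illustrates only that part with NumPy; the curl and harmonic components the paper uses to quantify judgment inconsistency are omitted, and the number of videos, the sampling probability, and the noise model are invented for the demo.

```python
# Minimal sketch of HodgeRank's least-squares (gradient) component on an
# Erdos-Renyi random design. The curl/harmonic inconsistency analysis from the
# paper is omitted; true scores and noise level are made up for the demo.
import numpy as np

rng = np.random.default_rng(0)
n_videos, p_edge = 8, 0.4                    # G(n, p) random design
true_scores = rng.normal(size=n_videos)

rows, diffs = [], []
for i in range(n_videos):
    for j in range(i + 1, n_videos):
        if rng.random() < p_edge:            # keep this pair with probability p
            row = np.zeros(n_videos)
            row[i], row[j] = -1.0, 1.0
            rows.append(row)
            # observed paired-comparison outcome: noisy score difference
            diffs.append(true_scores[j] - true_scores[i] + rng.normal(scale=0.2))

A, y = np.vstack(rows), np.array(diffs)
scores, *_ = np.linalg.lstsq(A, y, rcond=None)
scores -= scores.mean()                      # scores are defined only up to a constant
print(np.argsort(-scores))                   # estimated quality ranking
```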
Tennis real play: an interactive tennis game with models from real videos
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072361
Jui-Hsin Lai, Chieh-Li Chen, Po-Chen Wu, Chieh-Chi Kao, Shao-Yi Chien
{"title":"Tennis real play: an interactive tennis game with models from real videos","authors":"Jui-Hsin Lai, Chieh-Li Chen, Po-Chen Wu, Chieh-Chi Kao, Shao-Yi Chien","doi":"10.1145/2072298.2072361","DOIUrl":"https://doi.org/10.1145/2072298.2072361","url":null,"abstract":"Tennis Real Play (TRP) is an interactive tennis game system constructed with models extracted from videos of real matches. The key techniques proposed for TRP include player modeling and video-based player/court rendering. For player model creation, we propose a database normalization process and a behavioral transition model of tennis players, which might be a good alternative for motion capture in the conventional video games. For player/court rendering, we propose a framework for rendering vivid game characters and providing the real-time ability. We can say that image-based rendering leads to a more interactive and realistic rendering. Experiments show that video games with vivid viewing effects and characteristic players can be generated from match videos without much user intervention. Because the player model can adequately record the ability and condition of a player in the real world, it can then be used to roughly predict the results of real tennis matches in the next days. The results of a user study reveal that subjects like the increased interaction, immersive experience, and enjoyment from playing TRP.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129007889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
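The behavioral transition model mentioned in the abstract implies that a player's next action depends probabilistically on the current one. The sketch below treats this as a first-order Markov chain; the states and transition probabilities are invented for illustration, whereas the paper estimates such a model from real match videos.

```python
# Hypothetical first-order Markov chain over tennis player behaviors.
# States and probabilities are illustrative only, not taken from the paper.
import random

transitions = {
    "wait":       {"move_left": 0.25, "move_right": 0.25, "swing": 0.4, "wait": 0.1},
    "move_left":  {"swing": 0.6, "wait": 0.4},
    "move_right": {"swing": 0.6, "wait": 0.4},
    "swing":      {"wait": 0.7, "move_left": 0.15, "move_right": 0.15},
}

def simulate(start: str, steps: int, seed: int = 0) -> list:
    """Sample a behavior sequence from the transition model."""
    rng = random.Random(seed)
    state, seq = start, [start]
    for _ in range(steps):
        nxt, probs = zip(*transitions[state].items())
        state = rng.choices(nxt, weights=probs)[0]
        seq.append(state)
    return seq

print(simulate("wait", 10))
```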
SRV-TaGS: An Automatic TAGging and Search System for Sensor-Rich Outdoor Videos
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072444
Zhijie Shen, Sakire Arslan Ay, S. H. Kim
{"title":"SRV-TaGS: An Automatic TAGging and Search System for Sensor-Rich Outdoor Videos","authors":"Zhijie Shen, Sakire Arslan Ay, S. H. Kim","doi":"10.1145/2072298.2072444","DOIUrl":"https://doi.org/10.1145/2072298.2072444","url":null,"abstract":"Tagging facilitates video search in many social media and web applications. While manual tagging is time consuming, subjective and sometimes inaccurate, auto-tagging facilitated by content-based techniques is compute-intensive and challenging to apply across domains. We have developed a complementary system, named SRV-TAGS, to automatically generate tags for outdoor videos based on their geographic properties, to index the videos based on their generated tags and to provide textual search services. The system works with our geo-referenced video management web portal, enabling users to manage, search and watch videos.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122405372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
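SRV-TAGS derives tags from a video's geographic properties rather than its pixel content. A plausible simplification of that idea, assuming a flat-earth approximation over short distances and invented landmark coordinates, field of view, and range, is to tag a frame with every known landmark that falls inside the camera's viewable scene given its GPS position and compass bearing.

```python
# Sketch: tag a video frame with landmarks that fall inside the camera's
# viewable scene, given GPS position, compass bearing and field of view.
# Landmark coordinates, the 60-degree FOV and the 500 m range are assumptions;
# a flat-earth approximation is used, acceptable over a few hundred meters.
import math

def bearing_and_distance(cam_lat, cam_lon, lat, lon):
    """Approximate bearing (degrees clockwise from north) and distance (meters)."""
    dy = (lat - cam_lat) * 111_320.0
    dx = (lon - cam_lon) * 111_320.0 * math.cos(math.radians(cam_lat))
    return math.degrees(math.atan2(dx, dy)) % 360.0, math.hypot(dx, dy)

def visible_tags(cam_lat, cam_lon, heading_deg, landmarks,
                 fov_deg=60.0, max_range_m=500.0):
    tags = []
    for name, (lat, lon) in landmarks.items():
        bearing, dist = bearing_and_distance(cam_lat, cam_lon, lat, lon)
        off_axis = abs((bearing - heading_deg + 180.0) % 360.0 - 180.0)
        if off_axis <= fov_deg / 2 and dist <= max_range_m:
            tags.append(name)
    return tags

landmarks = {"clock_tower": (34.0224, -118.2860), "fountain": (34.0208, -118.2868)}
print(visible_tags(34.0211, -118.2855, 300.0, landmarks))
```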
Contextual synonym dictionary for visual object retrieval
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072364
Wenbin Tang, Rui Cai, Zhiwei Li, Lei Zhang
{"title":"Contextual synonym dictionary for visual object retrieval","authors":"Wenbin Tang, Rui Cai, Zhiwei Li, Lei Zhang","doi":"10.1145/2072298.2072364","DOIUrl":"https://doi.org/10.1145/2072298.2072364","url":null,"abstract":"In this paper, we study the problem of visual object retrieval by introducing a dictionary of contextual synonyms to narrow down the semantic gap in visual word quantization. The basic idea is to expand a visual word in the query image with its synonyms to boost the retrieval recall. Unlike the existing work such as soft-quantization, which only focuses on the Euclidean (l2) distance in descriptor space, we utilize the visual words which are more likely to describe visual objects with the same semantic meaning by identifying the words with similar contextual distributions (i.e. contextual synonyms). We describe the contextual distribution of a visual word using the statistics of both co-occurrence and spatial information averaged over all the image patches having this visual word, and propose an efficient system implementation to construct the contextual synonym dictionary for a large visual vocabulary. The whole construction process is unsupervised and the synonym dictionary can be naturally integrated into a standard bag-of-feature image retrieval system. Experimental results on several benchmark datasets are quite promising. The contextual synonym dictionary-based expansion consistently outperforms the l2 distance-based soft-quantization, and advances the state-of-the-art performance remarkably.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127709452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
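The notion of a contextual synonym can be approximated by giving each visual word a context vector of co-occurrence counts and treating the nearest vectors as synonyms. The NumPy sketch below uses plain cosine similarity over a random toy co-occurrence matrix; the paper additionally folds in spatial statistics and scales the construction to very large vocabularies, neither of which is reproduced here.

```python
# Toy sketch of a contextual synonym dictionary: visual words whose
# co-occurrence distributions are most similar are treated as synonyms.
# The co-occurrence matrix here is random stand-in data.
import numpy as np

rng = np.random.default_rng(1)
vocab_size, top_k = 20, 3
# cooc[i, j] ~ how often word j appears in the context of word i
cooc = rng.poisson(2.0, size=(vocab_size, vocab_size)).astype(float)

norms = np.linalg.norm(cooc, axis=1, keepdims=True)
sim = (cooc / norms) @ (cooc / norms).T          # cosine similarity between context vectors
np.fill_diagonal(sim, -np.inf)                   # a word is not its own synonym

synonyms = {w: list(np.argsort(-sim[w])[:top_k]) for w in range(vocab_size)}

# Query expansion: replace each query word by itself plus its contextual synonyms.
query_words = [3, 7]
expanded = {w for q in query_words for w in [q, *synonyms[q]]}
print(expanded)
```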
The FCam API for programmable cameras
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072425
S. H. Park, Andrew Adams, Eino-Ville Talvala
{"title":"The FCam API for programmable cameras","authors":"S. H. Park, Andrew Adams, Eino-Ville Talvala","doi":"10.1145/2072298.2072425","DOIUrl":"https://doi.org/10.1145/2072298.2072425","url":null,"abstract":"The FCam API is an open-source camera control library, enabling precise control over a camera's imaging pipeline. Intended for researchers and students in the field of computational photography, it allows easy implementation of novel algorithms and applications. Currently implemented on the Nokia N900 smartphone, and a custom-built \"Frankencamera\", it has been used in teaching at universities around the world, and is freely available for download for the N900. This paper describes the architecture underlying the API, the design of the API itself, several applications built on top of it, and some examples of its use in education.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"23 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121423730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Extracting intentionally captured regions using point trajectories
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072029
Yuta Nakashima, N. Babaguchi
{"title":"Extracting intentionally captured regions using point trajectories","authors":"Yuta Nakashima, N. Babaguchi","doi":"10.1145/2072298.2072029","DOIUrl":"https://doi.org/10.1145/2072298.2072029","url":null,"abstract":"When camera persons take videos with mobile video cameras, they usually have capture intentions, i.e., what they want to express in their videos, and there are intentionally captured regions (ICRs) in the video frames that are essential for the capture intentions. Extracting ICRs is thus beneficial for wide range of applications such as video summarization and video adaptation for small displays. In this paper, we present a novel method for automatically extracting ICRs. A camera person usually moves his/her camera so that ICRs can be arranged in appropriate positions in video frames; therefore, ICRs can yield specific motion. This observation indicates that such specific motion is a vital cue for extracting ICRs. The proposed method represents motion by point trajectories, which are long-term trajectories of spatially dense points in video frames, and extracts ICRs using an ICR model based on the point trajectories. We experimentally evaluate the proposed method to demonstrate its potential applicability.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129389797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
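The method builds on long-term trajectories of densely sampled points. The sketch below only shows how such trajectories can be gathered with OpenCV's KLT tracker over a short clip (the video path is a placeholder); the ICR model that decides which trajectories belong to an intentionally captured region is the paper's contribution and is not reproduced.

```python
# Sketch: build point trajectories over a short clip with the KLT tracker.
# Only trajectory extraction is shown; classifying trajectories into
# intentionally captured regions (the ICR model) is not implemented here.
import cv2

cap = cv2.VideoCapture("input.mp4")              # placeholder path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                             qualityLevel=0.01, minDistance=7)
trajectories = [[pt.ravel()] for pt in p0]       # one list of (x, y) points per feature

for _ in range(30):                              # track over the next 30 frames
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    for traj, pt, st in zip(trajectories, p1, status):
        if st[0] == 1:                           # point successfully tracked
            traj.append(pt.ravel())
    prev_gray, p0 = gray, p1

print(f"{len(trajectories)} trajectories, longest has {max(map(len, trajectories))} points")
```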
News contextualization with geographic and visual information
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072317
Zechao Li, M. Wang, J. Liu, Changsheng Xu, Hanqing Lu
{"title":"News contextualization with geographic and visual information","authors":"Zechao Li, M. Wang, J. Liu, Changsheng Xu, Hanqing Lu","doi":"10.1145/2072298.2072317","DOIUrl":"https://doi.org/10.1145/2072298.2072317","url":null,"abstract":"In this paper, we investigate the contextualization of news documents with geographic and visual information. We propose a matrix factorization approach to analyze the location relevance for each news document. We also propose a method to enrich the document with a set of web images. For location relevance analysis, we first perform toponym extraction and expansion to obtain a toponym list from news documents. We then propose a matrix factorization method to estimate the location-document relevance scores while simultaneously capturing the correlation of locations and documents. For image enrichment, we propose a method to generate multiple queries from each news document for image search and then employ an intelligent fusion approach to collect a set of images from the search results. Based on the location relevance analysis and image enrichment, we introduce a news browsing system named NewsMap which can support users in reading news via browsing a map and retrieving news with location queries. The news documents with the corresponding enriched images are presented to help users quickly get information. Extensive experiments demonstrate the effectiveness of our approaches.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117080441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
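The location-relevance step fills in a partially observed location-document score matrix by low-rank factorization. The sketch below fits a generic factorization R ≈ U Vᵀ by gradient descent on the observed entries only; the paper's actual objective, including how it captures correlations among locations and among documents, is not reproduced, and the toy matrix is random.

```python
# Generic low-rank factorization R ~ U @ V.T fitted on observed entries only.
# This is an illustrative stand-in, not the paper's exact objective; the toy
# relevance matrix and all hyper-parameters are made up.
import numpy as np

rng = np.random.default_rng(2)
n_locations, n_docs, rank = 6, 10, 3
R = rng.random((n_locations, n_docs))
observed = rng.random((n_locations, n_docs)) < 0.5   # mask of known relevance scores

U = rng.normal(scale=0.1, size=(n_locations, rank))
V = rng.normal(scale=0.1, size=(n_docs, rank))
lr, reg = 0.05, 0.01

for _ in range(500):
    err = (U @ V.T - R) * observed                   # error on observed entries only
    U -= lr * (err @ V + reg * U)
    V -= lr * (err.T @ U + reg * V)

relevance = U @ V.T                                  # completed relevance scores
print(np.abs((relevance - R) * observed).mean())     # fit on observed entries
```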
StoryImaging: a media-rich presentation system for textual stories
Proceedings of the 19th ACM international conference on Multimedia. Pub Date: 2011-11-28. DOI: 10.1145/2072298.2072451
Genliang Guan, Zhiyong Wang, Xiansheng Hua, D. Feng
{"title":"StoryImaging: a media-rich presentation system for textual stories","authors":"Genliang Guan, Zhiyong Wang, Xiansheng Hua, D. Feng","doi":"10.1145/2072298.2072451","DOIUrl":"https://doi.org/10.1145/2072298.2072451","url":null,"abstract":"In this demo, we develop the StoryImaging system to illustrate a textual story with both images harvested from the Web and synthesized speech. At the backend, a story is firstly processed to identify key terms such as named entities and to obtain the story summary. With the aid of commercial search engines, images are then collected from the Web for those key terms and re-ranked by taking the summary as context. At last, images are clustered to provide an overview of the story. At the web-based frontend, the user interface has been tailored to both improve information comprehension and provide engaging and explorative experiences for users by closely bridging textual and visual modalities.","PeriodicalId":318758,"journal":{"name":"Proceedings of the 19th ACM international conference on Multimedia","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117340968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
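One step in the pipeline re-ranks web image search results using the story summary as context. One plausible way to illustrate this, with scikit-learn and invented snippet text, is to score each candidate image's surrounding text against the summary by TF-IDF cosine similarity; the paper's multi-query fusion strategy is more involved than this.

```python
# Sketch: re-rank candidate images by TF-IDF cosine similarity between their
# surrounding text and the story summary. Snippets and summary are invented;
# the paper's multi-query fusion strategy is not reproduced here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

summary = "The explorer crossed the frozen lake and reached the old lighthouse."
image_snippets = {
    "img_001.jpg": "lighthouse on a rocky coast at dusk",
    "img_002.jpg": "city skyline with modern skyscrapers",
    "img_003.jpg": "hiker crossing a frozen lake in winter",
}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([summary, *image_snippets.values()])
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()

ranked = sorted(zip(image_snippets, scores), key=lambda x: -x[1])
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```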