Proceedings of the 1st ACM International Conference on Multimedia Retrieval: Latest Publications

High-level event detection system based on discriminant visual concepts
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992064
I. Tsampoulatidis, Nikolaos Gkalelis, A. Dimou, V. Mezaris, Y. Kompatsiaris
Abstract: This paper demonstrates a new approach to detecting high-level events that may be depicted in images or video frames. Given a non-annotated content item, a large number of previously trained visual concept detectors are applied to it, and their responses are used to represent the content item as a model vector in a high-dimensional concept space. Subsequently, an improved subclass discriminant analysis method identifies a concept subspace, within the aforementioned concept space, that is most appropriate for detecting and recognizing the target high-level events. In this subspace, the nearest-neighbor rule is used to compare the non-annotated content item with a few known example instances of the target events. The high-level events used as targets in the present version of the system are those defined for the TRECVID 2010 Multimedia Event Detection (MED) task.
Citations: 8
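The pipeline described in the abstract (concept-detector responses stacked into a model vector, projection into a discriminant subspace, a nearest-neighbor decision against known exemplars) can be sketched roughly as follows. The projection matrix `W`, the exemplars, and the event labels are toy placeholders: in the actual system `W` would be learned offline with the paper's improved subclass discriminant analysis.

```python
import numpy as np

def detect_event(query_scores, W, exemplars, exemplar_labels):
    """Classify a content item by the nearest-neighbor rule in a
    discriminant concept subspace.

    query_scores    : (C,) responses of C visual-concept detectors
    W               : (C, d) projection into the discriminant subspace
    exemplars       : (N, C) model vectors of known event instances
    exemplar_labels : (N,) event label of each exemplar
    """
    q = query_scores @ W                      # project query into subspace
    E = exemplars @ W                         # project exemplars
    dists = np.linalg.norm(E - q, axis=1)     # Euclidean distances
    return exemplar_labels[np.argmin(dists)]  # nearest-neighbor decision

# Toy example: 4 concepts, a 1-D subspace separating two events.
W = np.array([[1.0], [1.0], [-1.0], [-1.0]])
exemplars = np.array([[0.9, 0.8, 0.1, 0.2],   # "parade"-like model vector
                      [0.1, 0.2, 0.9, 0.8]])  # "repair"-like model vector
labels = np.array(["parade", "repair"])
print(detect_event(np.array([0.8, 0.9, 0.2, 0.1]), W, exemplars, labels))  # → parade
```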
City exploration by use of spatio-temporal analysis and clustering of user contributed photos
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992061
S. Papadopoulos, Christos Zigkolis, S. Kapiris, Y. Kompatsiaris, A. Vakali
Abstract: We present a technical demonstration of an online city exploration application that helps users identify interesting spots in a city through spatio-temporal analysis and clustering of user-contributed photos. Our framework analyzes the spatial distribution of large, city-centered collections of user-contributed photos at different time scales in order to index the most popular spots of a city in a time-aware manner. Subsequently, the photo sets belonging to the same spatio-temporal context are clustered in order to extract representative photos for each spot. The resulting application enables users to obtain flexible summaries of the most important spots in a city given a temporal slice (time of day, month, season). The demonstration will be based on a photo dataset covering major European cities.
Citations: 6
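A minimal sketch of the spatial-clustering step: photo locations that chain together within a distance threshold are grouped into one spot. The single-link threshold clustering below is an illustrative stand-in (the abstract does not name a specific algorithm), and the coordinates are toy values.

```python
import numpy as np
from collections import deque

def cluster_spots(coords, eps):
    """Group photo locations into spots: two photos belong to the same
    spot if they are connected by a chain of pairwise distances < eps.
    coords : (N, 2) array of (lat, lon); eps in the same units.
    Returns an (N,) array of cluster labels."""
    n = len(coords)
    labels = -np.ones(n, dtype=int)           # -1 means "unassigned"
    cur = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = cur
        queue = deque([i])
        while queue:                          # flood-fill the eps-neighborhood
            j = queue.popleft()
            d = np.linalg.norm(coords - coords[j], axis=1)
            for k in np.flatnonzero((d < eps) & (labels == -1)):
                labels[k] = cur
                queue.append(k)
        cur += 1
    return labels

# Two photo groups in different areas of a city (toy coordinates, degrees).
pts = np.array([[48.858, 2.294], [48.859, 2.295],
                [48.853, 2.350], [48.854, 2.349]])
print(cluster_spots(pts, eps=0.005))  # → [0 0 1 1]
```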
Consumer video understanding: a benchmark database and an evaluation of human and machine performance
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992025
Yu-Gang Jiang, Guangnan Ye, Shih-Fu Chang, D. Ellis, A. Loui
Abstract: Recognizing visual content in unconstrained videos has become a very important problem for many applications. Existing corpora for video analysis lack scale and/or content diversity, and have thus limited the needed progress in this critical area. In this paper, we describe and release a new database called CCV, containing 9,317 web videos over 20 semantic categories, including events like "baseball" and "parade", scenes like "beach", and objects like "cat". The database was collected with extra care to ensure relevance to consumer interest and originality of video content without post-editing. Such videos typically have very little textual annotation and can thus benefit from the development of automatic content analysis techniques. We used the Amazon MTurk platform to perform manual annotation, and studied the behaviors and performance of human annotators on MTurk. We also compared the abilities of humans and machines in understanding consumer video content. For the latter, we implemented automatic classifiers using a state-of-the-art multi-modal approach that achieved top performance in the recent TRECVID multimedia event detection task. Results confirm that classifiers fusing audio and video features significantly outperform single-modality solutions. We also found that humans are much better at understanding categories of non-rigid objects such as "cat", while current automatic techniques are relatively close to humans in recognizing categories that have distinctive background scenes or audio patterns.
Citations: 291
Learning reconfigurable hashing for diverse semantics
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992003
Yadong Mu, Xiangyu Chen, Tat-Seng Chua, Shuicheng Yan
Abstract: In recent years, locality-sensitive hashing (LSH) has gained plenty of attention from both the multimedia and computer vision communities due to its empirical success and theoretical guarantees in large-scale visual indexing and retrieval. Conventional LSH algorithms are designed either for generic metrics such as Cosine similarity, the ℓ2-norm and the Jaccard index, or for metrics learned from user-supplied supervision information. The common drawbacks of existing algorithms are their inability to adapt to metric changes, along with their inefficacy when handling diverse semantics (e.g., more than 1K different categories in the well-known ImageNet database). For the metrics underlying the hashing structure, even tiny changes tend to nullify previous indexing efforts, which motivates our proposed framework of "reconfigurable hashing". The basic idea is to maintain a large pool of over-complete hashing functions embedded in the ambient feature space, which serves as the common infrastructure for high-level diverse semantics. At runtime, the algorithm dynamically selects relevant hashing bits by maximizing consistency with the specific semantics-induced metric, thereby making the pre-computed hashing bits reusable. Such a reusable scheme especially benefits the indexing and retrieval of large-scale datasets, since it facilitates one-off indexing rather than continuous, computation-intensive maintenance for metric adaptation. We propose a sequential bit-selection algorithm based on local consistency and global regularization. Extensive studies are conducted on large-scale image benchmarks to comparatively investigate the performance of different strategies for reconfigurable hashing. Despite the vast literature on hashing, to the best of our knowledge few endeavors have been devoted to the reusability of hashing structures on large-scale datasets.
Citations: 13
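The bit-selection idea can be illustrated with a greedy score: keep the pool bits that agree on pairs labeled similar and disagree on pairs labeled dissimilar. This is a simplified stand-in for the paper's sequential bit selection with local consistency and global regularization; the random-hyperplane pool and the labeled pairs below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_pool(dim, pool_size):
    """Over-complete pool of random-hyperplane (sign) hash functions."""
    return rng.standard_normal((pool_size, dim))

def select_bits(H, X, sim_pairs, dis_pairs, k):
    """Pick the k pool bits most consistent with a task-specific metric
    given similar / dissimilar pairs (greedy sketch, not the paper's
    exact regularized algorithm)."""
    bits = (X @ H.T) > 0                       # (N, pool) binary codes
    scores = np.zeros(len(H))
    for a, b in sim_pairs:                     # reward bits that agree
        scores += (bits[a] == bits[b])
    for a, b in dis_pairs:                     # reward bits that separate
        scores += (bits[a] != bits[b])
    return np.argsort(scores)[::-1][:k]        # indices of the best bits

# Toy data: points 0 and 1 are semantically similar, 0 and 2 are not.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
H = build_pool(dim=2, pool_size=16)
chosen = select_bits(H, X, sim_pairs=[(0, 1)], dis_pairs=[(0, 2)], k=4)
print(len(chosen))   # 4 reusable bits picked from the pool of 16
```

Because the pool is computed once and only the selection step depends on the target metric, changing semantics means re-selecting bits rather than re-indexing the whole collection, which is the reusability the abstract emphasizes.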
Person-specific age estimation under ranking framework
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992034
Yong Ma, T. Xiong, Y. Zou, Kongqiao Wang
Abstract: Unlike traditional age estimation methods built on classification or regression frameworks, this paper proposes a novel person-specific age estimation method under a ranking framework. The basic idea is to treat the aging process as a personal, age-ranked image sequence and to extract the relevant information from this sequence. The age of an unknown face image is estimated by first using face recognition to find the persons in the template sets who look similar to the unseen person, then estimating the ranking order of the unseen person within the corresponding person-specific image sequences, and lastly mapping and fusing the rank order to a real age. Under this framework, the proposed system not only naturally estimates the correct age order of pairs of faces but also estimates the real age accurately. The proposed method shows encouraging performance in comparative experiments both as an age ranker and as an accurate age estimator, and the experiments also support the validity of the above assumption.
Citations: 18
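The rank-then-map step can be sketched with a scalar "aging score" standing in for the learned pairwise ranker: the query is placed within a similar person's age-ordered sequence, and its rank position is mapped to a real age by interpolation. All names and values below are illustrative, not the paper's implementation.

```python
import numpy as np

def estimate_age(query_score, template_scores, template_ages):
    """Place the query in a similar person's age-ordered sequence by a
    scalar aging score (a stand-in for the learned ranker), then map the
    resulting rank position to a real age by linear interpolation."""
    order = np.argsort(template_scores)              # age-rank the sequence
    s = np.asarray(template_scores, float)[order]
    a = np.asarray(template_ages, float)[order]
    return float(np.interp(query_score, s, a))       # rank → real age

# Template person photographed at ages 10..40, with a rising aging score.
ages = [10, 20, 30, 40]
scores = [0.1, 0.35, 0.6, 0.9]
print(estimate_age(0.475, scores, ages))   # falls midway between 20 and 30
```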
A color-action perceptual approach to the classification of animated movies
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992006
B. Ionescu, C. Vertan, P. Lambert, A. Benoît
Abstract: We address a particular case of video genre classification, namely the classification of animated movies. This task is achieved using two categories of content descriptors, temporal and color-based, which are adapted to this particular content. Temporal descriptors, like rhythm or action, quantify the perception of the action content at different levels. Color descriptors are derived from color perception, quantified in terms of color-distribution statistics, elementary hues, color properties (e.g., the amount of light colors, cold colors, etc.) and color relationships. The potential of the proposed descriptors for the classification task is demonstrated through experimental tests conducted on more than 159 hours of video footage. Despite the high diversity of the video material, the proposed descriptors achieve average precision and recall of up to 90% and 92%, respectively, and a global correct-detection ratio of up to 92%.
Citations: 8
Image modality classification: a late fusion method based on confidence indicator and closeness matrix
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992051
Xingzhi Sun, L. Gong, A. Natsev, Xiaofei Teng, Li-Ying Tian, Tao Wang, Yue Pan
Abstract: Automatic recognition or classification of medical image modality can provide valuable information for medical image retrieval and analysis. In this paper, we discuss an application of SVM ensemble classifiers to the problem, and explore a late fusion method based on a confidence indicator to resolve ambiguity across competing classes. Using a closeness matrix and a set of additional fusion rules, the proposed method improves classification performance by subjecting only likely misclassified samples to a text-based classifier, followed by additional fusion of the image-based and text-based classification results. An empirical evaluation on the standard ImageClef2010 Medical Retrieval data shows very promising performance for the proposed approach.
Citations: 3
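The confidence-gated fusion can be sketched as follows: when the image classifier's top-two margin is small, the sample is treated as likely misclassified and its scores are fused with the text-based classifier's scores. The margin-based confidence indicator, the threshold `tau`, and the weight `w` are illustrative assumptions, not the paper's exact closeness-matrix rules.

```python
import numpy as np

def fuse_modality(image_scores, text_scores, tau=0.2, w=0.5):
    """Confidence-gated late fusion sketch: trust the image-based
    classifier when its top-two margin is clear, otherwise fall back to
    a weighted fusion with the text-based classifier."""
    top2 = np.sort(image_scores)[-2:]
    confidence = top2[1] - top2[0]            # margin between best classes
    if confidence >= tau:
        final = image_scores                  # confident: image-only
    else:                                     # ambiguous: fuse modalities
        final = w * image_scores + (1 - w) * text_scores
    return int(np.argmax(final))

# Ambiguous image scores (toy classes: X-ray, CT, MRI); text disambiguates.
img = np.array([0.40, 0.38, 0.22])
txt = np.array([0.10, 0.80, 0.10])
print(fuse_modality(img, txt))   # → 1
```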
Embracing semantics in zoomable user interface
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992058
Daniele Panza, A. Vitali, Alexandro Sentinelli, Luca Celetto
Abstract: In the Information Retrieval (IR) field, the majority of research to date approaches all questions in terms of semantic tagging, indexing and feature extraction. Although this is a fundamental step in designing any IR system, we believe that an efficient human-machine interface (HMI) can also significantly improve the retrieval success rate on the end-user side. In particular, Zoomable User Interfaces (ZUIs) are becoming popular in many applications and devices, thanks to their advantages in terms of usability and appeal to the public. In this paper we propose what we call a "Semantic ZUI": a traditional ZUI enriched with semantic engines that work behind the scenes, with features designed to provide a seamless filesystem-browsing experience. Since users are becoming both consumers and producers of huge amounts of untagged content, we explore the potential of ZUIs by introducing features that focus on multimedia content. We have set up the first version of a fully working demo environment with the aim of stimulating debate in the IR community from the end-user-experience point of view.
Citations: 0
Diversity ranking for video retrieval from a broadcaster archive
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992052
Xavier Giró-i-Nieto, Monica Alfaro, F. Marqués
Abstract: Video retrieval through text queries is a very common practice in broadcaster archives. The query keywords are compared to the metadata labels that documentalists have previously associated with the video assets. This paper focuses on a ranking strategy that places more relevant keyframes among the top hits of the ranked result lists while, at the same time, preserving a diversity of video assets. Previous solutions based on a random walk over a visual-similarity graph have been modified to increase asset diversity by filtering the edges between keyframes depending on their asset. The random-walk algorithm is applied separately for every visual feature to avoid any normalization issues between visual-similarity metrics. Finally, this work evaluates performance with two separate metrics: relevance is measured by Average Precision, and diversity is assessed by Average Diversity, a new metric presented in this work.
Citations: 6
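The asset-filtered random walk can be sketched as a PageRank-style power iteration in which edges between keyframes of the same asset are removed before row-normalizing the similarity matrix, so relevance spreads only across assets. The similarity values and damping factor below are toy assumptions.

```python
import numpy as np

def diversity_rank(S, assets, d=0.85, iters=50):
    """Rank keyframes by a random walk over a visual-similarity matrix S,
    with intra-asset edges filtered out to promote asset diversity.
    assets[i] is the id of the video asset keyframe i belongs to."""
    A = np.array(assets)
    W = S * (A[:, None] != A[None, :])        # drop intra-asset edges
    row = W.sum(axis=1, keepdims=True)
    P = np.divide(W, row,                     # row-stochastic transitions;
                  out=np.full_like(W, 1.0 / len(W)),
                  where=row > 0)              # dangling rows → uniform
    r = np.full(len(W), 1.0 / len(W))
    for _ in range(iters):                    # damped power iteration
        r = (1 - d) / len(W) + d * (r @ P)
    return np.argsort(r)[::-1]                # keyframe indices, best first

S = np.array([[0.0, 0.9, 0.2, 0.1],
              [0.9, 0.0, 0.3, 0.2],
              [0.2, 0.3, 0.0, 0.8],
              [0.1, 0.2, 0.8, 0.0]])
assets = [0, 0, 1, 1]   # keyframes 0-1 from one asset, 2-3 from another
print(diversity_rank(S, assets))
```

Note that the strongest similarities here (0.9 between keyframes 0 and 1, 0.8 between 2 and 3) are intra-asset and are therefore ignored by the walk, which is exactly the filtering the abstract describes.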
Accurate content-based video copy detection with efficient feature indexing
Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub Date: 2011-04-18. DOI: 10.1145/1991996.1992015
Yusuke Uchida, M. Agrawal, S. Sakazawa
Abstract: We describe an accurate content-based copy detection system that uses both local and global visual features to ensure robustness. Our system advances state-of-the-art techniques in four key directions. (1) Multiple-codebook-based product quantization: conventional product quantization methods encode feature vectors using a single codebook, resulting in large quantization error. We propose a novel codebook generation method for an arbitrary number of codebooks. (2) Handling of temporal burstiness: for a stationary scene, once a query feature matches incorrectly, the match continues in successive frames, resulting in a high false-alarm rate. We present a temporal-burstiness-aware scoring method that reduces the impact of similar features, thereby reducing false alarms. (3) Densely sampled SIFT descriptors: conventional global features suffer from a lack of distinctiveness and invariance to non-photometric transformations. Our densely sampled global SIFT features are more discriminative and robust against logo or pattern insertions. (4) Bigram- and multiple-assignment-based indexing for global features: we extract two SIFT descriptors from each location, which makes them more distinctive. To improve recall, we propose multiple assignments on both the query and reference sides. A performance evaluation on the TRECVID 2009 dataset indicates that both the local and global approaches outperform conventional schemes. Furthermore, the integration of these two approaches achieves a three-fold reduction in the error rate compared with the best performance reported in the TRECVID 2009 workshop.
Citations: 13
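For context, the single-codebook product quantization that point (1) improves on can be sketched as follows: the feature vector is split into sub-vectors, and each sub-vector is replaced by the index of its nearest centroid in that sub-space's codebook. The codebooks below are toy values; the paper's contribution is a generation method for an arbitrary number of codebooks to cut the quantization error of this baseline.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Baseline product-quantization encoding (one codebook per sub-space):
    split x into len(codebooks) sub-vectors and store, for each sub-vector,
    the index of its nearest centroid."""
    subs = np.split(np.asarray(x, float), len(codebooks))
    return [int(np.argmin(np.linalg.norm(cb - s, axis=1)))
            for cb, s in zip(codebooks, subs)]

# Two sub-spaces of 2 dims each, 2 centroids per codebook (toy values).
codebooks = [np.array([[0.0, 0.0], [1.0, 1.0]]),
             np.array([[0.0, 1.0], [1.0, 0.0]])]
print(pq_encode([0.9, 1.1, 0.1, 0.9], codebooks))  # → [1, 0]
```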