Massive-scale multimedia semantic modeling

Proceedings of the 21st ACM International Conference on Multimedia
Pub Date: 2013-10-21
DOI: 10.1145/2502081.2502235

John R. Smith, Liangliang Cao


Abstract

Visual data is exploding! 500 billion consumer photos are taken each year worldwide, 633 million of them in NYC alone, and 120 hours of new video are uploaded to YouTube every minute. This explosion of digital multimedia data is creating a valuable open source of insights. However, the unconstrained nature of images and videos "in the wild" makes automated computer-based analysis very challenging. Furthermore, the most interesting content in multimedia files is often complex, reflecting a diversity of human behaviors, scenes, activities, and events. To address these challenges, this tutorial provides a unified overview of two emerging techniques: semantic modeling and massive-scale visual recognition. Its goals are to introduce people from different backgrounds to this exciting field and to review state-of-the-art research in the new computational era.


