2012 IEEE International Conference on Multimedia and Expo: Latest Papers, Page 2

Efficient Tag Mining via Mixture Modeling for Real-Time Search-Based Image Annotation

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.104

Lican Dai, Xin-Jing Wang, Lei Zhang, Nenghai Yu

Abstract: Although it has been studied extensively for many years, automatic image annotation remains a challenging problem. Recently, data-driven approaches have proved highly successful for image auto-annotation. Such approaches leverage abundant, partially annotated web images to annotate an uncaptioned image. Specifically, they first retrieve a group of visually similar images using the uncaptioned image as a query, then extract meaningful phrases from the texts surrounding the image search results. Since the surrounding texts are generally noisy, mining meaningful phrases effectively is crucial to the success of such approaches. We propose a mixture modeling approach which assumes that a tag is generated from a convex combination of topics. Unlike a typical topic modeling approach such as LDA, topics in our approach are explicitly learnt from a definitive catalog of the Web, i.e. the Open Directory Project (ODP). Compared with previous work, it has two advantages: first, it uses an open vocabulary rather than a limited one defined by a training set; second, it is efficient enough for real-time annotation. Experimental results on two billion web images show the efficiency and effectiveness of the proposed approach.

Citations: 2

Evaluating Gaussian Like Image Representations over Local Features

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.23

Yu-Chuan Su, Guan-Long Wu, Tzu-Hsuan Chiu, Winston H. Hsu, Kuo-Wei Chang

Abstract: Recently, several Gaussian-like image representations have been proposed as alternatives to the bag-of-words representation over local features. These representations aim to overcome the quantization error inherent in the bag-of-words representation. They have been shown to be effective in different applications: Extended Hierarchical Gaussianization reached excellent performance using a single feature on VOC2009, and Vector of Locally Aggregated Descriptors and Fisher Kernel reached excellent performance using only a signature-like representation on the Holiday dataset. Despite their success and similarity, no comparative study of these representations has been made. In this paper, we perform a systematic comparison of three emerging Gaussian-like representations: Extended Hierarchical Gaussianization, Fisher Kernel, and Vector of Locally Aggregated Descriptors. We evaluate their performance, and the influence of features and parameters, on the Holiday and CC_Web_Video datasets, and observe several important properties of these representations. This study provides a better understanding of these Gaussian-like image representations, which are believed to be promising in various applications.
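
To make the contrast with bag-of-words concrete, here is a minimal Python sketch of one of the three representations named above, the Vector of Locally Aggregated Descriptors (VLAD), in its basic textbook form. The two-dimensional descriptors and the two centroids are toy values chosen for illustration, and the sketch does not reproduce the specific variants or parameters evaluated in the paper.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bag_of_words(descriptors, centroids):
    """Count how many descriptors fall nearest to each centroid."""
    counts = [0] * len(centroids)
    for d in descriptors:
        nearest = min(range(len(centroids)), key=lambda k: squared_distance(d, centroids[k]))
        counts[nearest] += 1
    return counts

def vlad(descriptors, centroids):
    """Sum the residuals (descriptor minus nearest centroid) per centroid,
    concatenate the sums, and L2-normalize the result."""
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        nearest = min(range(len(centroids)), key=lambda k: squared_distance(d, centroids[k]))
        for i in range(dim):
            residual_sums[nearest][i] += d[i] - centroids[nearest][i]
    flat = [value for row in residual_sums for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat

# Toy codebook and local descriptors (illustrative values only).
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]

toy_bow = bag_of_words(toy_descriptors, toy_centroids)
toy_vlad = vlad(toy_descriptors, toy_centroids)
```

The bag-of-words vector keeps only the counts per codeword, while the VLAD vector also records where the descriptors lie relative to each centroid, which is the information lost to quantization.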

Citations: 1

Learning Semantic Motion Patterns for Dynamic Scenes by Improved Sparse Topical Coding

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.133

Wei Fu, Jinqiao Wang, Zechao Li, Hanqing Lu, Songde Ma

Abstract: With the proliferation of cameras in public areas, it is increasingly desirable to develop fully automated surveillance and monitoring systems. In this paper, we propose a novel unsupervised approach to automatically explore motion patterns occurring in dynamic scenes under an improved sparse topical coding (STC) framework. Given an input video from a fixed camera, we first segment the whole video into a sequence of non-overlapping clips (documents). Optical flow features are extracted from each pair of consecutive frames and quantized into discrete visual words. The video is then represented by a word-document hierarchical topic model through a generative process. Finally, an improved sparse topical coding approach is proposed for model learning. The semantic motion patterns (latent topics) are learned automatically, and each video clip is represented as a weighted sum of these patterns with only a few nonzero coefficients. The proposed approach is purely data-driven and scene-independent (not specific to an object class), which makes it suitable for a very wide range of scenarios. Experiments demonstrate that our approach outperforms state-of-the-art techniques in dynamic scene analysis.
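
The step of quantizing optical flow into discrete visual words, with each clip treated as a document, could look roughly like the Python sketch below. The abstract does not give the quantization scheme, so the grid cell size, the four direction bins, the magnitude threshold, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

def flow_to_word(x, y, dx, dy, frame_width, cell=10, n_dirs=4, min_mag=1.0):
    """Map one flow vector at pixel (x, y) to a word id, or None if too small.

    Assumed scheme: word id = spatial cell index * n_dirs + direction bin.
    """
    if math.hypot(dx, dy) < min_mag:
        return None  # treat near-static pixels as background
    cells_per_row = math.ceil(frame_width / cell)
    cell_index = (y // cell) * cells_per_row + (x // cell)
    bin_width = 2 * math.pi / n_dirs
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Shift by half a bin so that bins are centred on the coordinate axes.
    shifted = (angle + bin_width / 2) % (2 * math.pi)
    direction = int(shifted // bin_width) % n_dirs
    return cell_index * n_dirs + direction

def clip_histogram(flows, frame_width):
    """Turn the flow vectors of one clip into a bag of visual words."""
    words = (flow_to_word(x, y, dx, dy, frame_width) for x, y, dx, dy in flows)
    return Counter(word for word in words if word is not None)

# Toy clip on a 40-pixel-wide frame: (x, y, dx, dy) per moving pixel.
toy_flows = [
    (25, 13, 3.0, 0.0),   # motion along +x
    (26, 14, 2.5, 0.1),   # nearly the same cell and direction
    (25, 13, 0.0, -2.0),  # motion along -y
    (5, 5, 0.1, 0.1),     # below the magnitude threshold, dropped
]
toy_histogram = clip_histogram(toy_flows, frame_width=40)
```

A topic model or sparse coding step would then operate on such per-clip histograms; that learning step is not sketched here.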

Citations: 21

Predicting Image Popularity in an Incomplete Social Media Community by a Weighted Bi-partite Graph

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.43

Xiang Niu, Lusong Li, Tao Mei, Jialie Shen, Ke Xu

Abstract: Popularity prediction is a key problem in analyzing information diffusion in networks, especially in social media communities. Recently, several custom-built prediction models have been developed for Digg and YouTube. However, because of their site-specific parameters, these models are hard to transplant to an incomplete social network site (e.g., Flickr). In addition, because of the large scale of the Flickr network, it is difficult to obtain all of the photos and the whole network. Thus, we seek a method that can be used in such an incomplete network. Inspired by a collaborative filtering method, Network-based Inference (NBI), we devise a weighted bipartite graph with undetected users and items to represent the resource allocation process in an incomplete network. Instead of relying on image analysis, we propose a modified interdisciplinary model called Incomplete Network-based Inference (INI). Using 30 months of data from Flickr, we show that the proposed INI increases prediction accuracy by over 58.1% compared with traditional NBI. We apply the proposed INI approach to a personalized advertising application and show that it is more attractive than traditional Flickr advertising.
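
For readers unfamiliar with the baseline, the resource allocation process of traditional NBI on an unweighted user-item bipartite graph can be sketched in Python as follows. This is the plain baseline only: the weighting and the handling of undetected users and items that define INI are not described in enough detail in the abstract to reproduce, and the user and photo names are hypothetical.

```python
def nbi_scores(user_items, target_user):
    """Two-step resource allocation for one target user.

    Step 1: each item the target user collected holds one unit of resource
            and splits it equally among all users who collected that item.
    Step 2: each user splits the received resource equally among all items
            that user collected.
    Returns the final resource per item; a higher value means a stronger
    recommendation.
    """
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    user_resource = {user: 0.0 for user in user_items}
    for item in user_items[target_user]:
        share = 1.0 / len(item_users[item])
        for user in item_users[item]:
            user_resource[user] += share

    item_resource = {item: 0.0 for item in item_users}
    for user, resource in user_resource.items():
        if not user_items[user]:
            continue  # a user with no items cannot pass resource on
        share = resource / len(user_items[user])
        for item in user_items[user]:
            item_resource[item] += share
    return item_resource

# Hypothetical favourites data: which user collected which photo.
toy_user_items = {
    "alice": {"p1", "p2"},
    "bob": {"p2", "p3"},
    "carol": {"p3"},
}
toy_scores = nbi_scores(toy_user_items, "alice")
# Rank only the photos the target user has not collected yet.
toy_recommendations = sorted(
    (item for item in toy_scores if item not in toy_user_items["alice"]),
    key=lambda item: toy_scores[item],
    reverse=True,
)
```

In this toy graph the two units of resource that start on alice's photos are conserved across both steps, and the share that reaches `p3` arrives through bob, who shares `p2` with alice.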

Citations: 17

Fast Near-Duplicate Video Retrieval via Motion Time Series Matching

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.111

John R. Zhang, J. Ren, Fangzhe Chang, Thomas L. Wood, J. Kender

Citations: 18

Principal Components Analysis-Based Edge-Directed Image Interpolation

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.153

Bing Yang, Zhiyong Gao, Xiaoyun Zhang

Citations: 12

Error Modeling and Estimation Fusion for Indoor Localization

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.106

Weipeng Zhuo, Bo Zhang, S. Chan, E. Chang

Abstract: There has been much interest in offering multimedia location-based services (LBS) to indoor users (e.g., sending video/audio streams according to user locations). Offering good LBS largely depends on accurate indoor localization of mobile stations (MSs). To achieve that, in this paper we first model and analyze the error characteristics of important indoor localization schemes that use Radio Frequency Identification (RFID) and Wi-Fi. Our models are simple to use, capture important system parameters and measurement noise, and quantify how they affect localization accuracy. Given that many indoor localization techniques have been deployed, an MS may simultaneously receive multiple co-existing estimates of its location. Equipped with this understanding of location errors, we then investigate how to optimally combine, or fuse, all the co-existing estimates of an MS's location. We present computationally efficient closed-form expressions to fuse the outputs of the estimators. Simulation and experimental results show that our fusion technique achieves higher location accuracy in spite of location errors in the estimators.
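
As an illustration of what a closed-form fusion of co-existing location estimates can look like, the Python sketch below applies textbook inverse-variance weighting. It assumes the estimators are unbiased, mutually independent, and have known isotropic error variances; the paper's own expressions may differ, and the RFID and Wi-Fi numbers are made up.

```python
def fuse_estimates(estimates):
    """Fuse 2-D location estimates by inverse-variance weighting.

    estimates: list of ((x, y), variance) pairs, one per estimator.
    Returns the fused (x, y) and the variance of the fused estimate.
    Assumption: errors are independent, zero-mean, and isotropic.
    """
    inverse_total = sum(1.0 / variance for _, variance in estimates)
    fused_x = sum(point[0] / variance for point, variance in estimates) / inverse_total
    fused_y = sum(point[1] / variance for point, variance in estimates) / inverse_total
    return (fused_x, fused_y), 1.0 / inverse_total

# Hypothetical readings in metres: a precise RFID fix and a noisier Wi-Fi fix.
toy_estimates = [
    ((2.0, 4.0), 1.0),  # RFID, error variance 1.0 m^2
    ((4.0, 8.0), 4.0),  # Wi-Fi, error variance 4.0 m^2
]
toy_location, toy_variance = fuse_estimates(toy_estimates)
```

Under these assumptions the fused variance is never larger than that of the best individual estimator, which is the sense in which fusing co-existing estimates can improve accuracy.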

Citations: 12

Traffic Reduction for Multiple Users in Multi-view Video Streaming

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.185

T. Fujihashi, Ziyuan Pan, Takashi Watanabe

Abstract: Multi-view video consists of multiple video sequences captured simultaneously from different angles by multiple closely spaced cameras. It enables users to freely change their viewpoints by playing different video sequences. Transmission of multi-view video requires more bandwidth than conventional multimedia. To reduce the bandwidth, UDMVT (User Dependent Multi-view Video Transmission), based on MVC (Multi-view Video Coding), has been proposed for a single user. With multiple users, UDMVT encodes the same frames into a different version for each user, which increases redundant transmission. To address this problem, this paper proposes UMSM (User dependent Multi-view video Streaming for Multi-users). UMSM has two characteristics. First, overlapped frames that are required by multiple users are transmitted only once, using multicast, to avoid unnecessary duplicate transmission. Second, the time lag between the video requests of multiple users is adjusted to coincide with the next request. Simulation results using benchmark test sequences provided by MERL show that UMSM decreases the transmission bit rate by 55.3% on average, compared with UDMVT, for 5 users watching the same multi-view video.
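
The first characteristic, sending frames shared by several users only once, can be illustrated with a simple counting sketch in Python. It counts frames rather than bits and ignores coding dependencies between frames, so it is a simplification for intuition only; the request sets and user names are hypothetical and are unrelated to the paper's simulation setup.

```python
def transmission_counts(requests):
    """Compare per-user delivery with send-once multicast delivery.

    requests: dict mapping a user to the set of (view, time) frames needed.
    Returns (frames sent when every user is served separately,
             frames sent when each distinct frame is multicast once).
    """
    per_user_total = sum(len(frames) for frames in requests.values())
    distinct_frames = set().union(*requests.values()) if requests else set()
    return per_user_total, len(distinct_frames)

# Hypothetical requests: frames are identified by (view index, time index).
toy_requests = {
    "user1": {(0, 0), (0, 1), (1, 1)},
    "user2": {(0, 0), (0, 1), (2, 1)},
    "user3": {(0, 0), (1, 1)},
}
toy_separate, toy_multicast = transmission_counts(toy_requests)
toy_saving = 1.0 - toy_multicast / toy_separate
```

In this toy case eight frame transmissions shrink to four; the actual saving in bit rate depends on frame sizes and on how well user requests overlap in time, which the second characteristic of UMSM addresses.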

Citations: 8

Energy-Aware Operation of Black Box Surveillance Cameras under Event Uncertainty and Memory Constraint

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.21

Giwon Kim, Jungsoo Kim, Jongpil Jung, C. Kyung

Citations: 5

A Unified Estimation-Theoretic Framework for Error-Resilient Scalable Video Coding

2012 IEEE International Conference on Multimedia and Expo Pub Date : 2012-07-09 DOI: 10.1109/ICME.2012.76

Jingning Han, Vinay Melkote, K. Rose

Abstract: A novel scalable video coding (SVC) scheme is proposed for video transmission over lossy networks. It builds on an estimation-theoretic (ET) framework for optimal prediction and error concealment, given all available information from both the current base layer and prior enhancement layer frames. It incorporates a recursive end-to-end distortion estimation technique, namely the spectral coefficient-wise optimal recursive estimate (SCORE), which accounts for all ET operations and tracks the first and second moments of decoder-reconstructed transform coefficients. The overall framework enables optimization of ET-SVC systems for transmission over lossy networks, while accounting for all relevant conditions including the effects of quantization, channel loss, concealment, and error propagation. It thus resolves longstanding difficulties in combining truly optimal prediction and concealment with optimal end-to-end distortion and error-resilient SVC coding decisions. Experiments demonstrate that the proposed scheme offers substantial performance gains over existing error-resilient SVC systems under a wide range of packet loss and bit rates.

Citations: 1