{"title":"2D Face Alignment and Pose Estimation Based on 3D Facial Models","authors":"Shen-Chi Chen, Chia-Hsiang Wu, Shih-Yao Lin, Y. Hung","doi":"10.1109/ICME.2012.60","DOIUrl":"https://doi.org/10.1109/ICME.2012.60","url":null,"abstract":"Face alignment and head pose estimation has become a thriving research field with various applications for the past decade. Several approaches process on 2D texture image but most of them perform decently only with small pose variation. Recently, many approaches apply depth information to align objects. However, applications are restricted because depth cameras are more expensive than common cameras, and many original image resources contain no depth information. Therefore, we propose a 3D face alignment algorithm in 2D image based on Active Shape Model, and use Speeded-Up Robust Features (SURF) descriptors as local texture model. We train a 3D shape model with different view-based local texture models from a 3D database, and then fit a face in a 2D image by these models. We also improve the performance by two-stage search strategy. Furthermore, the head pose can be estimated by the alignment result of the proposed 3D model. Finally, we demonstrate some applications applied by our method.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134129215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Bit-allocation for Wavelet-based Scalable Video Coding","authors":"Guan-Ju Peng, W. Hwang, Sao-Jie Chen","doi":"10.1109/ICME.2012.146","DOIUrl":"https://doi.org/10.1109/ICME.2012.146","url":null,"abstract":"We investigate the wavelet-based scalable video coding problem and present a solution that takes account of each user's preferred resolution. Based on the preference, we formulate the bit allocation problem of wavelet-based scalable video coding. We propose three methods to solve the problem. The first is an efficient Lagrangian-based method that solves the upper bound of the problem optimally, and the second is a less efficient dynamic programming method that solves the problem optimally. Both methods require knowledge of the user's preference. For the case where the user's preference is unknown, we solve the problem by a min-max approach. Our objective is to find the bit allocation solution that maximizes the worst possible performance. We show that the worst performance occurs when all users subscribe to the same spatial, temporal, and quality resolutions. Thus, the min-max solution is exactly the same as the traditional bit allocation method for a non-scalable wavelet codec. We conduct several experiments on the 2D+t MCTF-EZBC wavelet codec with respect to various subscribers' preferences. The results demonstrate that knowing the users' preferences improves the coding performance of the scalable video codec significantly.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"224 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134289394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ontological Inference Framework with Joint Ontology Construction and Learning for Image Understanding","authors":"S. Tsai, Hao Tang, Feng Tang, Thomas S. Huang","doi":"10.1109/ICME.2012.145","DOIUrl":"https://doi.org/10.1109/ICME.2012.145","url":null,"abstract":"Lack of human prior knowledge is one of the main reasons that semantic gap still remains when it comes to automatic multimedia understanding. In this work, we exploit the ontological structure of target concepts and propose an universal ontological inference framework for image understanding. The framework explicitly utilizes subclass and co-occurrence relation to effectively refine the coarse concept detections. Moreover, we show how to automatically construct and learn the underlying ontology required by the framework. As can be shown by experiments, the result is an effective and robust algorithm that characterizes well the structure of the target concepts and outperforms the state-of-the-art methods.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133036593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Single Image Super-Resolution via Graph Embedding","authors":"Junjun Jiang, R. Hu, Zhen Han, Kebin Huang, T. Lu","doi":"10.1109/ICME.2012.102","DOIUrl":"https://doi.org/10.1109/ICME.2012.102","url":null,"abstract":"We explore in this paper efficient algorithmic solutions to single image super-resolution (SR). We propose the GESR, namely Graph Embedding Super-Resolution, to super-resolve a high-resolution (HR) image from a single low-resolution (LR) observation. The basic idea of GESR is to learn a projection matrix mapping the LR image patch to the HR image patch space while preserving the intrinsic geometrical structure of original HR image patch manifold. While GESR resembles other manifold learning-based SR methods in persevering the local geometric structure of HR and LR image patch manifold, the innovation of GESR lies in that it preserves the intrinsic geometrical structure of original HR image patch manifold rather than LR image patch manifold, which may be contaminated because of image degeneration (e.g., blurring, down-sampling and noise). Experiments on benchmark test images show that GESR can achieve very competitive performance as Neighbor Embedding based SR (NESR) and Sparse representation based SR (SSR). Beyond subjective and objective evaluation, all experiments show that GESR is much faster than both NESR and SSR.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114482975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Gaze Prediction: Minimizing Perceptual Information Loss","authors":"Junyong You","doi":"10.1109/ICME.2012.191","DOIUrl":"https://doi.org/10.1109/ICME.2012.191","url":null,"abstract":"Automatic detection of visually interesting regions and gaze points plays an important role in many video applications. Due to limited ability of the human visual system (HVS) when processing visual stimuli at any instant, a natural function of gaze changes is to collect as much information as possible to form an accurate understanding of the visual scene. This paper proposes an automatic gaze prediction algorithm by modeling such function. An improved foveal imaging model is developed by taking visual attention and temporal visual characteristics into account. Gaze changes are predicted based on minimizing perceptual information loss due to the foveated vision mechanism. Experimental results against a video eye-tracking database demonstrate a promising performance of the proposed gaze prediction algorithm.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123457920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hierarchical Model for Human Interaction Recognition","authors":"Yu Kong, Yunde Jia","doi":"10.1109/ICME.2012.67","DOIUrl":"https://doi.org/10.1109/ICME.2012.67","url":null,"abstract":"Recognizing human interactions is a challenging task due to partially occluded body parts and motion ambiguities in interactions. We observe that the interdependencies existing at both action level and body part level greatly help disambiguate similar individual movements and facilitate human interaction recognition. In this paper, we propose a novel hierarchical model to capture such interdependencies for recognizing interactions of two persons. We model the action of each person by a large-scale global feature and several body part features. Two types of contextual information are exploited in our model to capture the implicit and complex interdependencies between interaction class, the action classes of two persons and the labels of persons' body parts. We build a challenging human interaction dataset to test our method. Results show that our model is quite effective in recognizing human interactions.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123659835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Framework for 3D Computer Animation Systems for Nonprofessional Users Using an Automatic Rigging Algorithm","authors":"N. Pantuwong, Masanori Sugimoto","doi":"10.1109/ICME.2012.70","DOIUrl":"https://doi.org/10.1109/ICME.2012.70","url":null,"abstract":"This paper presents a novel framework for developing automatic animation systems, which accept a 3D model that is created at runtime. Previous systems cannot deal with such a 3D model because it needs to be prepared by a manual process (rigging) that may not be suitable for nonprofessional users. The proposed framework solves this problem by employing an automatic rigging algorithm. Our algorithm can generate an animation skeleton for a given 3D model automatically, including the anatomical meaning of each joint. The relationship between motion data and this animation skeleton is created by identifying the corresponding joints in the motion data and the skeleton. The motion data for each joint is transferred automatically to the 3D model that has been rigged via our automatic rigging algorithm. Because all processes can be completed without any user intervention, an animation system for nonprofessional users is therefore available. We also discuss several motion editing techniques that can be used to generate new motion data without complex processing.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122151859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spiking and Blocking Events Detection and Analysis in Volleyball Videos","authors":"Chun-Chieh Hsu, Hua-Tsung Chen, Chien-Li Chou, Suh-Yin Lee","doi":"10.1109/ICME.2012.174","DOIUrl":"https://doi.org/10.1109/ICME.2012.174","url":null,"abstract":"In volleyball matches, spiking is the most effective way to gain points, while blocking is the action to prevent the opponents from getting scores by spiking. In this paper, we propose an intelligent system for automatic spiking events detection and blocking pattern classification in real volleyball videos. First, the entire videos are segmented into clips of rallies by whistle detection. Then, we find the court region based on proper camera calibration, and detect the location of the net for judging the positions of spiking and blocking. Via analyzing the changes of moving pixels along the net, we make a bounding box around the blocking location, so as to classify the blocking patterns into two main categories based on the width of bounding box. Finally, two important tactic patterns, delayed spiking and alternate position spiking, are recognized. With the information of spiking events and blocking locations, we can collect the statistical data and make tactics inference easily. To the best of our knowledge, no previous work is focused on spiking or blocking event detection. The experimental results on the videos recorded by a university volleyball team are promising and demonstrate the effectiveness of our proposed scheme.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129710285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topology Preserved Regular Superpixel","authors":"Dai Tang, H. Fu, Xiaochun Cao","doi":"10.1109/ICME.2012.184","DOIUrl":"https://doi.org/10.1109/ICME.2012.184","url":null,"abstract":"Most existing super pixel algorithms ignore the topology and regularities, which results in undesirable sizes and location relationships for subsequent processing. In this paper, we introduce a new method to compute the regular super pixels while preserving the topology. Start from regular seeds, our method relocates them to the pixel with locally maximal edge magnitudes. Then, we find the local optimal path between each relocated seed and its four neighbors using Dijkstra algorithm. Thanks to the local constraints, our method obtains homogeneous super pixels with explicit adjacency in low-texture and uniform regions and, simultaneously, maintains the edge cues in the high contrast and salient contents. Quantitative and qualitative experimental results on Berkeley Segmentation Database Benchmark demonstrate that our proposed algorithm outperforms the existing regular super pixel methods.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124873516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Text Detection in Videos to Person Identification","authors":"Johann Poignant, L. Besacier, G. Quénot, F. Thollard","doi":"10.1109/ICME.2012.119","DOIUrl":"https://doi.org/10.1109/ICME.2012.119","url":null,"abstract":"We present in this article a video OCR system that detects and recognizes overlaid texts in video as well as its application to person identification in video documents. We proceed in several steps. First, text detection and temporal tracking are performed. After adaptation of images to a standard OCR system, a final post-processing combines multiple transcriptions of the same text box. The semi-supervised adaptation of this system to a particular video type (video broadcast from a French TV) is proposed and evaluated. The system is efficient as it runs 3 times faster than real time (including the OCR step) on a desktop Linux box. Both text detection and recognition are evaluated individually and through a person recognition task where it is shown that the combination of OCR and audio (speaker) information can greatly improve the performances of a state of the art audio based person identification system.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"1988 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125491116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}