{"title":"A Language Modeling Approach to Atomic Human Action Recognition","authors":"Yu-Ming Liang, S. Shih, A. C. Shih, H. Liao, Cheng-Chung Lin","doi":"10.1109/MMSP.2007.4412874","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412874","url":null,"abstract":"Visual analysis of human behavior has generated considerable interest in the field of computer vision because it has a wide spectrum of potential applications. Atomic human action recognition is an important part of a human behavior analysis system. In this paper, we propose a language modeling framework for this task. The framework is comprised of two modules: a posture labeling module, and an atomic action learning and recognition module. A posture template selection algorithm is developed based on a modified shape context matching technique. The posture templates form a codebook that is used to convert input posture sequences into training symbol sequences or recognition symbol sequences. Finally, a variable-length Markov model technique is applied to learn and recognize the input symbol sequences of atomic actions. Experiments on real data demonstrate the efficacy of the proposed system.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121733150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distance Dependent Depth Filtering in 3D Warping for 3DTV","authors":"Ismaël Daribo, C. Tillier, B. Pesquet-Popescu","doi":"10.1109/MMSP.2007.4412880","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412880","url":null,"abstract":"Depth image-based rendering (DIBR) is the process of synthesizing new \"virtual\" views from one \"real\" view and the associated per-pixel depth information. The most important problem in this process is dealing with the newly exposed areas (holes) that appear in the virtual images. One common solution to decrease the number of holes is to pre-process the depth map before warping. In this paper, we present a new filtering technique for depth image-based rendering. To reduce or completely remove the newly exposed areas, efficient smoothing is necessary for the sharp depth changes near object boundaries. At the same time, there is no need to filter the smooth areas of the depth map. Our solution is based on a weighted Gaussian filter that takes into account the distance to the contours. In this way, the geometric distortions and the computation time are reduced compared to uniform filtering of the depth map. We present some results in the context of creating stereoscopic views for 3D TV.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127590072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial and Temporal Data Parallelization of Multi-view Video Encoding Algorithm","authors":"Yi Pang, Lifeng Sun, Songliu Guo, Shiqiang Yang","doi":"10.1109/MMSP.2007.4412911","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412911","url":null,"abstract":"Multi-view video coding technology has been proposed to resolve the problem of huge data storage and transmission for free-view and 3D interactive video. Supporting real-time multi-view video encoding, which has high computational complexity and sharply increasing multi-view video data, is essential. In this paper, we propose a spatial and temporal data parallelization solution for a multi-view video encoding algorithm on the IBM Cell multiprocessor system, using a selection of optimal theories and methods. Our task distribution scheme is eight times faster than the serial algorithm, a notable speedup.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128149853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Audio-Visual Saliency Model for Movie Summarization","authors":"Konstantinos Rapantzikos, Georgios Evangelopoulos, P. Maragos, Yannis Avrithis","doi":"10.1109/MMSP.2007.4412882","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412882","url":null,"abstract":"A saliency-based method for generating video summaries is presented, which exploits coupled audiovisual information from both media streams. Efficient and advanced speech and image processing algorithms to detect key frames that are acoustically and visually salient are used. Promising results are shown from experiments on a movie database.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114742348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MINMAX Video Summarization under Equality Principle","authors":"C. Panagiotakis, I. Grinias, G. Tziritas","doi":"10.1109/MMSP.2007.4412870","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412870","url":null,"abstract":"In this paper we present a video summarization scheme. First, shot detection is performed, and then we extract the key frames under an equality requirement on subshots. We propose a key frame selection algorithm (Iso-Content MINMAX), which is flexible with respect to the choice of content descriptors and is based on a MINMAX optimization formulation. The equality principle gives the selected key frames the useful property of being equivalent in terms of video content summarization.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"210 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123292887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward a 3D watermarking benchmark","authors":"J. Bennour, J. Dugelay","doi":"10.1109/MMSP.2007.4412893","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412893","url":null,"abstract":"In the last few years, a large number of 3D watermarking schemes have been proposed. In this paper, we describe a possible benchmark for evaluating 3D watermarking algorithms. We propose a list of objects and basic reproducible attacks against which 3D watermarking systems could be evaluated, as well as a way to compute a final score.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123429298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sensor Networks for Ambient Intelligence","authors":"E. Pauwels, A. A. Salah, R. Tavenard","doi":"10.1109/MMSP.2007.4412806","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412806","url":null,"abstract":"Due to rapid advances in networking and sensing technology we are witnessing a growing interest in sensor networks, in which a variety of sensors are connected to each other and to computational devices capable of multimodal signal processing and data analysis. Such networks are seen to play an increasingly important role as key enablers in emerging pervasive computing technologies. In the first part of this paper we give an overview of recent developments in the area of multimodal sensor networks, paying special attention to ambient intelligence applications. In the second part, we discuss how the time series generated by data streams emanating from the sensors can be mined for temporal patterns, indicating cross-sensor signal correlations.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124832987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-Class Audio Classification Method With Respect To Violent Content In Movies Using Bayesian Networks","authors":"Theodoros Giannakopoulos, A. Pikrakis, S. Theodoridis","doi":"10.1109/MMSP.2007.4412825","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412825","url":null,"abstract":"In this work, we present a multi-class classification algorithm for audio segments recorded from movies, focusing on the detection of violent content in order to protect sensitive social groups (e.g. children). To this end, we have used twelve audio features stemming from the nature of the signals under study. To classify the audio segments into six classes (three of them violent), Bayesian networks have been used in combination with the one-versus-all classification architecture. The overall system has been trained and tested on a large data set (5000 audio segments) recorded from more than 30 movies of several genres. Experiments showed that the proposed method can be used as an accurate multi-class classification scheme and also as a binary classifier for the problem of violent versus non-violent audio content.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122092128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-view video coding approach using Layered Depth Image","authors":"Xiaoyu Cheng, Lifeng Sun, Shiqiang Yang","doi":"10.1109/MMSP.2007.4412838","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412838","url":null,"abstract":"Multi-view video introduces more forms of interactivity and can potentially be used for a variety of applications, such as free-view video (FVV) and three-dimensional video (3DTV). However, the huge volume of video data and the synthesis of virtual views limit the practicality of multi-view video. In this paper, we propose an approach to multi-view video coding in which the multi-view video sequences are converted to layered depth images (LDIs) to represent scenes and are compressed layer by layer. To achieve better compression efficiency, we restructure the layered images according to the depth data in the LDIs. Experimental results show that our approach is practical and efficient, meeting the requirements of compression efficiency and flexible interactivity.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122656557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust perceptual hashing as classification problem: decision-theoretic and practical considerations","authors":"S. Voloshynovskiy, O. Koval, F. Beekhof, T. Pun","doi":"10.1109/MMSP.2007.4412887","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412887","url":null,"abstract":"In this paper we consider the problem of robust perceptual hashing as composite hypothesis testing. First, we formulate this problem as multiple hypothesis testing under prior ambiguity about source statistics and channel parameters representing a family of restricted geometric attacks. We introduce an efficient universal test that achieves the performance of informed decision rules for the specified class of source and geometric channel models. Finally, we consider the practical hash construction, which compromises computational complexity, robustness to geometrical transformations, lack of priors about source statistics and security requirements. The proposed hash is based on a binary hypothesis testing for randomly or semantically selected blocks or regions in sequences or images. We present the results of experimental validation of the developed concept that justifies the practical efficiency of the elaborated framework.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122823378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}