{"title":"Syntactic matching of pedestrian trajectories for behavioral analysis","authors":"Nicola Piotto, N. Conci, F. D. Natale","doi":"10.1109/MMSP.2008.4665197","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665197","url":null,"abstract":"In the present work we propose a new approach to dynamically characterize trajectories for a syntactic spatio-temporal alignment that can be applied in the context of behavioral analysis and anomalous activity detection. The developed architecture is based on a symbolic representation of the trajectory, exploiting the framework of the so-called edit-distance. The acquired trajectory samples are filtered to identify the most significant spatio-temporal discontinuities: these key points are converted into a string-based domain where the matching of trajectory pairs can be expressed in terms of global alignment between symbols, similarly to DNA string matching algorithms. The extraction, characterization and alignment of trajectories have been tested in different environments, demonstrating the reliability of the achieved results and the viability of the solution for video surveillance and domotics applications.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"320 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132334999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion compensated prediction in transform domain Distributed Video Coding","authors":"S. Borchert, R. Westerlaken, R. K. Gunnewiek, R. Lagendijk","doi":"10.1109/MMSP.2008.4665099","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665099","url":null,"abstract":"The ongoing research in distributed video coding (DVC) for low complexity encoding is trying to shorten the substantial performance gap to well known state-of-the-art coders. One way of reducing this gap is to improve the quality of the motion compensated prediction. In this paper we investigate which motion estimation method to apply in DVC, comparing possible methods to produce a motion compensated prediction. We use interpolation as well as extrapolation methods. Our results show that even with a very simple DCT scheme for the Wyner Ziv frames, extrapolation can outperform the widely used interpolation.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130875050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital camera identification based on canonical correlation analysis","authors":"Chi Zhang, Hongbin Zhang","doi":"10.1109/MMSP.2008.4665178","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665178","url":null,"abstract":"In this paper, we introduce a new method for digital camera identification from its color images using image sensor noise. We first compute the two noise reference patterns by averaging the noise component from two groups of color images taken with a camera. Then we use canonical correlation analysis (CCA) to calculate the projection directions of the two noise reference patterns. Finally, we calculate the correlation coefficient between the projection of the noise from a specific color image onto one projection direction and the projection of one of noise reference patterns onto another projection direction, then use this coefficient to decide whether the specific color image was taken by the camera or not. Experimental results show that the presented method provides higher accuracy than other methods on the condition of using a few images to compute reference pattern.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126888443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coding structure optimization for interactive multiview streaming in virtual world observation","authors":"Gene Cheung, Antonio Ortega, Takashi Sakamoto","doi":"10.1109/MMSP.2008.4665121","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665121","url":null,"abstract":"While most multiview coding techniques focus on compressing all frames in a multiview video sequence in a rate-distortion optimal manner, in this paper we address the problem of interactive multiview streaming, where we minimize the expected transmission rate of an interactive multiview video stream, where the observer can select the view of the next frame, subject to a storage constraint. We show that gains can be achieved by optimizing the trade-off between overall storage and transmission rate, i.e., by storing a more redundant multiview representation (where some frames are encoded more than once, each time using a different reference frame) it is possible to reduce the overall bandwidth needed for online interactive viewing. We show that our proposed redundant representation can reduce the transmission cost of interactive multiview streaming by up to 65% as compared to a good non-redundant representation for the same storage constraint.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122311132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial color adaptive technique based on the theory of emotion-color association and analysis of animation","authors":"Kyu-ho Park, Taeyong Kim","doi":"10.1109/MMSP.2008.4665194","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665194","url":null,"abstract":"Graphical expressions and artificial intelligence-driven (AI) game characters have been continuously improving, spurred by the astonishing growth of the game technology industry. Despite such improvements, users are still demanding a more natural gaming environment and reflections of true human emotions. However, the emotions that can currently be expressed are strictly limited because the facial colors and expressions of present game characters are hardly noticeable. Such restrictions can prevent the users from getting fully absorbed in the game. To address this, we developed the facial color change technique, which is a combination emotional model based on human cultural theory, emotional expression pattern using colors, and emotional reaction speed function, as opposed to past methods that expressed emotion through blood flow, pulse, or skin temperature. The reflection of the game characterpsilas emotion on itpsilas skin color will increase user immersion into the game and enrich the playerpsilas experience.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115971794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error resilient transcoding of Scalable Video bitstreams","authors":"Yi Guo, Houqiang Li, Ye-Kui Wang, C. Chen","doi":"10.1109/MMSP.2008.4665094","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665094","url":null,"abstract":"We propose in this paper a novel error resilient transcoding scheme that can be placed at the boundary between wired and wireless networks via heterogeneous network links. This error resilient transcoder shall seamlessly complement the standard scalable video coding (SVC) bitstream to offer additional error resilient adaptation capability for receiving devices. The novel error resilient transcoding scheme consists of three different modules; each is designed to meet various levels of complexity need. The three modules are all based on the loss-aware rate-distortion optimization (LA-RDO) mode decision algorithm we have previously developed for SVC. However, each individual module can be tailored to different complexity requirements depending on whether and how the LA-RDO mode decision is implemented. Another innovation of this approach is the design of a fast rate control algorithm in order to maintain consistent bitrates between input and output of the transcoder. This rate control algorithm only needs picture-level bit information for training target quantization parameters. Simulation results demonstrate that, comparing with standard SVC, the proposed approach is able to achieve up to 4 dB gain for the enhancement layer video and up to 1 dB gain for the base layer video.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116639618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Music emotion annotation by machine learning","authors":"W. Cheung, Guojun Lu","doi":"10.1109/MMSP.2008.4665144","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665144","url":null,"abstract":"Music emotion annotation is a task of attaching emotional terms to musical works. As volume of online musical contents expands rapidly in recent years, demands for retrieval by emotion are emerging. Currently, literature on music retrieval using emotional terms is rare. Emotion annotated data are scarce in existing music databases because annotation is still a manual task. Automating music emotion annotation is an essential prerequisite to research in music retrieval by emotion, for without which even sophisticated retrieval methods may not be very useful in a data deficient environment. This paper describes a machine learning approach to annotate music using a large number of emotional terms. We also estimate the training data size requirements for a workable annotation system. Our empirical result shows that 1) the task of music emotion annotation could be modelled using machine learning techniques to support a large number of emotional terms, 2) the combination of sampling method and data-driven detection threshold is highly effective in optimizing the use of existing annotated data in training machine learning models, 3) synonymous relationships enhance the annotation performance and 4) the training data size requirement is within reach for a workable annotation system. Essentially, automatic music emotion annotation enables music retrieval by emotion to be performed as a text retrieval task.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116771937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing high dynamic range images with partial re-exposures","authors":"B. Guthier, S. Kopf, W. Effelsberg","doi":"10.1109/MMSP.2008.4665082","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665082","url":null,"abstract":"In this paper we present an optimized approach to capture high dynamic range (HDR) images. It is based on existing methods of creating HDR images by fusing a set of differently exposed low dynamic range (LDR) images. We optimize the capturing process of LDR images towards improved capture speed by using partial re-exposures. That is, we make use of the idea that it is not always necessary to capture full size images when only small portions of the scene require HDR. By analyzing captured images for badly exposed regions and re-exposing selectively, we save overall capture time and increase the frame rate when image sequences are recorded.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131276429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining temporal information and web-casting text for automatic sports event detection","authors":"Minh-Son Dao, N. Babaguchi","doi":"10.1109/MMSP.2008.4665150","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665150","url":null,"abstract":"In this paper, the generic framework for automatically detecting event based on Allen temporal algebra and external text information support is presented. The motivation of the proposed method is (1) to relax the need of domain knowledge that requires significant human interference; and (2) to take into account the temporal information that has been paid less attention though it is critical to convey event meaning. In order to solve two these problems, in the proposed method, the temporal information is captured by presenting events as the temporal sequences using a lexicon of Allen-based non-ambiguous temporal patterns. These sequences are then used to mine temporal patterns with web-casting text supports by using technique of mining class association rules. Then, the results of previous steps are tailored to build the event detector. Thorough experimental results and comparisons that are carried on more than 30 hours of soccer video corpus captured at different broadcasters and conditions demonstrates that the proposed method meets two aforementioned motivations with high efficiency, effectiveness, and robustness.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"3 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131436955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast video object segmentation using Markov random field","authors":"C. Mak, W. Cham","doi":"10.1109/MMSP.2008.4665101","DOIUrl":"https://doi.org/10.1109/MMSP.2008.4665101","url":null,"abstract":"A fast video object segmentation algorithm is proposed in this paper. The algorithm utilizes the motion vectors from blocks with variable block sizes to identify background motion model and moving objects. Markov random field is used to model the foreground field to enhance spatial and temporal continuity of objects. To speed up the segmentation time, time-consuming spatial segmentation techniques are avoided. Instead, spatial information in the form of Walsh Hadamard transform coefficients is utilized to improve segmentation accuracy. Experimental results show that the proposed algorithm can effectively extract moving objects from different kind of video sequences. The computation time of the segmentation process is merely about 75 ms per CIF frame using a normal PC, allowing the algorithm to be applied in real-time applications such as video surveillance and conferencing.","PeriodicalId":402287,"journal":{"name":"2008 IEEE 10th Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133909142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}