{"title":"Steganalysis for LSB Matching in Images with High-frequency Noise","authors":"Jun Zhang, I. Cox, G. Doërr","doi":"10.1109/MMSP.2007.4412897","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412897","url":null,"abstract":"Considerable progress has been made in the detection of steganographic algorithms based on replacement of the least significant bit (LSB) plane. However, if LSB matching, also known as ±1 embedding, is used, the detection rates are considerably reduced. In particular, since LSB matching is modeled as an additive noise process, detection is especially poor for images that exhibit high-frequency noise, which is often incorrectly taken as evidence of a hidden message. To overcome this, we propose a targeted steganalysis algorithm that exploits the fact that after LSB matching, the local maxima of an image's gray-level or color histogram decrease and the local minima increase. Consequently, the sum of the absolute differences between local extrema and their neighbors in the intensity histogram of stego images will be smaller than for cover images. Experimental results on two datasets, each of 2000 images, demonstrate that this method outperforms other recently proposed algorithms when the images contain high-frequency noise, e.g. never-compressed imagery such as high-resolution scans of photographs and video. However, the method is inferior to the prior art when applied to decompressed imagery with little or no high-frequency noise.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128163380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
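The detector described in this abstract reduces to a single scalar feature on the histogram. A minimal sketch of the stated idea (my own reconstruction from the abstract, not the authors' code; the toy histograms are invented):

```python
def extrema_feature(hist):
    """Sum of absolute differences between each local extremum of an
    intensity histogram and its immediate neighbors. Per the abstract,
    LSB matching pulls local maxima down and local minima up, so this
    sum tends to be smaller for stego images than for cover images."""
    total = 0
    for i in range(1, len(hist) - 1):
        left, mid, right = hist[i - 1], hist[i], hist[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            total += abs(mid - left) + abs(mid - right)
    return total

# Toy 8-bin histograms: embedding smooths the extrema toward their neighbors.
cover = [10, 40, 10, 50, 10, 60, 10, 30]
stego = [15, 35, 15, 45, 15, 55, 15, 28]
print(extrema_feature(cover) > extrema_feature(stego))  # -> True
```

In practice, a threshold on this feature (or a comparison against a calibrated version of the image) would drive the cover/stego decision.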
{"title":"A Film Classifier Based on Low-level Visual Features","authors":"Hui-Yu Huang, W. Shih, W. Hsu","doi":"10.4304/jmm.3.3.26-33","DOIUrl":"https://doi.org/10.4304/jmm.3.3.26-33","url":null,"abstract":"In this paper, we propose an approach to classifying films into genres using low-level visual features. Our current domain of study is the movie preview: a preview emphasizes the theme of a film and hence provides suitable information for the classification process. In our approach, we classify films into three broad categories: action, drama, and thriller. Four computable video features (average shot length, color variance, motion content, and lighting key) and visual effects are combined in our approach to characterize the movie genre. Our approach can also be extended to other potential applications, including the browsing and retrieval of videos on the Internet, video-on-demand, and video libraries.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125450598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
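Of the four computable features named in this abstract, average shot length is the simplest. A tiny sketch, assuming shot boundaries have already been detected (the numbers are hypothetical, not from the paper):

```python
def average_shot_length(boundaries, total_frames):
    """Mean shot length in frames, given the frame indices at which new
    shots start (the first shot starts at frame 0). Action films tend
    toward short shots; dramas toward long ones."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [total_frames]
    return sum(e - s for s, e in zip(starts, ends)) / len(starts)

print(average_shot_length([120, 180, 300], 600))  # -> 150.0
```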
{"title":"Suppression of Boundary Effect and Introduction of Scale Correlation for Wavelet based Traffic Prediction","authors":"Naoya Matsusue, H. Hasegawa, Ken-ichi Sato","doi":"10.1109/MMSP.2007.4412910","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412910","url":null,"abstract":"In this paper, we propose a wavelet-based method for predicting Internet traffic volume. By introducing a maximal-overlap formulation of the Haar wavelet, the proposed method is free from the so-called boundary effect, which arises from the processing delay of the wavelet transform's analysis filters and prevents full utilization of the most recent input samples. The proposed method is based on a vector autoregressive model so as to exploit the correlation between wavelet coefficient series at different scales.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121440160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
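The boundary-effect suppression claimed here follows from the causality of the maximal-overlap Haar filters: each coefficient at time t needs only x[t] and x[t-1], never future samples. A one-level sketch (my own illustration; the paper additionally fits a vector autoregressive model across scales):

```python
def modwt_haar_level1(x):
    """One level of a maximal-overlap (undecimated) Haar analysis.
    Both output series have the same length as the input, and each
    coefficient depends only on the current and previous sample, so no
    future data is needed at the signal's trailing boundary."""
    smooth = [x[0]] + [(x[t] + x[t - 1]) / 2 for t in range(1, len(x))]
    detail = [0.0] + [(x[t] - x[t - 1]) / 2 for t in range(1, len(x))]
    return smooth, detail

s, d = modwt_haar_level1([4.0, 6.0, 2.0, 8.0])
print(s)  # -> [4.0, 5.0, 4.0, 5.0]
print(d)  # -> [0.0, 1.0, -2.0, 3.0]
```

A VAR model would then be fit jointly on the smooth and detail series (and on deeper levels) to predict the next traffic sample.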
{"title":"Smart Transcoding between CELP Speech Codecs through Voiced Oriented Pitch Mapping","authors":"C. Beaugeant","doi":"10.1109/MMSP.2007.4412841","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412841","url":null,"abstract":"The deployment of incompatible standardized speech codecs raises interoperability issues between telecommunication networks. Transcoding from one codec format to another is necessary at gateways between networks to ensure interoperability. This transcoding reduces speech quality and incurs computational load and additional delay. Recently, a fair amount of work has been conducted on alternative transcoding methods that reduce complexity and delay. In this context, this paper further elaborates on solutions based on pitch mapping between standardized CELP codecs. It presents an alternative to the already published solutions by applying a pitch mapping driven by the voicing of the analysis frames. A better compromise between complexity reduction and impact on the quality of the processed speech signal is thus achieved.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133348271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
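A minimal sketch of voicing-driven pitch mapping between two CELP codecs. All specifics below (sampling rates, the destination lag range) are invented for illustration, not taken from any particular standard, and the real method involves more than rescaling:

```python
def map_pitch_lag(lag_src, voiced, src_rate=8000, dst_rate=16000,
                  dst_range=(34, 231)):
    """Voiced frames: reuse the source pitch lag, rescaled to the
    destination sampling rate and clipped to its lag range, avoiding a
    full pitch re-search. Unvoiced frames: return None, signalling that
    the destination encoder should run its own search, since the source
    lag carries little information there."""
    if not voiced:
        return None
    lo, hi = dst_range
    return max(lo, min(hi, round(lag_src * dst_rate / src_rate)))

print(map_pitch_lag(40, True))   # -> 80
print(map_pitch_lag(40, False))  # -> None
```

Restricting the cheap mapping to voiced frames is what buys the complexity/quality compromise: it skips the expensive search exactly where the reused lag is reliable.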
{"title":"State of the Art and Future Directions in Musical Sound Synthesis","authors":"Xavier Serra","doi":"10.1109/MMSP.2007.4412805","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412805","url":null,"abstract":"Sound synthesis and processing has been the most active research topic in the field of sound and music computing for more than 40 years. Quite a number of the early research results are now standard components of many audio and music devices, and new technologies are continuously being developed and integrated into new products. Through the years there have been important changes. For example, most of the abstract algorithms that were the focus of work in the 70s and 80s are considered obsolete. The 1990s then saw the emergence of computational approaches that aimed either at capturing the characteristics of a sound source, known as physical models, or at capturing the perceptual characteristics of the sound signal, generally referred to as spectral or signal models. More recent trends include the combination of physical and spectral models and corpus-based concatenative methods. But the field faces major challenges that might revolutionize the standard paradigms and applications of sound synthesis. In this article, we will first place the sound synthesis topic within its research context, then we will highlight some of the current trends, and finally we will attempt to identify some challenges for the future.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115790782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exponential Decay of Transmission Distortion in H.264","authors":"S. Nyamweno, Ramdas Satyan, F. Labeau","doi":"10.1109/MMSP.2007.4412869","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412869","url":null,"abstract":"In this paper we investigate the impact of transmission errors in H.264. Transmission errors propagate into subsequent frames due to motion prediction and result in degraded video quality. Our simulations show that H.264 exhibits non-fading behaviour. We propose a method that introduces a fading characteristic and can eliminate the error propagation after a few frames. We provide a detailed analysis of our results based on a comparison with MPEG-4 and the residual energy per frame.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114572610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
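The fading versus non-fading contrast in this abstract can be seen in a toy leaky-prediction model (an assumed illustration, not the paper's H.264 simulations):

```python
def propagated_distortion(d0, leak, frames):
    """Distortion carried into successive motion-predicted frames when
    an initial channel error of energy d0 is inherited with gain
    `leak`: leak = 1 never fades (non-fading behaviour), while
    leak < 1 decays exponentially, d[n] = d0 * leak**n."""
    d, out = d0, []
    for _ in range(frames):
        out.append(d)
        d *= leak  # each predicted frame inherits `leak` of the error
    return out

print(propagated_distortion(100.0, 1.0, 5))  # -> [100.0, 100.0, 100.0, 100.0, 100.0]
print(propagated_distortion(100.0, 0.5, 5))  # -> [100.0, 50.0, 25.0, 12.5, 6.25]
```

The proposed method's "fading characteristic" corresponds to pushing the effective leak below 1 so that error propagation dies out after a few frames.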
{"title":"Unsupervised Discovery of Action Hierarchies in Large Collections of Activity Videos","authors":"P. Ahammad, Chuohao Yeo, K. Ramchandran, S. Sastry","doi":"10.1109/MMSP.2007.4412903","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412903","url":null,"abstract":"Given a large collection of videos containing activities, we investigate the problem of organizing it, in an unsupervised fashion, into a hierarchy based on the similarity of the actions embedded in the videos. We use spatio-temporal volumes of filtered motion vectors to compute appearance-invariant action similarity measures efficiently, and use these similarity measures in hierarchical agglomerative clustering to organize videos into a hierarchy such that neighboring nodes contain similar actions. This naturally leads to a simple automatic scheme for selecting videos of representative actions (exemplars) from the database and for efficiently indexing the whole database. We compute a performance metric on the hierarchical structure to evaluate the goodness of the estimated hierarchy, and show that this metric has potential for predicting the clustering performance of various joining criteria used in building hierarchies. Our results show that perceptually meaningful hierarchies can be constructed based on action similarities with minimal user supervision, while providing favorable clustering and retrieval performance.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126151989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
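The hierarchical agglomerative step can be illustrated generically (single linkage on a toy dissimilarity matrix; the paper's actual dissimilarities come from spatio-temporal volumes of filtered motion vectors, replaced here by made-up numbers):

```python
def agglomerate(dist, n):
    """Repeatedly merge the two closest clusters (single linkage) and
    return the merge order as (cluster_a, cluster_b, distance). The
    merge order defines the hierarchy; the paper studies other joining
    criteria as well."""
    clusters = [{i} for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] |= clusters[b]
        del clusters[b]
    return merges

# Videos 0,1 share one action and 2,3 another; the two actions differ.
D = [[0, 1, 9, 8],
     [1, 0, 9, 9],
     [9, 9, 0, 2],
     [8, 9, 2, 0]]
print(agglomerate(D, 4))  # -> [([0], [1], 1), ([2], [3], 2), ([0, 1], [2, 3], 8)]
```

Exemplar selection then amounts to picking one representative video per internal node of the resulting tree.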
{"title":"Adaptive In-Loop Prediction Refinement for Video Coding","authors":"Shay Har-Noy, Ò. Escoda, P. Yin, C. Gomila, Truong Q. Nguyen","doi":"10.1109/MMSP.2007.4412845","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412845","url":null,"abstract":"Modern video compression codecs achieve high compression efficiency by exploiting the temporal and spatial redundancies present in video sequences. Although state-of-the-art block-based intra and inter prediction have been refined to better adapt to the content in the scene, they still exhibit limitations in fully extracting information from the decoded data. We propose the addition of an in-loop prediction refinement stage that is initialized with the conventional intra or inter predicted result and is capable of further reducing the spatial redundancies present in the sequence. By extracting additional information from the decoded neighborhood, the refinement stage is able to bring the prediction closer to the original data, thus improving compression efficiency. In this work, we study one possible method of refinement for the particular case of intra prediction. It uses a sparse decomposition of blocks that overlap decoded neighboring blocks and the current predicted block in order to make the prediction more coherent with already-encoded data.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"161 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129103804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
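One concrete reading of "sparse decomposition of blocks that overlap decoded neighboring blocks and the current predicted block" is a greedy matching pursuit: fit dictionary atoms to the decoded (known) samples, then let the chosen atoms extrapolate into the block being predicted. Everything below (the dictionary, the sizes, the greedy rule) is an assumed toy, not the paper's construction:

```python
def mp_refine(known, atoms, n_block, iters=3):
    """Each atom spans len(known) + n_block samples. Atoms are fitted
    by matching pursuit on the known (decoded) part only; their tails
    over the block part accumulate into the refined prediction."""
    n_k = len(known)
    residual = list(known)
    pred = [0.0] * n_block
    for _ in range(iters):
        best = None  # (energy removed, coefficient, atom)
        for a in atoms:
            e = sum(v * v for v in a[:n_k])
            if e == 0.0:
                continue
            c = sum(r * v for r, v in zip(residual, a)) / e
            score = c * c * e  # residual energy this atom explains
            if best is None or score > best[0]:
                best = (score, c, a)
        _, c, a = best
        residual = [r - c * v for r, v in zip(residual, a)]
        pred = [p + c * v for p, v in zip(pred, a[n_k:])]
    return pred

# A flat decoded neighborhood is best explained by the DC atom, which
# then extrapolates a flat refined prediction into the block.
atoms = [[1] * 8, [0, 1, 2, 3, 4, 5, 6, 7]]  # DC atom and a ramp atom
print(mp_refine([4.0, 4.0, 4.0, 4.0], atoms, 4, iters=2))  # -> [4.0, 4.0, 4.0, 4.0]
```

Because the fit uses only causally decoded samples, the decoder can repeat the identical refinement, which is what makes the stage usable in-loop.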
{"title":"Using Adaptive Genetic Algorithms to Improve Speech Emotion Recognition","authors":"M. Sedaaghi, Constantine Kotropoulos, D. Ververidis","doi":"10.1109/MMSP.2007.4412916","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412916","url":null,"abstract":"In this paper, adaptive genetic algorithms are employed in a first stage to search for the worst-performing features with respect to the probability of correct classification achieved by the Bayes classifier. These features are subsequently excluded from sequential floating feature selection, which employs the probability of correct classification of the Bayes classifier as its criterion. In a second stage, adaptive genetic algorithms search for the worst-performing utterances with respect to the same criterion. The sequential application of both stages is demonstrated to improve speech emotion recognition on the Danish Emotional Speech database.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125461101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
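A hedged sketch of the first stage: a genetic algorithm over exclusion masks that hunts for the worst-performing features, i.e. the subset whose removal most improves classification. The fitness below is a toy surrogate, not the paper's Bayes classifier, and the "noisy" feature set is invented for illustration:

```python
import random

random.seed(0)
N_FEATURES = 8
NOISY = {2, 5}  # hypothetical features that hurt accuracy

def accuracy(excluded):
    """Toy accuracy: rises when noisy features are excluded, falls
    slightly when useful ones are excluded (peak at excluding NOISY)."""
    return 0.7 + 0.1 * len(excluded & NOISY) - 0.02 * len(excluded - NOISY)

def ga_worst_features(pop_size=20, generations=30):
    pop = [frozenset(i for i in range(N_FEATURES) if random.random() < 0.5)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=accuracy, reverse=True)
        parents = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = set(a & b)                 # keep the agreed features
            for i in a ^ b:                    # uniform crossover elsewhere
                if random.random() < 0.5:
                    child.add(i)
            if random.random() < 0.3:          # mutation: flip one feature
                child ^= {random.randrange(N_FEATURES)}
            children.append(frozenset(child))
        pop = parents + children
    return max(pop, key=accuracy)

print(sorted(ga_worst_features()))
```

In the paper's pipeline, the surviving (non-excluded) features would then feed sequential floating feature selection.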
{"title":"Real-time Emotion Detection System using Speech: Multi-modal Fusion of Different Timescale Features","authors":"Samuel Kim, P. Georgiou, Sungbok Lee, Shrikanth S. Narayanan","doi":"10.1109/MMSP.2007.4412815","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412815","url":null,"abstract":"The goal of this work is to build a real-time emotion detection system that fuses features of speech at different timescales. Conventional spectral and prosody features are used for intra-frame and supra-frame features, respectively, and a new information fusion algorithm that accounts for the characteristics of each machine learning algorithm is introduced. In this framework, the proposed system can be extended with additional features, such as lexical or discourse information, in later steps. To verify real-time system performance, binary decision tasks on angry versus neutral emotion are performed using concatenated speech signals simulating real-time conditions.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126807074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
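A hedged sketch of late fusion across timescales: per-frame (spectral) posteriors are averaged, then mixed with a single supra-frame (prosody) posterior. The fixed mixing weight and the 0.5 threshold are assumptions for illustration, not the paper's learned fusion rule:

```python
def fuse(frame_posteriors, utterance_posterior, w_frame=0.6):
    """Combine P(angry) estimates from two timescales into a binary
    angry/neutral decision: average the intra-frame posteriors, then
    mix with the supra-frame estimate using a fixed weight."""
    frame_avg = sum(frame_posteriors) / len(frame_posteriors)
    p_angry = w_frame * frame_avg + (1 - w_frame) * utterance_posterior
    return "angry" if p_angry >= 0.5 else "neutral"

print(fuse([0.8, 0.7, 0.9], 0.6))  # -> angry
print(fuse([0.2, 0.3, 0.1], 0.4))  # -> neutral
```

Because each timescale contributes a posterior rather than raw features, further streams (e.g. lexical cues) can be mixed in the same way later.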