{"title":"Recovery of lost VQ indexes in packet transmission","authors":"Zhe Wang, Xiaolin Wu","doi":"10.1109/MMSP.2002.1203249","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203249","url":null,"abstract":"We consider the problem of robust transmission of VQ-coded image/video via noisy packet networks. In the event of packet loss, some VQ index bits will be absent at the receiver side. But the very knowledge of lost packets identifies the spatial locations of affected VQ blocks, which is powerful information for the decoder to estimate the missing VQ index bits. This is possible because of the statistical redundancy between spatially adjacent VQ blocks. In this paper we present a MAP (maximum a posteriori) estimation technique to recover missing VQ index bits due to packet loss. The novelty of this work is to couple high-order Markov modeling with MAP while avoiding the problem of context dilution.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133810121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"R-D analysis of adaptive edge representations","authors":"R. M. F. I. Ventura, L. Granai, P. Vandergheynst","doi":"10.1109/MMSP.2002.1203265","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203265","url":null,"abstract":"This paper presents a rate-distortion analysis for a simple horizon edge image model. A quadtree with anisotropy and rotation is performed on this kind of image, giving a toy model for a non-linear adaptive coding technique, and its rate-distortion behavior is studied. The effect of refining the quadtree decomposition is also analyzed.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127896146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Watermarking of compressed multimedia using error-resilient VLCs","authors":"B. Mobasseri, Domenick Cinalli","doi":"10.1109/MMSP.2002.1203310","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203310","url":null,"abstract":"Error-resilient variable length codes (VLCs) have been proposed to counter bit errors over error-prone channels. In this work, we establish a linkage between channel coding and watermarking by observing that watermark bits are, in effect, intentional bit errors. Using a recently introduced resynchronizing VLC, we have developed a compressed-domain watermarking algorithm where the inherent error-resilient property of the code is exploited to implement lossless, oblivious watermarking. The algorithm is implemented on MPEG-2 video.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115619001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audio fingerprinting based on analyzing time-frequency localization of signals","authors":"Chun-Shien Lu","doi":"10.1109/MMSP.2002.1203275","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203275","url":null,"abstract":"Due to its desired non-invasive property, fingerprinting is considered an alternative for achieving many applications previously accomplished with watermarking. Some techniques for audio identification or retrieval have been proposed in the literature. However, few of them analyze the time-frequency variations based on a transformation with efficient time-scale localization. In this paper, we investigate the characterization and recognition of audio based on time-frequency analysis of signals. A one-dimensional continuous wavelet transform is adopted to capture the time-frequency variations of audio. Based on the multiresolution structure of an audio signal, two fingerprints are created, for authentication and recognition purposes, respectively. Experimental results have demonstrated the performance of the proposed method.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115772228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stream-weighted HMM for audio-visual ASR: a study on connected digit recognition","authors":"M. T. Chan","doi":"10.1109/MMSP.2002.1203233","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203233","url":null,"abstract":"We present some new results on connected digit recognition in noisy environments by audio-visual speech recognition. We derive hybrid (geometric- and appearance-based) visual lip features using a real-time lip-tracking algorithm that we proposed previously. Using a single-speaker corpus modeled after the TIDIGITS database, we build whole-word HMMs using both single-stream and 2-stream modeling strategies. For the 2-stream HMM method, we use stream dependent weights to adjust the relative contributions of the two feature streams based on the acoustic SNR level. The 2-stream HMM consistently gave the lowest WER, with an error reduction of 83% at -3 dB SNR level compared to the acoustic-only baseline. Visual-only ASR WER at 6.85% was also achieved, showing the effectiveness of the visual features. A real-time system prototype was developed for concept demonstration.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115779074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A minimum distortion data hiding technique for compressed images","authors":"Ç. Candan, N. Jayant","doi":"10.1109/MMSP.2002.1203318","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203318","url":null,"abstract":"We present a blind data hiding method for JPEG compressed images, which minimizes the perceptual distortion due to data embedding. The proposed system presents a number of options to the encoder to cast the given hidden bits in the compressed content signal. The perceptual distortion cost of each option is calculated from the parameters available to the encoder, such as the original image, the quantization error due to compression, and the just noticeable distortion (JND) levels of the original image derived through an empirical human visual system model. The encoder selects the option with the minimum JND cost to cast the hidden bits. By the definition of blind decoding, the decoder should be able to extract the hidden bits without any side information on the option selected or the parameters available to the encoder. The decoder of the proposed system uses simple binary addition on the received transform coefficients to extract the hidden bits blindly. System performance is examined by computer experiments at different compression levels and at different embedding bitrates.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114595017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-based video retrieval based on object motion trajectory","authors":"W. Lie, W. Hsiao","doi":"10.1109/MMSP.2002.1203290","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203290","url":null,"abstract":"This paper proposes a content-based video retrieval system based on object motion trajectory. An algorithm for tracking moving objects in the MPEG-compressed domain is developed. It first links individual macroblocks in the temporal domain and then prunes and merges the formed paths by considering the spatial adjacency of MBs. In this way, the difficult spatial segmentation problem of traditional methods is avoided and tracking of multiple deformed objects can be achieved. Also, our system is capable of eliminating global motion so that camera motion is allowed. The extracted object motion trajectory is then converted into a form conformable to the MPEG-7 motion descriptor (keypoints + interpolating functions). Both query-by-example and query-by-sketch interfaces are provided, and problems in descriptor matching (e.g., mismatch in keypoint interval and video time duration) are solved to achieve robustness and a high recall rate. We have tested our algorithm with real video clips, including fixed- or moving-camera, rigid or deformed, single or multiple objects, varying object size during motion, etc. Experiments show that the tracking and retrieval results are satisfactory and suitable for further applications.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122111730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video indexing and retrieval based on recognized text","authors":"Huiping Li, D. Doermann","doi":"10.1109/MMSP.2002.1203292","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203292","url":null,"abstract":"In this paper we present our experiments on text-based video indexing and retrieval. Due to expected OCR errors and the lack of semantic breadth in video text, we proposed two solutions: 1) expanding the semantics of the query word, and 2) using Glimpse to perform approximate matching instead of exact matching. The results we achieved showed that semantic expansion and Glimpse can play important roles in video retrieval based on text.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123058329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation, classification and watermarking for image/video semantic authentication","authors":"Ching-Yung Lin, Belle L. Tseng","doi":"10.1109/MMSP.2002.1203320","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203320","url":null,"abstract":"We propose a novel technique for image/video authentication at the semantic level. This method uses statistical learning, visual object segmentation and classification schemes for semantic understanding of visual content. The system embeds either the classification output or the user-annotated model labels into multimedia data as watermarks. A public watermarking method robust to rotation, scaling, and translation is used for embedding. The authentication process is executed by comparing the classification result with the information carried by the watermark. This method enables the authentication system to learn the semantic content of multimedia data and to perform the authentication task at the semantic level.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126595127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GLS-based TV-CAR speech analysis using forward and backward linear prediction","authors":"K. Funaki","doi":"10.1109/MMSP.2002.1203283","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203283","url":null,"abstract":"We have already proposed novel robust parameter estimation algorithms of a time-varying complex AR (TV-CAR) model for analytic speech signals, which are based on GLS (generalized least squares) and ELS (extended least squares), and have shown that the methods can achieve robust speech spectrum estimation against additive white Gaussian noise. In these methods, only the forward prediction error is used to calculate the MSE criterion. This paper proposes improved TV-CAR speech analysis methods based on forward and backward linear prediction, in which the backward prediction error is also adopted to calculate the MSE criterion, viz., the MMSE and GLS-based algorithms using forward and backward prediction. The experiments with natural speech and natural speech corrupted by white Gaussian noise demonstrate that the improved methods can achieve more accurate and more stable spectral estimation.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116743630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}