{"title":"Spatial and Temporal Adaptation of Interpolation Filter For Low Complexity Encoding/Decoding","authors":"D. Rusanovskyy, M. Gabbouj, K. Ugur","doi":"10.1109/MMSP.2007.4412843","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412843","url":null,"abstract":"Compared to video coding with non-adaptive interpolation filtering, adaptive filters achieve higher compression ratios, at the cost of increased encoding and decoding complexity. In our earlier work, we significantly reduced the decoding complexity of adaptive filtering schemes with minimal impact on coding efficiency by using different filters and adapting them spatially and temporally. However, our previous scheme required high encoder complexity, as several encoding passes per frame were needed to analyze the input image and optimize the selection of interpolation filters. In this paper, we propose a novel algorithm that does not require multiple encoding passes but still gives similar or better performance. This is achieved by using a modified decision-making function that does not require full reconstruction of the coded frame and uses motion and prediction information more efficiently. In addition, we generalized our previous scheme by introducing additional filters, so that better Rate-Distortion-Complexity tradeoffs are possible. Experimental results show that a reduction of up to 50-70% in interpolation complexity is achieved, with a coding efficiency penalty of less than 0.13 dB.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125437421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perceptual Enhancement for Fully Scalable Audio","authors":"Te Li, S. Rahardja, S. Koh","doi":"10.1109/MMSP.2007.4412811","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412811","url":null,"abstract":"MPEG-4 scalable lossless (SLS) coding is the most recently released ISO international standard for scalable audio coding. Besides its function as an extension of the MPEG-4 advanced audio coding (AAC) perceptual audio coder, SLS has a \"non-core mode\" that offers full scalability. The perceptual audio coder is absent in this mode, and scalability is achieved through pure bit-plane coding. In this paper, a perceptually enhanced bit-plane coding method, namely Quad-level bit-plane coding (QBPC), is proposed to enhance the perceptual quality of fully scalable audio at intermediate bitrates. With the QBPC structure, the perceptual quality of fully scalable audio coded by SLS is significantly improved over a wide range of intermediate bitrates. This is achieved with trivial added overhead and complexity.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122646371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Additional Noise on Subjective and Objective Quality Assessement in VoIP","authors":"Zdenek Becvar, L. Novák, J. Zelenka, M. Brada, P. Slepička","doi":"10.1109/MMSP.2007.4412813","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412813","url":null,"abstract":"The main requirement in Voice over IP technology is good quality of the received voice signal during communication between subscribers. The signal quality can be influenced by many factors, such as packet loss, jitter, packet delay, and noise, and it can be measured by a number of methods. The main purpose of this paper is to investigate the impact of different noise types and noise levels on quality assessment in VoIP. Artificially generated noises and real noises obtained from real telecommunications networks were used for testing. A further goal is to compare the results obtained by subjective listening tests with those of objective measuring methods. PESQ and 3SQM were used for objective testing in this paper.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121435180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible Video Decoding: A Distributed Source Coding Approach","authors":"Ngai-Man Cheung, Antonio Ortega","doi":"10.1109/MMSP.2007.4412828","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412828","url":null,"abstract":"We investigate video compression techniques to address problems that require flexible video decoding. In these problems, the encoder has access to a number of candidate predictors that allow it to exploit source signal correlation, but only a subset of these predictors will be available at the decoder. Crucially, the encoder does not know which predictors will be available. Flexible decoding is important in a number of applications, including frame-by-frame forward and backward video playback, multiview video, bitstream switching, and robust video transmission. The main challenge in supporting flexible decoding is that the encoder needs to compress the current frame under uncertainty about which predictor is available at the decoder. An approach based on conventional \"closed loop\" prediction, e.g., motion-compensated predictive (MCP) coding in the case of video, could be developed by including multiple possible prediction residues in the bitstream, but this would lead to a considerable coding performance penalty if all possible predictor combinations are supported, or to drift if only some combinations are. Moreover, it is not possible in general to guarantee that decoded versions under different prediction scenarios will be identical. In this paper, we propose a distributed source coding (DSC) based algorithm to tackle the problem. The main novelties of the proposed algorithm are that it incorporates different macroblock modes and significance coding within the DSC framework. This, combined with a judicious exploitation of correlation statistics, allows us to achieve competitive coding performance. Using forward/backward video playback as an example, we demonstrate that the proposed algorithm can outperform a solution based on MCP coding.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132518502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image alignment with rotation manifolds built on sparse geometric expansions","authors":"E. Kokiopoulou, P. Frossard","doi":"10.1109/MMSP.2007.4412850","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412850","url":null,"abstract":"In this paper, we discuss the problem of aligning patterns under arbitrary rotation. When a generic image pattern is geometrically transformed, it typically spans a (possibly nonlinear) manifold in a high-dimensional space. When the pattern of interest is given by a sparse approximation over a structured dictionary of geometric atoms, we show that the rotation manifold can be expressed analytically as a function of the transformation parameters. At the same time, its high-order derivatives are also given in closed form when the pattern is represented as a sparse linear combination of a few differentiable basis functions. In this framework, the alignment problem is formulated as the minimization of the distance between the reference pattern and the manifold, which boils down to a nonlinear least squares optimization problem. We propose to solve this problem with a Newton-type method, whose solution is facilitated by the analytical expressions of the manifold derivatives. We further derive a global optimization heuristic based on Newton's method and provide sufficient conditions for computing the global minimizer. Experimental results demonstrate the effectiveness of the proposed methodology for image alignment and rotation-invariant pattern recognition.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134457731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic FEC-Distortion Optimization for H.264 Scalable Video Streaming","authors":"Wei-Chung Wen, Hsu-Feng Hsiao, Jen-Yu Yu","doi":"10.1109/MMSP.2007.4412839","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412839","url":null,"abstract":"Forward error correction (FEC) codes have been shown to be a feasible solution, in either the application layer or the link layer, for meeting the quality-of-service needs of multimedia streaming over fluctuating channels. In this paper, we propose FEC-distortion optimization algorithms that efficiently utilize the bandwidth for better video quality. The optimization criteria are based on unequal error protection, taking into account the error drift caused by both temporal motion compensation and inter-layer prediction in H.264/MPEG-4 AVC scalable video coding. The algorithms can also adapt to the content-dependent quality contribution of each video frame in a video layer. Lightweight error concealment is also incorporated into the proposed algorithms for better H.264 SVC streaming. For applications where computation may be the bottleneck, or where an upper bound on the non-decodable probability of each video layer is specified, an alternative bandwidth allocation algorithm is provided at the cost of slight quality degradation.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133228726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing the Multimodal Behaviors of Users of a Speech-to-Speech Translation Device by using Concept Matching Scores","authors":"Jongho Shin, P. Georgiou, Shrikanth S. Narayanan","doi":"10.1109/MMSP.2007.4412867","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412867","url":null,"abstract":"We investigate factors related to interfacing a speech-to-speech translation device with multimodal capabilities. We evaluate the efficacy of the interactions using a measure of meaning transfer, which we call the concept score. We show that, in this study, employing a multimodal interface improves translation quality by 24%. We also show that while some users require a perfect representation of what they said before allowing transfer, others accept concept degradation to some extent, with a median of up to 20% in our experiments. An appropriate system strategy is required to recognize this behavior and guide users towards optimum performance points. For example, we show that appropriate feedback is required to guide users in their choice of translation method, as 13% of the choices users made were worse than the alternatives the system provided.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116435297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Sensor Analysis of Sitar Performance: Where is the Beat?","authors":"M. S. Benning, A. Kapur, B. Till, G. Tzanetakis","doi":"10.1109/MMSP.2007.4412821","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412821","url":null,"abstract":"In this paper we describe a system for detecting the tempo of sitar performance using a multimodal signal processing approach. Real-time measurements are obtained from sensors on the instrument and by wearable sensors on the performer's body. Experiments comparing audio-based and sensor-based tempo tracking are described. The real-time tempo tracking method is based on extracting onsets and applying Kalman filtering. We show how late fusion of the audio and sensor tempo estimates can improve tracking. The obtained results are used to inform design parameters for a real-time system for human-robot musical performance.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124309496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple description image coding with redundant expansions and optimal quantization","authors":"Ivana Radulovic, P. Frossard","doi":"10.1109/MMSP.2007.4412844","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412844","url":null,"abstract":"This paper addresses the problem of optimal rate allocation for multiple description coding with redundant signal expansions. In the case of redundant descriptions, the quantization of the transform coefficients clearly has to be adapted to the importance of the basis functions, to the redundancy in the representation, and to the expected loss probability on the transmission channel. We derive a rate-distortion optimal solution for the scalar quantization of coefficients in redundant signal representations. Applying the optimal rate allocation to a typical image communication problem demonstrates performance gains with respect to a scheme based on uniform quantization with a fixed step size, and to solutions based on unequal error protection.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125299282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rate-Distortion Optimized I-Slice Selection for Low Delay Video Transmission","authors":"Yuan Lin, A. N. Kim, Eren Gürses, A. Perkis","doi":"10.1109/MMSP.2007.4412831","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412831","url":null,"abstract":"Rate smoothing is essential for achieving lower delay when transmitting real-time video over the network. Recently, \"explicit slice-based mode selection\" (ESM) was proposed as a new way of achieving this goal, together with its inherent quality smoothness and error resilience features. However, previous studies focused on the practical aspects and did not address an optimized solution. In this paper, we propose a rate-distortion (RD) optimized solution for finding the best location and size of the intra-coded slices. The experimental results show that, for a target bit rate, the optimized scheme is able to offer performance close to that of mode selection at the macroblock level over wireless channels with different packet loss rates. Moreover, the optimized ESM algorithm provides the significant advantage of granular bit stream prioritization for network transmission. However, RD-based optimization is in general computationally expensive. We therefore propose a heuristic approach that incorporates both channel statistics and sequence characteristics. Results show that it yields close to optimal performance at lower complexity.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116687169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}