{"title":"A Hybrid Encoder/Decoder Rate Control for Wyner-Ziv Video Coding with a Feedback Channel","authors":"D. Kubasov, K. Lajnef, C. Guillemot","doi":"10.1109/MMSP.2007.4412865","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412865","url":null,"abstract":"This paper describes a hybrid coder/decoder rate control for a Wyner-Ziv video coding scheme with a feedback channel. The approach is first based on a method to estimate at the encoder the minimum rate required for the Slepian-Wolf (SW) encoded data. This estimation makes use only of the Laplacian correlation model. The decoder then estimates the bit error rate (BER) for each decoded bit plane from likelihood ratios computed at the output of the SW decoder. The robustness of the BER estimation is further improved by the use of an error detection mechanism based on a cyclic redundancy checksum. Experimental results show that rate-distortion performances are comparable to those which would be obtained by computing the Hamming distance between the original and the decoded data. In addition the hybrid coder/decoder rate control reduces the decoding complexity as well as the usage of the return channel.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134316607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying Image Analysis to Auto Insurance Triage: A Novel Application","authors":"Ying Li, C. Dorai","doi":"10.1109/MMSP.2007.4412872","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412872","url":null,"abstract":"For the auto insurance claims process, improvements in the First Notice of Loss and rapidity in the investigation and evaluation of claims could drive significant value by reducing loss adjustment expense. This paper proposes a novel application where advanced technologies in image analysis and pattern recognition are applied to automatically identify and characterize automobile damage. Success in this will allow some cases to proceed without human adjusters, and others to proceed more efficiently, thus ultimately shortening the time between the First Notice of Loss and the final payout. To investigate its feasibility, we built a prototype system which automatically identifies the damaged area(s) based on the comparison of before- and after-accident automobile images. Performance of the prototype system has been evaluated on images taken from forty scaled model cars under reasonably controlled environments, and encouraging results were obtained. It is our belief that, with the advancement of image analysis and pattern recognition technologies, the proposed idea could evolve into a very promising application area where the auto insurance industry could significantly benefit.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131759711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Conversational Voice Communication Quality of the Skype, Google-Talk, Windows Live, and Yahoo Messenger Voip Systems","authors":"B. Sat, B. Wah","doi":"10.1109/MMSP.2007.4412836","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412836","url":null,"abstract":"In this paper, we evaluate the conversational voice communication quality (CVCQ) of VoIP systems, both from the user and the system perspectives. We first identify the metrics for CVCQ, which include listening-only speech quality (LOSQ), conversational interactivity (CI), and conversational efficiency (CE). These depend on the mouth-to-ear delays (MEDs) between the two clients. Based on packet traces collected in the PlanetLab and on the dynamics of human interactive speech, we study four popular VoIP client systems: Skype (v2.5), Google-Talk (Beta), Windows Live Messenger (v8.0), and Yahoo Messenger with Voice (v8.0), under various network and conversational conditions.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124310370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A User-Centric QoS Management Approach for Digital Home","authors":"Aleksej Spenst, T. Herfet","doi":"10.1109/MMSP.2007.4412905","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412905","url":null,"abstract":"Modern home networking technologies provide great possibilities for distributing digital content throughout the home. However, media distribution is not ready yet to satisfy users' quality expectations, especially in wireless networks. That is why a number of quality of service (QoS) mechanisms have to be deployed in the network in order to guarantee a certain level of quality. We propose a user-centric QoS management approach based on user preferences. Unlike traditional QoS approaches that always give multimedia applications a pre-determined high priority due to their strict latency and loss requirements, the proposed approach classifies traffic according to user preferences, thus, ensuring a certain quality regardless of the application requirements. The application priorities are assigned according to user profiles and automatically updated in case of changes in the network or user environment. The paper suggests that the introduced management approach will significantly contribute to the overall user experience in digital home.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123047202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced FGS Coding Scheme Based on MPEG-4 FGS Technology","authors":"Kwang-deok Seo, Kyu-Chan Roh","doi":"10.1109/MMSP.2007.4412879","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412879","url":null,"abstract":"In this paper, we propose an advanced FGS coding scheme based on MPEG-4 FGS technology. The proposed FGS enhancement-layer encoder takes as input the difference between the original DCT coefficient and the decision level of the quantizer, instead of the difference between the original DCT coefficient and its reconstruction level. Using this residual signal, the sign information of the enhancement-layer DCT coefficients can be the same as that of the base-layer ones at the same frequency index in DCT domain. Thus, overhead bits required for coding a lot of sign information of the enhancement-layer DCT coefficients in bit-plane coding can be removed from the generated bitstream. Moreover, the residue input signal to the enhancement layer shows a sharper distribution with less entropy for the DCT coefficients. Based on the sharper distribution, a new set of VLC tables is established for improved bit-plane coding. It is shown by simulations that the proposed FGS coding scheme provides better coding performance than the MPEG-4 FGS coding.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116672264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sound Source Localization by Asymmetrically Arrayed 2ch Microphones on a Sphere","authors":"Nobutaka Ono, Souichiro Fukamachi, T. Nishimoto, S. Sagayama","doi":"10.1109/MMSP.2007.4412817","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412817","url":null,"abstract":"In this paper, we propose a novel system to localize a sound source in any 2D directions using only two microphones. In our system, the two microphones are asymmetrically placed on a sphere, thus, (1) the diffraction by the sphere and the asymmetrical arrangement of the microphones yield the localization cue including the front-back judgment, and (2) unlike the dummy head system, no previous measurements are necessary due to the analytical representation of the sphere diffraction. To deal with reverberation or ambient noises, we consider the maximum likelihood estimation of the direction of arrival with a diffused noise model on a sphere. We present a real system that we built through the investigation of the optimal microphone arrangement for speech, and give experimental results in real environment.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115408348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Embedded System for In-Vehicle Visual Speech Activity Detection","authors":"V. Libal, J. Connell, G. Potamianos, E. Marcheret","doi":"10.1109/MMSP.2007.4412866","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412866","url":null,"abstract":"We present a system for automatically detecting driver's speech in the automobile domain using visual-only information extracted from the driver's mouth region. The work is motivated by the desire to eliminate manual push-to-talk activation of the speech recognition engine in newly designed voice interfaces in the typically noisy car environment, aiming at reducing driver cognitive load and increasing naturalness of the interaction. The proposed system uses a camera mounted on the rearview mirror to monitor the driver, detect face boundaries and facial features, and finally employ lip motion clues to recognize visual speech activity. In particular, the designed algorithm has very low computational cost, which allows real-time implementation on currently available inexpensive embedded platforms, as described in the paper. Experiments are also reported on a small multi-speaker database collected in moving automobiles, that demonstrate promising accuracy.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"301 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115138584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation of Document Images Using Higher Order Statistics","authors":"P. Borges, J. Mayer, E. Izquierdo","doi":"10.1109/MMSP.2007.4412876","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412876","url":null,"abstract":"This work presents an efficient post-segmentation method for separating text from the background in document images. For this task, this paper proposes the use of textured patterns to represent text in documents, instead of the standard black. It is shown that, in poor quality documents, text segmentation is more efficient when the characters in the document are represented in a halftoned gray level prior to printing. This occurs because the halftoning process induces statistical characteristics that help the text to be distinguished from noise or background. A typical case is that of noisy printed and scanned documents. Experiments validate the analysis and the applicability of the segmentation method. An important application for the method is in the postal service, where letters have their addresses segmented for automatic sorting.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114742518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Face Tracker Trajectories Clustering Using Mutual Information","authors":"N. Vretos, V. Solachidis, I. Pitas","doi":"10.1109/MMSP.2007.4412854","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412854","url":null,"abstract":"In this paper we propose an algorithm for face tracker's trajectories clustering. Our approach is based on the mutual information of the images and more precisely its normalized version (NMI). We make use of 2 color channels from the HSV space (hue and saturation) in order to calculate a 4D joint histogram and therefore calculate the mutual information. In this paper we also develop an algorithm where we apply robust heuristics and make use of a tracker information in order to diminish dimensionality and augment accuracy of our results. It is a supervised clustering algorithm which is therefore used (fuzzy c-means) in order to gather same trajectories and same faces together.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128125133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bus Bandwidth Aware H.264/AVC Motion Compensation Design for High Definition Video Encoding","authors":"Chan-Sik Park, Ju-hee Kim","doi":"10.1109/MMSP.2007.4412871","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412871","url":null,"abstract":"H.264/AVC outperforms previous video coding standards thanks to many new tools, but its complexity is much higher. When the image resolution is high, the bus bandwidth and data read cycles are the most important problems for a real-time processing decoder. Motion compensation is the main bottleneck, which can use over 50% of the total bandwidth. In this paper, we present a bandwidth aware motion compensation (BAMC) which can solve both the read cycle and the data bandwidth problems in various systems. BAMC saves 60-80% of MC read cycles and 50-85% of data bandwidth on systems with DDR1-32bit or DDR2-64 bit. The proposed MC hardware can support 1088-60 p real-time processing with less than 266 MB/s total bus memory bandwidth.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128196375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}