{"title":"Perceptual classification of MPEG video for Differentiated-Services communications","authors":"F. D. Vito, L. Farinetti, Juan Carlos De Martin","doi":"10.1109/ICME.2002.1035738","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035738","url":null,"abstract":"We present a distortion-based packet marking technique for transmission of motion-compensated video over Differentiated Services networks. For each macroblock of an MPEG2 video sequence, the distortion that would be caused at the receiver by its loss is computed. High distortion macroblocks are grouped into perceptually important slices that can be transmitted as premium packets, while lower distortion slices are sent as less expensive, best-effort traffic. Firstly, computation of the distortion introduced in the current frame only is compared to exhaustive computation of the distortion introduced in the entire group of pictures (GOP) due to the error propagation. Secondly, allocation of the premium traffic on a frame-by-frame basis is compared to GOP-wide allocation. Results show that GOP-wide allocation of premium traffic is key in using premium bandwidth efficiently, with strong PSNR gains with respect to the other approaches. We also propose a model-based distortion computation technique, which, combined with GOP-level premium traffic allocation, delivers nearly the same performance as the exhaustive approach at a fraction of its complexity.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"18 1","pages":"141-144 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89029123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing blurring-effect in high resolution mosaic generation","authors":"R. S. Aygün, A. Zhang","doi":"10.1109/ICME.2002.1035670","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035670","url":null,"abstract":"Mosaic generation methods benefit from global motion estimation (GME) methods, which yield nearly accurate estimates of motion parameters. However, the generated mosaics are usually more blurred than the original frames due to the image warping stage and errors in motion estimation. The transformed coordinates resulting from GME are generally real numbers whereas images are sampled into integer values. Although GME methods generate proper motion parameters, a slight error in motion estimation may propagate to subsequent mosaic generation steps. We propose a method to generate clearer mosaics from video. The temporal integration of images is performed using the histemporal filter based on the histogram of values within an interval. The initial frame in the video sequence is registered at a higher resolution to generate a high resolution mosaic. Instead of warping each frame, the frames are warped into the mosaic at intervals. This reduces the blurring in the mosaic.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"10 1","pages":"537-540 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91383189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dynamic video combiner for multipoint video conferencing using wavelet transform","authors":"K. Fung, W. Siu, N. Law","doi":"10.1109/ICME.2002.1035363","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035363","url":null,"abstract":"A new architecture of video combiner for multipoint video conferencing is proposed. The proposed video combiner is wavelet-based and extracts the motion activity information from the video bitstreams produced by a wavelet-based video coder. Using the progressive properties of the wavelet transform, the encoded bitstream becomes scalable. Hence, the video quality of inactive sub-sequences can be easily adjusted in the video combiner by discarding the fine detail information bitstreams. In other words, more bits can be reallocated to the active sub-sequences to achieve a good visual quality with smooth motion. In addition, the video coder is region-based so that different wavelet kernels can be used for the foreground and the background. This setting can on one hand reduce the computational complexity significantly. On the other hand, by considering the unequal importance of various regions, a high video quality in foreground can always be guaranteed and an acceptable quality in background can be maintained even under low bitrate environments. Since the video combiner only needs to rearrange the video quality level according to their motion activities, no re-encoding process is required. Therefore, a significant computational complexity saving can be achieved as compared to the conventional video combiner using a transcoding approach. The new video combiner is then used to realize multipoint video conferencing and some results are presented to show the improvement in performance due to our proposed architecture.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"19 1","pages":"17-20 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84624252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache-efficient wavelet lifting in JPEG 2000","authors":"S. Chatterjee, Christopher Brooks","doi":"10.1109/ICME.2002.1035902","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035902","url":null,"abstract":"The discrete wavelet transform (DWT), the technology at the heart of the JPEG 2000 image compression system, operates on user-definable tiles of the image, as opposed to fixed-size blocks of the image as does the discrete cosine transform (DCT) used in JPEG. This difference reduces artificial blocking effects but can severely stress the memory system. We examine the interaction of the DWT and the memory hierarchy, modify the structure of the DWT computation and the layout of the image data to improve cache and translation lookaside buffer (TLB) locality, and demonstrate significant performance improvements of the DWT over a baseline implementation. Our optimized DWT implementation exhibits speedups of up to 4× over the DWT in a JPEG 2000 reference implementation.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"797-800 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84729191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-grained scalable video broadcasting over cellular networks","authors":"Jiangchuan Liu, Bin Li, Bo Li, Xi-Ren Cao","doi":"10.1109/ICME.2002.1035807","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035807","url":null,"abstract":"In layered video broadcasting, if adaptation is performed only by receivers, significant mismatches between a receiver's expected bandwidth and the actually delivered bandwidth could occur as the adaptation unit is a coarse-grained layer. The paper shows that fine-grained sender adaptation, as a complement to receiver adaptation, can significantly decrease these mismatches. A formal study on optimal session and layer bandwidth allocations for sender adaptation in a broadband cellular network is carried out. The most fundamental issues associated with layered video broadcasting, including system utility and the overhead of layering, are considered. A polynomial-time algorithm is derived for optimal allocation. Experimental results show that it can significantly improve the system utility compared to static allocation algorithms.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"55 1","pages":"417-420 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85297415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the methods and applications of arbitrarily downsizing video transcoding","authors":"Yap-Peng Tan, Haiwei Sun, Yongqing Liang","doi":"10.1109/ICME.2002.1035855","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035855","url":null,"abstract":"Video transcoding is a common technique for adapting the bitrate or spatial/temporal resolution of a compressed video to suit different transmission bandwidths or receiving devices. To reduce the computational complexity, many fast methods have been proposed to estimate the motion vectors required for downsizing a pre-coded video by an integer factor. We develop and compare several fast video transcoding methods for downsizing a pre-coded video by an arbitrary factor. Methods which outperform others under different conditions are identified and discussed. To exploit fully the advantages of arbitrarily downsizing video transcoding, we also design a scheme to determine the reduced frame size that can sustain the best possible video quality for a given target bitrate. Experimental results are presented to show the performance of the proposed video transcoding methods.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"609-612 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90372738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audio mixing for centralized conferences in a SIP environment","authors":"Samer Hawwa","doi":"10.1109/ICME.2002.1035572","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035572","url":null,"abstract":"This paper focuses on centralized conferences in a session initiation protocol (SIP) environment and the different techniques deployed in audio mixing. We express the main issues in mixing audio over IP and discuss our own prototype as part of the multipoint control unit (MCU) implementation. We show measurements of our own implementation of audio mixing and present new features to be supported in future releases.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"36 1","pages":"269-272 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87213182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error detection in a compressed video using fragile watermarking","authors":"Younghooi Hwang, B. Jeon","doi":"10.1109/ICME.2002.1035735","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035735","url":null,"abstract":"This paper proposes an error detection technique using fragile watermarking. The fragile watermark is embedded in the least significant bits of the selected transform coefficients decided to balance between deterioration of PSNR value and error detection efficiency. The proposed method is usable without additional bits in the video bitstream and can be implemented very efficiently. This method will be useful in an error prone environment like a wireless channel.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"6 1","pages":"129-132 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75349124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid natural and structured audio coding for 3D scenes","authors":"S. Battista, G. Zoia, A. Simeonov, R. Zhou","doi":"10.1109/ICME.2002.1035829","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035829","url":null,"abstract":"Natural and structured audio representations can be characterized by the lack or presence of a model describing the sound, respectively; combination of the two approaches can lead to efficient and improved storage and transmission of both speech and music, mixing less efficient but general technologies with more compact and specialized models. Integration of natural audio tracks with structured sound and 3D spatial processing is a challenging effort, especially when the audio scene requires high quality and precise synchronization with video and graphic information, as is the case in professional multimedia and virtual reality frameworks. In this paper natural and structured sound are surveyed and a new player is presented, which supports all the mentioned technologies in a normative context.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"37 1","pages":"505-508 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75426115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Iterative 3D surface modelling from a sparse set of matched feature points","authors":"N. Xu, N. Ahuja","doi":"10.1109/ICME.2002.1035926","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035926","url":null,"abstract":"We present an iterative algorithm to reconstruct a 3D object surface from a sparse set of matched feature points on the input stereo images of the object. The initial matches are sparse and do not have to be accurate. The reconstructed 3D surface is represented in terms of triangular polygons whose vertices are initially the 3D points corresponding to these matched feature points. In order to render photorealistic images of the surface, these feature points are iteratively updated. New feature points are added to the feature point set, and the depth estimates of the feature points are refined. Experimental results showing the updated correspondences, reconstructed surfaces and virtual views rendered from new directions are presented.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"47 1","pages":"893-896 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75454437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}