{"title":"Avoiding oddification to simplify MPEG-1 decoding with LNS","authors":"M. Arnold","doi":"10.1109/MMSP.2002.1203264","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203264","url":null,"abstract":"Low-precision logarithmic number system (LNS) arithmetic can reduce the power consumption for MPEG decoding compared to conventional fixed-point techniques. Although this introduces small numeric errors, which violate the IEEE-1180 standard for the inverse discrete cosine transform (IDCT), the visual effects of such error may be tolerable for portable battery-powered devices, like videophones, that have limited-resolution displays. The MPEG standard achieves video compression by quantization of the data fed to the IDCT. The MPEG decoder must multiply this data by dequantization factors. Such multiplication, by itself, is trivial with LNS since adding logarithms is equivalent to multiplication. The IEEE-1180 standard suggests oddification, where fixed-point data is forced to become odd after dequantization to minimize IDCT mismatch between the encoder and the decoder. Oddification poses an implementation problem for data in LNS format. This paper suggests that the visual effect of LNS without oddification is nearly indistinguishable from LNS with oddification, meaning that the benefits of LNS in MPEG are even greater than previously expected.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127004201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content enhancement for e-learning lecture video using foreground/background separation","authors":"W. Heng, Qi Tan","doi":"10.1109/MMSP.2002.1203339","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203339","url":null,"abstract":"With the popular adaptation of e-learning in many universities, lectures are sometimes distributed online in the form of real-time streaming videos. In these videos, contents on the chalkboard develop into the main sources of study materials for the students. However, they are usually compressed to minimize the bitrate for on-line streaming, and the process blurs the contents on the chalkboard, making them barely readable. Thus, the usefulness of the video channel is being questioned. To improve the quality of video, we enhance the chalkboard contents of the lecture videos by using foreground/background separation and combination technique. The separation allows foreground content to be normalized and de-noised, thus enhancing the readability of the chalkboard texts. The overall technique emphasizes the chalkboard contents while preserving the integrity of the natural video frame. It thus renders the video channel as an important source of information in providing the students with a beneficial e-learning material.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122257829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speaker recognition using least squares IOHMMs","authors":"Niloy J. Mukherjee","doi":"10.1109/MMSP.2002.1203299","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203299","url":null,"abstract":"The purpose of the speaker recognition is to determine a speaker's identity from his/her speech utterances. Every speaker has his/her own physiological as well as behavioral characteristics embedded in his/her speech utterances. These characteristics can be extracted from utterances and statistically modeled. Through pattern recognition of unseen test speech with statistically trained models, a speaker identity can be recognized. In this paper, we present a discriminative classification based approach for speaker recognition. The system makes use of regularized least squares regression (RLSR) based input output hidden Markov models (IOHMM) as classifier for closed set, text independent speaker identification. The IOHMM allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. The RLSR allows the IOHMM to be trained in a more discriminative style. The use of hidden Markov models (HMM) and support vector machines (SVM) has also been studied. The performance of the system is assessed using a set of male and female speakers drawn from the TIMIT corpus.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133750292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aerial communications using piano, clarinet, and bells","authors":"Natacha Domingues, Joáo Lacerda, P. Aguiar, C. Lopes","doi":"10.1109/MMSP.2002.1203345","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203345","url":null,"abstract":"This work explores novel mechanisms for aerial acoustic machine-machine communications. It builds on previous work by some of the authors, as well as others. In this paper we describe aerial acoustic communication systems that sound like musical instruments. The sound primitives come from simple models for the sound of the piano, the clarinet, and the bells. The messages are coded by combining these primitives according to musical harmony. Our experiments show that these communication systems are well suited for applications requiring very low bit rates. Examples of the acoustic signals produced are made available from the WWW.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134056991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Texturing and line art rendering using patch-based image analogies","authors":"P. Bao, Xiaohu Ma","doi":"10.1109/MMSP.2002.1203268","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203268","url":null,"abstract":"We present a simple patch-based matching scheme for generating novel visual appearance in which a new image is synthesized by the optimal pasting of small patches of input sample texture image. First, we use this patch-based matching method as an efficient and simple textured synthesis algorithm to produce a wide range of textures with superb visual appearance. Second, we extend the algorithm for the texture transfer - rendering an image in the style of a different texture image. Third, we generalize the algorithm for processing images by examples, called \"patch-based image analogies\". From this perspective, the texture synthesis and texture transfer are the special cases of the image texturing. Experimental results show that the proposed scheme is extremely efficient and effective for the image texturing problems, such as texture synthesis, texture transfer, line art rendering, etc. Due to the efficiency of the patch-based sampling, our method outperforms many existing algorithms with comparable visual quality.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132675979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Normal mesh compression based on rate-distortion optimization","authors":"Jae-Young Sim, Chang-Su Kim, C.-C. Jay Kuo, Sang Uk Lee","doi":"10.1109/MMSP.2002.1203236","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203236","url":null,"abstract":"An efficient progressive compression and interactive transmission algorithm for 3D normal meshes, based on rate-distortion optimization, are proposed in this work. A normal mesh is partitioned into several segments, which are encoded independently. By truncating the bitstream for each segment optimally using a distortion model, the proposed algorithm yields a higher coding gain than the conventional approach. Moreover, to support interactive transmission of 3D models from client's viewpoint, each segment can be endowed with different priority of transmission. The view-dependent processing technique enables to reduce the transmission bandwidth considerably while maintaining the same visual quality.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115255858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-based movie coding - an overview","authors":"B. Haskell, A. Dumitras","doi":"10.1109/MMSP.2002.1203255","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203255","url":null,"abstract":"In this paper we discuss modalities of exploiting the distinct characteristics of entertainment movie sequences in the framework of content-based coding. In the content-based movie coding, methods that originate from model-based analysis synthesis and region-based coding, content-based retrieval, content re-purposing and computer graphics domains contribute to achieving simultaneous bit rate reduction in the compressed streams and preservation of high quality of the decoded pictures.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116584749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sample domain integration of medical data for multimedia diagnosis","authors":"Mingui Sun, Y. Shi, Qiang Liu, R. Sclabassi","doi":"10.1109/MMSP.2002.1203321","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203321","url":null,"abstract":"Although there has been active research in data security and multimedia systems, which have wide applications in many fields, certain special problems in the field of medicine have not yet been solved. These problems include integration of data from different sources and provision of multiple levels of access control to protect the privacy of patients. We present a new method for data integration and security by mixing medical waveforms and images with encrypted patient identifiers and unencrypted associative data, such as acquisition parameters, diagnostic images, and notes and comments in textual, pictorial, and voice forms. We vary the sampling rate (or the sampling grid) of data according to their local smoothness. Then, redundant samples (or pixels) are eliminated and replaced by associative data which are labeled using a status string encoded based on the Huffman and run-length techniques. This method achieves both data compression and integration simultaneously.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115126335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive shape-texture intra coding refreshment for error resilient object-based video","authors":"Luís Ducla Soares, F. Pereira","doi":"10.1109/MMSP.2002.1203261","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203261","url":null,"abstract":"Video encoders may use several techniques to improve error resilience. In particular, for video encoders that rely on predictive (inter) coding to remove temporal redundancy, intra coding refreshment is especially useful to stop error propagation when errors occur in the transmission or storage of the coded streams, which can cause the decoded quality to decay very rapidly. In object-based video coders, intra coding refreshment can be applied to both shape and texture data. In this paper, a novel combined intra refreshment scheme is proposed which can be used by object-based video encoders, such as MPEG-4 video encoders, to adaptively determine when the shape and texture of the video objects in a scene should be refreshed in order to maximize the decoded video quality for a certain amount of bitrate resources.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132710632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"View-dependent compression of 3D voxel models based on skeleton representation","authors":"In-Wook Song, Chang-Su Kim, Sang Uk Lee","doi":"10.1109/MMSP.2002.1203237","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203237","url":null,"abstract":"We propose a view-dependent compression algorithm for 3D binary voxel models, based on the skeleton representation. An input model is first partitioned into several segments. The segmentation increases the number of skeleton points, requiring more bits than the original model. Therefore, we develop a remodeling algorithm, which reduces the number of skeleton points in each segment effectively without any loss of information. Each segment is then compressed with a context-based arithmetic coder. Before the transmission, the encoder is informed of the viewing parameters via feedback channel. Then, the encoder transmits visible segments in detail, while cutting off invisible segments. Simulation results show that the proposed algorithm requires much less bitrate than the conventional skeleton-based algorithm, while exhibiting faithful image quality.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132238567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}