{"title":"Real-time lip-synch face animation driven by human voice","authors":"Fu Jie Huang, Tsuhan Chen","doi":"10.1109/MMSP.1998.738959","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738959","url":null,"abstract":"In this demo, we present a technique for synthesizing the mouth movement from acoustic speech information. The algorithm maps the audio parameter set to the visual parameter set using the Gaussian mixture model and the hidden Markov model. With this technique, we can create smooth and realistic lip movements.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115505580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient hardware-software co-implementation of H.263 video codec","authors":"S. Kim, S. Jang, J. Lee, J. Ra, J. Kim, U. Joung, G. Choi, J. Kim","doi":"10.1109/MMSP.1998.738951","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738951","url":null,"abstract":"An H.263 video codec is implemented by adopting the concept of hardware and software co-design. Each module of the codec is investigated to find which approach between hardware and software is better to achieve real-time processing speed and flexibility. The hardware portion includes motion-related engines, such as motion estimation and compensation, and memory control. The other portion of the H.263 video codec and other parts of the H.324 system like G.723, H.223, and H.245 are implemented in software using a RISC processor. This paper also introduces efficient design methods for hardware and software modules. In hardware, an architecture for a hierarchical motion estimator using correlation of neighboring motion vectors is suggested to reduce the chip size. Software optimization techniques are also explored using the statistics of transformed coefficients and the minimum sum of absolute difference (SAD) obtained from the motion estimator.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124809196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Arithmetic coded vector SPIHT with classified tree-multistage VQ for color image coding","authors":"D. Mukherjee, S. Mitra","doi":"10.1109/MMSP.1998.738988","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738988","url":null,"abstract":"A vector extension of the set partitioning in hierarchical trees (SPIHT) algorithm, named vector-SPIHT (VSPIHT), using trained classified successive refinement VQ, has recently been proposed. In this work, vector set-partitioning is applied to multispectral image compression, in particular to 24-bit color images. Since the individual spectral components are sufficiently correlated, VSPIHT can effectively exploit both the inter-component redundancy as well as the spatial redundancy within each subband of each component, to yield performance superior to separate scalar SPIHT coding of each component. Adaptive arithmetic coding of the first stage VQ index for each class, as well as the significance information, further improves the performance. Coding results demonstrate that the vector-based approach for color images significantly outperforms the scalar counterpart in the mean-squared-error sense.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123701880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capture and synthesis of human motion in video sequences","authors":"Jia-Ching Cheng, José M. F. Moura","doi":"10.1109/MMSP.1998.738921","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738921","url":null,"abstract":"We present a knowledge-based framework to capture and represent human walkers in video. The system models the human body as an articulated object of twelve rigid body-parts whose motions are almost periodic and subject to dynamic constraints. The resulting representation is compact and composed of the motion, shape, and texture for each of the body-parts. We apply the representation to regenerate the original sequence and to synthesize articulated 3D human actions.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126756372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continually traffic accommodating Internet streaming video","authors":"Y. J. Chung, Hwangjun Song, Tien-Ying Kuo, JongWon Kim, C.-C. Jay Kuo","doi":"10.1109/MMSP.1998.738969","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738969","url":null,"abstract":"A new way to implement the Internet modem video transmission is presented. This system is capable of continually accommodating its bitstream size in response to changing network conditions. The key idea is to adopt an adaptive least mean squares (LMS) controller to orchestrate an H.263+ encoder rate control at the server end as well as a fast frame interpolation at the client end. It is demonstrated that the user datagram protocol (UDP) packet loss is significantly reduced and a smooth video display can be achieved.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114957500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrete wavelet frame representations of color texture features for image query","authors":"Tao Chen, K. Ma, Li-Hui Chen","doi":"10.1109/MMSP.1998.738911","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738911","url":null,"abstract":"We propose a wavelet-based multi-channel scheme to extract human-perception relevant color texture features for image indexing and querying. For each spectral band, a two-dimensional discrete wavelet frame (DWF) decomposition is applied first, followed by an enveloping operation performed on the resulting wavelet coefficients. The unichrome features computed from the enveloped coefficients of the individual band as well as the opponent features that provide the spatial correlation between different spectral bands are jointly exploited for accurate image classification. The experimental results are promising, showing that the proposed approach is suitable for browsing color texture images.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116443748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VTJukebox: implementation issues for RTP-based recording and on-demand multicast of multimedia conferences","authors":"Baldine B. Paul, M. Civanlar","doi":"10.1109/MMSP.1998.738944","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738944","url":null,"abstract":"We describe the implementation of a system for reliable recording and on-demand playback of multimedia conferences using the services provided by the RTP/RTCP protocols over an intranet. Implementation issues are underlined regarding the file indexing strategy to support random access during playback, the scheduling of packets to reduce transmission delay jitter, replacement of lost packets within the media stream and recorder allocation and deployment.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122576826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic text tracking in digital videos","authors":"Huiping Li, D. Doermann","doi":"10.1109/MMSP.1998.738907","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738907","url":null,"abstract":"We address the problem of automatically tracking moving text in digital videos. Our scheme consists of two separate processes: monitoring which detects the new text line entering a frame, and tracking which uses prediction techniques to reconcile the text from frame to frame. Temporal consistency allows one to monitor periodically and reduce the computation complexity. The tracking process uses a rapid prediction/search scheme to update the position of the text blocks between frames. We provide details of the implementation and results for text moving in the scene and text which moves as a result of camera motion.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129106815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining vocal and visual cues in an identity verification system using K-NN based classifiers","authors":"P. Verlinde, G. Chollet","doi":"10.1109/MMSP.1998.738913","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738913","url":null,"abstract":"The contribution of this paper is twofold: (1) to formulate a fusion problem encountered in the design of a multi-modal identity verification system as a particular classification problem, (2) to propose a simple classifier to solve this problem. The multi-modal identity verification system under consideration is built of d modalities in parallel, each one delivering as output a scalar number, called the score, stating how well the claimed identity is verified. A fusion module receiving as input the d scores has to take a binary decision: accept or reject identity. We have solved this fusion problem using a classic k-nearest-neighbor (k-NN) classifier. The most important problem encountered with this simple classifier is the imbalance between the number of reference points in either class. Adapting the classic k-NN classifier using distance weighting and vector quantization principles makes it possible to reduce the influence and the number of impostor reference points, respectively. This constitutes the originality of this paper. The performances of these different fusion modules have been evaluated on a multi-modal database, containing both vocal and visual modalities.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114344964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A distributed implementation of fuzzy clustering and switching of linear regression models for lossless compression of imagery and 3D data","authors":"B. Aiazzi, P. S. Alba, L. Alparone, S. Baronti, F. Lotti, A. Mattei","doi":"10.1109/MMSP.1998.738966","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738966","url":null,"abstract":"A distributed implementation of a new method for reversible compression of both 2D and 3D data is presented. A classified prediction is first trained through fuzzy clustering; then, data decorrelation is accomplished by prediction in a fuzzy fashion. Context-based adaptive arithmetic coding is tailored to the prediction errors to enhance entropy coding. Results and comparisons with other schemes are presented and discussed together with computational issues.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121692257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}