{"title":"Manipulation of text documents in the modified Group 4 domain","authors":"Shulan Deng, S. Latifi, J. Kanai","doi":"10.1109/MMSP.1998.738987","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738987","url":null,"abstract":"This paper presents a novel approach to document image compression that is efficient in both compression and processing flexibility. By proper exploitation of the structural characteristics of compressed data, one may obtain high performance for image operations with low complexity. Based on CCITT Group 4, an improved coding scheme (MG4), which exploits the 2-dimensional correlation between scan lines, is developed. Then such operations as skew detection, skew correction and connected component extraction are investigated and implemented. These operations are shown to run faster in the compressed domain than traditional methods.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":" 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113950605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time lip tracking and bimodal continuous speech recognition","authors":"M. T. Chan, You Zhang, Thomas S. Huang","doi":"10.1109/MMSP.1998.738914","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738914","url":null,"abstract":"We investigate using a bimodal approach to speech recognition by incorporating additional visual features derived from lip movement of the speaker. A reference contour model is used to track the lip outline of the speaker. By using color, constraining the deformation in an affine subspace, and by incorporating an outlier rejection mechanism, our system is robust and runs in real time. To address the model initialization issue, a fast lip localization algorithm is also incorporated. A sample of continuous bimodal speech data based on a confined vocabulary (useful for our application area) was synchronously captured for training and testing. Using the hidden Markov modeling framework, we trained our bimodal context-dependent sub-word-based recognizer in a few different ways. The experiments show that the bimodal recognizer compares favorably to the acoustic-only counterpart. The results also indicate that it is advantageous to include first derivatives of the visual features. Furthermore, the 2-stream modeling scheme appears to be preferable to the 1-stream case for bimodal speech.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124786106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noise reduction algorithms employing an intelligent inference engine for multimedia applications","authors":"A. Czyżewski, R. Królikowski","doi":"10.1109/MMSP.1998.738923","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738923","url":null,"abstract":"Two approaches to noise reduction are presented: a spectral subtraction system and a perceptual coding algorithm that diminishes audible noise. Both systems are controlled by an intelligent inference engine based on fuzzy logic. An extension of perceptual coding applications to the removal of noise originally present in acoustic signals was proposed and verified experimentally. The engineered intelligent systems for noise reduction are presented briefly.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128579046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal scalability using P-pictures for low-latency applications","authors":"S. Wenger","doi":"10.1109/MMSP.1998.739040","DOIUrl":"https://doi.org/10.1109/MMSP.1998.739040","url":null,"abstract":"Many of the newer video coding/compression standards, including MPEG-4 and H.263+ support some form of temporal scalability as part of their layered codec concept. This is generally realized by using bi-directionally predicted pictures (B-pictures), which use both an earlier and a subsequent P-picture as reference (anchor) pictures. This paper introduces a new form of temporal scalability employing only P-pictures. This mechanism results in improved real-time behavior (particularly lower latency) and a more flexible layering structure, at the cost of less efficient coding. The P-picture based temporal scalability mechanism is particularly useful for interactive multimedia communication on networks that offer several independent transport streams (possibly with different quality of service), but have sub-optimal real-time characteristics. Typical applications include both the Internet and some forms of mobile communication. The 1998 version of H.263 (known as H.263+ in both academia and industry) offers a mechanism to support P-picture scalability within the bit-stream through the reference picture selection mode. Other video coding standards, including those of the MPEG family, require slight modifications to the decoder in order to support the proposed mechanism.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127291785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building human face models from two images","authors":"Qian Chen, G. Medioni","doi":"10.1109/MMSP.1998.738922","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738922","url":null,"abstract":"We present a practical technique for building 3-D human face models from two photographs. Rather than using expensive 3-D scanners, we show that frontal face models can be faithfully reconstructed with unsophisticated digital cameras in a totally non-invasive setup. We propose a rectification algorithm based on the fundamental matrix by computing the dual of the point transformation matrix. The image matching problem is converted into a maximal surface extraction problem which is then solved efficiently. Finally, the Euclidean approximation is achieved with the help of a novel factorization method for perspective cameras. Two examples are presented.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129984644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital watermarking and information embedding using dither modulation","authors":"Brian Chen, G. Wornell","doi":"10.1109/MMSP.1998.738946","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738946","url":null,"abstract":"A variety of related applications have emerged that require the design of systems for embedding one signal within another signal. We propose a new class of embedding methods called quantization index modulation (QIM) and develop an example of such a method called dither modulation in which the embedded information modulates the dither signal of a dithered quantizer. We also develop a framework within which one can analyze performance trade-offs among robustness, distortion, and embedding rate, and we show that QIM systems have considerable performance advantages over previously proposed spread-spectrum and low-bit modulation systems.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132321797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of audio events in broadcast news","authors":"Zhu Liu, Qian Huang","doi":"10.1109/MMSP.1998.738963","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738963","url":null,"abstract":"This paper describes an approach to discriminate news reports from other content, such as commercials and music, in broadcast news programs based on audio information. The reported work is part of the effort at AT&T to hierarchically segment broadcast news programs into semantically meaningful units at different levels of abstraction. At the coarse level, using the described approach we preprocess the audio data to pass only the news segments as input to a speaker identification system. To develop a lightweight preprocessing scheme for efficiency, we adopted a set of audio features that are simple to compute yet, based on our observation, statistically capture the intrinsic properties of the audio events to be classified. To improve the performance of the classifier, fuzzy membership functions associated with the features are introduced. Preliminary experimental results are reported which demonstrate the usefulness of the approach.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131722742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FIGMENT: a basis for interactive virtual mannequin services","authors":"J. N. Anderson, M. Jack","doi":"10.1109/MMSP.1998.738933","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738933","url":null,"abstract":"Until now, the possibility of a 'virtual mannequin' service for telepresence shopping systems has been unrealistic due to the number and complexity of calculations required for the modelling of physical clothing items. This paper presents an overview of the FIGMENT scheme (Fast Implementation Garment Modelling environmENT) which incorporates a four-point strategy (a simplified physical model, collision volume approximation, progressive meshes and a hybrid rendering algorithm) to reduce the quantity and complexity of the computations involved, bringing simulation times from the realm of hours to minutes and seconds whilst maintaining an acceptable level of accuracy and fidelity in the results.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123248759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matrix quantization with vector quantization error compensation for robust speech recognition","authors":"L. Cong, S. Asghar","doi":"10.1109/MMSP.1998.738924","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738924","url":null,"abstract":"This paper proposes a robust, speaker-independent IWSR system which combines dual fuzzy matrix quantization (FMQ) and fuzzy vector quantization (FVQ) pairs, or dual MQ/VQ quantization pair with a discrete HMM to efficiently utilize processing resources and improve speech recognition performance. This system exploits the \"evolution\" of the speech short-term spectral envelopes with error compensation from FVQ/HMM, or VQ/HMM processes to target noise-affected input signal parameters and minimize noise influence. The enhanced processing technology employs a weighted LSP distance measure in the LBG algorithm. Computer simulation using gender-dependent HMMs clearly indicates the superiority over conventional FVQ/HMM and FMQ/HMM systems with 96.48% and 92.8% recognition accuracy at 20 dB and 5 dB SNR levels, respectively in a car noise environment, based on database TIDIGITS.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126389057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive object-based analysis and manipulation of digital video","authors":"P. E. Eren, N. Zhuang, Yue Fu, A. Tekalp","doi":"10.1109/MMSP.1998.738956","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738956","url":null,"abstract":"With the advent of MPEG-4, object-based natural/synthetic hybrid multimedia content is becoming more ubiquitous. In this paper, we address object-based interactive analysis of natural video for editing/authoring natural/synthetic hybrid content. Boundary and local motion of video objects are described by snake and 2-D mesh representations, respectively. The 2-D mesh modeling in effect performs a mapping of a natural video object into a computer graphics representation, namely geometry with motion and a texture map; thus allowing for easy editing of natural video objects using tools already developed in computer graphics. This paper presents the components of a tool designed for interactive video object analysis and editing using graphical models whose syntax and semantics conform with VRML and MPEG-4 standards. The analysis tool is developed using Java and C++, while rendering and editing are performed within VRML or MPEG-4 browsers and authoring tools. Demonstrations of the video analysis and editing are included.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"246 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123752399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}