1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175): latest publications

Manipulation of text documents in the modified Group 4 domain
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.738987
Shulan Deng, S. Latifi, J. Kanai
Abstract: This paper presents a novel approach to document image compression that is efficient in terms of both compression and processing flexibility. By properly exploiting the structural characteristics of compressed data, one can obtain high performance for image operations at low complexity. Based on CCITT Group 4, an improved coding scheme (MG4) that exploits the two-dimensional correlation between scan lines is developed. Operations such as skew detection, skew correction and connected-component extraction are then investigated and implemented, and are shown to run faster in the compressed domain than traditional methods.
Citations: 2
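MG4 builds on CCITT Group 4 (ITU-T T.6) coding, whose core step classifies each scan line against the reference line above it into pass, vertical and horizontal modes. The sketch below shows only that standard mode decision, emitting mode labels instead of the actual codewords; it does not reproduce the paper's MG4 modifications, which the abstract does not specify, and the function names are hypothetical. Pixels are assumed to be 0 (white) or 1 (black).

```python
from typing import List, Sequence, Tuple

# Pixels are 0 (white) or 1 (black); each line starts with an imaginary white pixel.


def changing_elements(line: Sequence[int]) -> List[int]:
    """Positions whose color differs from the pixel to their left."""
    changes, prev = [], 0
    for i, px in enumerate(line):
        if px != prev:
            changes.append(i)
        prev = px
    return changes


def next_change(changes: List[int], after: int, width: int) -> int:
    """First changing element strictly right of `after`, else the line width."""
    return next((c for c in changes if c > after), width)


def g4_modes(ref: Sequence[int], cur: Sequence[int]) -> List[Tuple[str, object]]:
    """Classify coding line `cur` against reference line `ref` into pass (P),
    vertical (V, offset a1 - b1) and horizontal (H, two run lengths) modes."""
    width = len(cur)
    ref_ch, cur_ch = changing_elements(ref), changing_elements(cur)
    a0, color = -1, 0
    modes: List[Tuple[str, object]] = []
    while a0 < width:
        a1 = next_change(cur_ch, a0, width)
        a2 = next_change(cur_ch, a1, width)
        # b1: first reference-line change right of a0 whose color is opposite to a0's
        b1 = next((c for c in ref_ch if c > a0 and ref[c] != color), width)
        b2 = next_change(ref_ch, b1, width)
        if b2 < a1:                  # pass mode: the reference run ends before a1
            modes.append(("P", None))
            a0 = b2
        elif abs(a1 - b1) <= 3:      # vertical mode: a1 lies within 3 pixels of b1
            modes.append(("V", a1 - b1))
            a0, color = a1, 1 - color
        else:                        # horizontal mode: code runs a0a1 and a1a2
            modes.append(("H", (a1 - max(a0, 0), a2 - a1)))
            a0 = a2
    return modes
```

Identical consecutive lines collapse into a handful of zero-offset vertical modes, which is the line-to-line correlation that compressed-domain operations can exploit.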
Real-time lip tracking and bimodal continuous speech recognition
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.738914
M. T. Chan, You Zhang, Thomas S. Huang
Abstract: We investigate using a bimodal approach to speech recognition by incorporating additional visual features derived from lip movement of the speaker. A reference contour model is used to track the lip outline of the speaker. By using color, constraining the deformation in an affine subspace, and by incorporating an outlier rejection mechanism, our system is robust and runs in real time. To address the model initialization issue, a fast lip localization algorithm is also incorporated. A sample of continuous bimodal speech data based on a confined vocabulary (useful for our application area) was synchronously captured for training and testing. Using the hidden Markov modeling framework, we trained our bimodal context-dependent sub-word-based recognizer in a few different ways. The experiments show that the bimodal recognizer compares favorably to the acoustic-only counterpart. The results also indicate that it is advantageous to include first derivatives of the visual features. Furthermore, the 2-stream modeling scheme appears to be preferable to the 1-stream case for bimodal speech.
Citations: 65
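One plausible reading of "constraining the deformation in an affine subspace" together with outlier rejection is to fit a 2-D affine transform that maps the reference lip contour onto observed contour points, discard points with large residuals, and refit. The sketch below assumes point correspondences are already known and uses a hypothetical median-based rejection threshold `k`; it is not the authors' tracker, which also relies on color and a lip localization step.

```python
import numpy as np


def fit_affine(ref, obs, n_iter=3, k=3.0):
    """Fit obs ~ [ref, 1] @ params (params is 3x2) by least squares,
    re-estimating the inlier set from residuals on every iteration."""
    ref = np.asarray(ref, dtype=float)
    obs = np.asarray(obs, dtype=float)
    H = np.hstack([ref, np.ones((len(ref), 1))])  # homogeneous reference points
    keep = np.ones(len(ref), dtype=bool)
    for _ in range(n_iter):
        params, *_ = np.linalg.lstsq(H[keep], obs[keep], rcond=None)
        resid = np.linalg.norm(H @ params - obs, axis=1)
        # hypothetical rule: reject points whose residual exceeds k x median inlier residual
        keep = resid <= k * np.median(resid[keep]) + 1e-9
    return params, keep
```

Because the inlier set is recomputed from all points on each pass, a point wrongly rejected early can re-enter once the fit improves.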
Noise reduction algorithms employing an intelligent inference engine for multimedia applications
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.738923
A. Czyżewski, R. Królikowski
Abstract: Two approaches to noise reduction are presented: a spectral subtraction system and a perceptual coding algorithm that reduces audible noise. Both systems are controlled by an intelligent inference engine based on fuzzy logic. An extension of perceptual coding applications to the removal of noise originally present in acoustic signals was proposed and verified experimentally. The engineered intelligent noise-reduction systems are presented briefly.
Citations: 2
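Spectral subtraction, the first of the two approaches named above, can be sketched in its textbook magnitude form, assuming a noise-only segment is available for estimating the average noise magnitude spectrum. The over-subtraction factor `alpha` and spectral floor `beta` are hypothetical fixed parameters; the paper controls its systems with a fuzzy-logic inference engine, which this sketch does not attempt to reproduce. For brevity it uses non-overlapping rectangular frames, whereas practical systems usually use windowed, overlapping frames.

```python
import numpy as np


def spectral_subtract(signal, noise_sample, frame=256, alpha=1.0, beta=0.02):
    """Magnitude spectral subtraction with a noisy-phase reconstruction."""
    signal = np.asarray(signal, dtype=float)
    noise_sample = np.asarray(noise_sample, dtype=float)
    n_noise = len(noise_sample) // frame
    if n_noise == 0:
        raise ValueError("noise_sample must contain at least one full frame")
    # average noise magnitude spectrum over the noise-only frames
    noise_frames = noise_sample[: n_noise * frame].reshape(n_noise, frame)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    out = signal.copy()  # samples in a trailing partial frame pass through unchanged
    for start in range(0, len(signal) - frame + 1, frame):
        spec = np.fft.rfft(signal[start:start + frame])
        mag, phase = np.abs(spec), np.angle(spec)
        # subtract the noise estimate, but never go below a fraction of the input magnitude
        clean = np.maximum(mag - alpha * noise_mag, beta * mag)
        out[start:start + frame] = np.fft.irfft(clean * np.exp(1j * phase), n=frame)
    return out
```

With `alpha=0` the function returns the input unchanged, which is a convenient sanity check; raising `alpha` removes more noise at the cost of more "musical noise" artifacts.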
Temporal scalability using P-pictures for low-latency applications
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.739040
S. Wenger
Abstract: Many of the newer video coding/compression standards, including MPEG-4 and H.263+, support some form of temporal scalability as part of their layered codec concept. This is generally realized with bi-directionally predicted pictures (B-pictures), which use both an earlier and a subsequent P-picture as reference (anchor) pictures. This paper introduces a new form of temporal scalability that employs only P-pictures. This mechanism improves real-time behavior (particularly lower latency) and yields a more flexible layering structure, at the cost of less efficient coding. The P-picture-based temporal scalability mechanism is particularly useful for interactive multimedia communication over networks that offer several independent transport streams (possibly with different quality of service) but have sub-optimal real-time characteristics; typical examples are the Internet and some forms of mobile communication. The 1998 version of H.263 (known as H.263+ in both academia and industry) supports P-picture scalability within the bit-stream through its reference picture selection mode. Other video coding standards, including those of the MPEG family, require slight modifications to the decoder to support the proposed mechanism.
Citations: 22
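The core idea is that temporal layering can be built from P-pictures alone, so every picture predicts only from the past and no reordering delay is introduced. The sketch below assumes a simple two-layer pattern controlled by a hypothetical `period` parameter: every `period`-th picture belongs to the base layer and predicts from the previous base-layer picture, while the pictures in between form an enhancement layer that predicts from the most recent base-layer picture. In H.263+ such references would be signalled through the reference picture selection mode; the data structures here are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Picture:
    index: int          # capture/display order, which is also decoding order here
    layer: int          # 0 = base layer, 1 = enhancement layer
    ref: Optional[int]  # index of the reference picture; None for the initial intra picture


def p_picture_layers(n_frames: int, period: int = 2) -> List[Picture]:
    """Assign pictures to two temporal layers using forward prediction only."""
    pictures: List[Picture] = []
    last_base: Optional[int] = None
    for i in range(n_frames):
        if i % period == 0:
            # base layer: predict from the previous base-layer picture
            pictures.append(Picture(i, 0, last_base))
            last_base = i
        else:
            # enhancement layer: predict from the most recent base-layer picture,
            # never from another enhancement picture, so it can be dropped safely
            pictures.append(Picture(i, 1, last_base))
    return pictures
```

Because enhancement-layer pictures are never used as references, a receiver or network can drop that stream and still decode the base layer at a reduced frame rate.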
Building human face models from two images
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.738922
Qian Chen, G. Medioni
Abstract: We present a practical technique for building 3-D human face models from two photographs. Rather than using expensive 3-D scanners, we show that frontal face models can be faithfully reconstructed with unsophisticated digital cameras in a totally non-invasive setup. We propose a rectification algorithm based on the fundamental matrix by computing the dual of the point transformation matrix. The image matching problem is converted into a maximal surface extraction problem, which is then solved efficiently. Finally, the Euclidean approximation is achieved with the help of a novel factorization method for perspective cameras. Two examples are presented.
Citations: 24
Digital watermarking and information embedding using dither modulation
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.738946
Brian Chen, G. Wornell
Abstract: A variety of related applications have emerged that require the design of systems for embedding one signal within another signal. We propose a new class of embedding methods called quantization index modulation (QIM) and develop an example of such a method called dither modulation, in which the embedded information modulates the dither signal of a dithered quantizer. We also develop a framework within which one can analyze performance trade-offs among robustness, distortion, and embedding rate, and we show that QIM systems have considerable performance advantages over previously proposed spread-spectrum and low-bit modulation systems.
Citations: 205
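Dither modulation can be sketched for the simplest case of one bit per host sample with a uniform scalar quantizer of step `step`: the bit selects a dither value, the host sample is quantized with that dither, and the decoder picks the bit whose shifted quantizer lattice lies closest to the received value. The specific dither pair d(0) = -step/4, d(1) = +step/4 is an assumption chosen so the two lattices are offset by step/2; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def quantize(v: float, step: float) -> float:
    """Uniform scalar quantizer: nearest multiple of `step`."""
    return step * math.floor(v / step + 0.5)


def dither(bit: int, step: float) -> float:
    """Assumed dither pair: d(0) = -step/4, d(1) = d(0) + step/2 = +step/4."""
    return -step / 4 if bit == 0 else step / 4


def embed(host: Sequence[float], bits: Sequence[int], step: float) -> List[float]:
    """Dither modulation: s = q(x + d(m)) - d(m), one bit per host sample."""
    return [quantize(x + dither(b, step), step) - dither(b, step)
            for x, b in zip(host, bits)]


def detect(received: Sequence[float], step: float) -> List[int]:
    """Minimum-distance decoder: pick the bit whose shifted lattice is closest."""
    bits = []
    for y in received:
        err = [abs(y - (quantize(y + dither(b, step), step) - dither(b, step)))
               for b in (0, 1)]
        bits.append(0 if err[0] <= err[1] else 1)
    return bits
```

In this construction the embedding distortion is at most step/2 per sample and any perturbation smaller than step/4 decodes correctly, so `step` directly trades robustness against distortion, one of the trade-offs the abstract's framework analyzes.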
Classification of audio events in broadcast news
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.738963
Zhu Liu, Qian Huang
Abstract: This paper describes an approach, based on audio information, to discriminating news reports from other audio events such as commercials and music in broadcast news programs. The reported work is part of an effort at AT&T to hierarchically segment broadcast news programs into semantically meaningful units at different levels of abstraction. At the coarse level, the described approach preprocesses the audio data so that only the news segments are passed as input to a speaker identification system. To keep preprocessing lightweight, we adopted a set of audio features that are simple to compute yet, based on our observation, statistically capture the intrinsic properties of the audio events to be classified. To improve the classifier's performance, fuzzy membership functions associated with the features are introduced. Preliminary experimental results demonstrate the usefulness of the approach.
Citations: 23
FIGMENT: a basis for interactive virtual mannequin services
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.738933
J. N. Anderson, M. Jack
Abstract: Until now, the possibility of a 'virtual mannequin' service for telepresence shopping systems has been unrealistic due to the number and complexity of calculations required for the modelling of physical clothing items. This paper presents an overview of the FIGMENT scheme (Fast Implementation Garment Modelling environmENT), which incorporates a four-point strategy (a simplified physical model, collision volume approximation, progressive meshes and a hybrid rendering algorithm) to reduce the quantity and complexity of the computations involved, bringing simulation times from the realm of hours to minutes and seconds whilst maintaining an acceptable level of accuracy and fidelity in the results.
Citations: 0
Matrix quantization with vector quantization error compensation for robust speech recognition
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.738924
L. Cong, S. Asghar
Abstract: This paper proposes a robust, speaker-independent IWSR system that combines dual fuzzy matrix quantization (FMQ) and fuzzy vector quantization (FVQ) pairs, or a dual MQ/VQ quantization pair, with a discrete HMM to use processing resources efficiently and improve speech recognition performance. The system exploits the "evolution" of the short-term spectral envelopes of speech, with error compensation from the FVQ/HMM or VQ/HMM processes, to target noise-affected input signal parameters and minimize the influence of noise. The enhanced processing employs a weighted LSP distance measure in the LBG algorithm. Computer simulations using gender-dependent HMMs on the TIDIGITS database clearly indicate superiority over conventional FVQ/HMM and FMQ/HMM systems, with 96.48% and 92.8% recognition accuracy at 20 dB and 5 dB SNR, respectively, in a car-noise environment.
Citations: 3
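The weighted LSP distance inside the LBG algorithm can be illustrated with a minimal LBG (splitting) codebook trainer using a per-dimension weighted squared-Euclidean distance. The fixed weight vector `w` is an assumption: weighted LSP distances in the speech literature often use weights that depend on the input vector, in which case the centroid update would have to change. The codebook size must be a power of two because each split doubles it, and the multiplicative split assumes nonzero codewords (LSP vectors are positive).

```python
import numpy as np


def weighted_dist(x, codebook, w):
    """Weighted squared-Euclidean distance between every row of x and every codeword."""
    diff = x[:, None, :] - codebook[None, :, :]
    return (w * diff ** 2).sum(axis=2)


def lbg(data, w, size, eps=0.01, n_iter=20):
    """LBG codebook training by repeated splitting and k-means-style refinement."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    data = np.asarray(data, dtype=float)
    w = np.asarray(w, dtype=float)
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # split every codeword into a slightly perturbed pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            labels = weighted_dist(data, codebook, w).argmin(axis=1)
            for j in range(len(codebook)):
                members = data[labels == j]
                if len(members):
                    # with fixed per-dimension weights the optimal centroid is the plain mean
                    codebook[j] = members.mean(axis=0)
    return codebook
```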
Interactive object-based analysis and manipulation of digital video
1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175), Pub Date: 1998-12-07, DOI: 10.1109/MMSP.1998.738956
P. E. Eren, N. Zhuang, Yue Fu, A. Tekalp
Abstract: With the advent of MPEG-4, object-based natural/synthetic hybrid multimedia content is becoming more ubiquitous. In this paper, we address object-based interactive analysis of natural video for editing/authoring natural/synthetic hybrid content. Boundary and local motion of video objects are described by snake and 2-D mesh representations, respectively. The 2-D mesh modeling in effect performs a mapping of a natural video object into a computer graphics representation, namely geometry with motion and a texture map, thus allowing for easy editing of natural video objects using tools already developed in computer graphics. This paper presents the components of a tool designed for interactive video object analysis and editing using graphical models whose syntax and semantics conform with VRML and MPEG-4 standards. The analysis tool is developed using Java and C++, while rendering and editing are performed within VRML or MPEG-4 browsers and authoring tools. Demonstrations of the video analysis and editing are included.
Citations: 5
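In 2-D mesh-based motion representations, the motion inside each triangular patch is commonly modelled as the affine map fixed by the displacements of its three vertices, which is what lets a patch be rendered as a texture-mapped triangle. The sketch below (NumPy, hypothetical helper names) computes that map for one triangle and applies it to points; it illustrates the general mesh technique, not the authors' tool.

```python
import numpy as np


def triangle_affine(src, dst):
    """Return the 3x2 matrix M with [x, y, 1] @ M mapping each src vertex to its dst vertex."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((3, 1))])  # rows: [x, y, 1] for the three vertices
    return np.linalg.solve(X, dst)          # raises LinAlgError for collinear vertices


def warp_points(M, pts):
    """Apply the affine map to an (n, 2) array of points inside the patch."""
    pts = np.atleast_2d(np.asarray(pts, dtype=float))
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M
```

Since an affine map preserves barycentric coordinates, the centroid of the source triangle lands on the centroid of the moved triangle, and neighbouring patches agree along shared edges.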