{"title":"Eigenfaces and eigenvoices: dimensionality reduction for specialized pattern recognition","authors":"R. Kuhn, Patrick Nguyen, J. Junqua, L. Goldwasser","doi":"10.1109/MMSP.1998.738915","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738915","url":null,"abstract":"There are hidden analogies between two dissimilar research areas: face recognition and speech recognition. The standard representations for faces and voices misleadingly suggest that they have a high number of degrees of freedom. However, human faces have two eyes, a nose, and a mouth in predictable locations; such constraints ensure that possible images of faces occupy a tiny portion of the space of possible 2D images. Similarly, physical and cultural constraints on acoustic realizations of words uttered by a particular speaker imply that the true number of degrees of freedom for speaker-dependent hidden Markov models (HMMs) is quite small. Face recognition researchers have adopted representations that make explicit the underlying low dimensionality of the task, greatly improving the performance of their systems while reducing computational costs. We argue that speech researchers should use similar techniques to represent variation between speakers, and discuss applications to speaker adaptation, speaker identification and speaker verification.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129454255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic frame-skipping in video transcoding","authors":"Jenq-Neng Hwang, Tzong-Der Wu, Chia-Wen Lin","doi":"10.1109/MMSP.1998.739049","DOIUrl":"https://doi.org/10.1109/MMSP.1998.739049","url":null,"abstract":"This paper investigates the dynamic frame skipping strategy in video transcoding. To speed up the operation, a video transcoder usually reuses the decoded motion vectors to reencode the video sequences at a lower bit-rate. When frame skipping is allowed in a transcoder, those motion vectors cannot be reused because the motion vectors of the current frame are no longer estimated from the immediate past frame. To reduce the computational complexity of motion vector re-estimation, a bilinear interpolation approach is developed to overcome this problem. Based on these interpolated motion vectors, the search range can be much reduced. Furthermore, we propose a frame rate control scheme which can dynamically adjust the number of skipped frames according to the accumulated magnitude of the motion vectors. As a result, the decoded sequence can present much smoother motion.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130993750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Order statistics preserving near-lossless image coding","authors":"Xiaolin Wu, Xuehong Li, Tong Qiu","doi":"10.1109/MMSP.1998.738974","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738974","url":null,"abstract":"We introduce a new concept of near-lossless image compression called order statistics preserving (OSP) near-lossless coding. Unlike the ubiquitous L_2 (PSNR) criterion and the common near-lossless criterion of L_∞, OSP is a context-based fidelity measure that can meet the more stringent requirements of high-end users in the medical, space, and scientific communities.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133972829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient wireless image transmission under a total power constraint","authors":"S. Appadwedula, M. Goel, Douglas L. Jones, K. Ramchandran, Naresh R Shanbhag","doi":"10.1109/MMSP.1998.739042","DOIUrl":"https://doi.org/10.1109/MMSP.1998.739042","url":null,"abstract":"Due to high data rates and limited bandwidth as well as limited battery power, wireless multimedia communications systems must be optimized in every possible way. We develop a generic matching scheme for wireless image and video communication in which the three most significant components: the source coder, the channel coder, and hardware power consumption, are jointly optimized. That is, we maximize the end-to-end image quality subject to a total power constraint on both the RF transmission power and the power consumption of the digital implementation of the channel coder, which represents a major portion of the total hardware power in short-range applications.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"431 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132298419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fast method of reconstructing high-resolution panoramic stills from MPEG-compressed video","authors":"Y. Altunbasak, Andrew J. Patti","doi":"10.1109/MMSP.1998.738919","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738919","url":null,"abstract":"Creating high quality still pictures from video presents a challenging problem due to the low spatial resolution of most video signals. Many algorithms have been proposed in the literature that utilize multiple video frames to increase spatial resolution. These algorithms depend on two critical assumptions: first, that the scene does not change significantly in the temporal vicinity of the frame of interest, and second that the motion estimation between video frames is extremely accurate. Noting that panoramic views are not only visually pleasing, but also fit the aforementioned assumptions, we propose the use of a scene change detection algorithm to locate scenes containing mainly pan/tilt types of motion. Since many digital video sequences are compressed using MPEG, it is desirable to perform all computations with minimal decompression. To this end, we also propose methods to locate pans from MPEG-compressed video. Once the pan segments are located, a number of highly accurate motion estimation methods can be successfully applied to the video segment. Given the resulting accurate motion, there exist various methods of attacking the resolution enhancement problem and creating a panoramic still image. These, for the most part, are computationally expensive. Therefore, we propose a fast method of obtaining enhanced resolution panoramas from the lower resolution video signal.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134343434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stereo sequence analysis, compression, and virtual viewpoint synthesis","authors":"Ru-Shang Wang, Yao Wang","doi":"10.1109/MMSP.1998.739001","DOIUrl":"https://doi.org/10.1109/MMSP.1998.739001","url":null,"abstract":"This paper considers the problem of structure and motion estimation in stereoscopic tele-conferencing type sequences and its application for stereo sequence compression and for intermediate view generation. Generally, this type of sequence consists of one or several foreground objects (head and shoulders) and a more or less static background. By extracting the foreground objects and investigating the relationship between right and left images, we can compress the image-pair by making use of the redundancy of the left and right image-pair. In the meantime, by using the estimated structure information, we can generate virtual viewpoints between the left and right image-pair, which can be very helpful for tele-presence applications.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116351794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A watermarking with two signatures","authors":"Ju Han Kim, Won Don Lee, Jin Hyeong Park","doi":"10.1109/MMSP.1998.738968","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738968","url":null,"abstract":"A watermarking scheme is presented which embeds two different watermarks to the same frequency and extracts the marks with two different methods in the frequency domain. One of the two extraction methods needs a source image, whereas the other does not. Each extraction method uses a unique operation in extracting the watermark. The use of two watermark schemes is more effective in claiming rightful ownership. Furthermore, this watermarking is non-invertible and is robust against IBM attack as it obtains two different extraction results from the same image.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124935765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The virtual museum: an integrated text and image database","authors":"S. Santini, R. Jain, M. Corvi","doi":"10.1109/MMSP.1998.738943","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738943","url":null,"abstract":"We describe our \"virtual museum\" project: a union of image and text retrieval technologies that allows users to visit art collections on the Web. The virtual museum is composed of a series of rooms that the user can visit. The connections between the rooms are variable: passing from one room to another yields a result which is the outcome of a query, and depends on the query criterion that the user has selected. This \"variable topology\" adds interest to the museum visit since it allows the visitor, within certain limits, to customize his/her museum experience and to make it different every time.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115822126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial expression recognition using HMM with observation dependent transition matrix","authors":"N. Tsapatsoulis, Miltiades Leonidou, S. Kollias","doi":"10.1109/MMSP.1998.738918","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738918","url":null,"abstract":"An expression recognition technique is proposed based on the hidden Markov model's (HMM) ability to deal with time sequential data and to provide time scale invariability as well as a learning capability. A feature vector sequence is used for this purpose, which relies on optical flow extraction, as well as directional filtering of the motion field. Segmentation and identification of important facial parts precede feature extraction. The HMM is enhanced with an observation dependent transition matrix, being able to cope with the dynamics of emotions and the severe complexity of expression timing. Experimental results are included illustrating the effectiveness of this method.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129514442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypermedia processors: design space exploration","authors":"J. Kin, Chunho Lee, W. Mangione-Smith, M. Potkonjak","doi":"10.1109/MMSP.1998.738954","DOIUrl":"https://doi.org/10.1109/MMSP.1998.738954","url":null,"abstract":"We present a framework for area optimal system design space exploration for hypermedia applications. We focus on a category of processors that are programmable yet optimized to a hypermedia application. The key components of the framework presented in this paper are a retargetable instruction-level parallelism compiler, instruction level simulators, a set of complete media applications written in a high level language and a media processor synthesis algorithm. The framework addresses the need for area optimal system design by exploiting the instruction-level parallelism found in media applications by compilers that target multiple-instruction-issue processors. Using the framework we conduct an extensive exploration of area optimal system design space for a hypermedia application. We found that there is enough ILP in the typical media and communication applications to achieve highly concurrent execution when throughput requirements are high. On the other hand, when throughput requirements are low, there is no need to use multiple-instruction-issue processors.","PeriodicalId":180426,"journal":{"name":"1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129569825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}