{"title":"An Edge-Preserving Super-Precision for Simultaneous Enhancement of Spacial and Grayscale Resolutions","authors":"H. Hasegawa, T. Ohtsuka, I. Yamada, K. Sakaniwa","doi":"10.1093/ietfec/e91-a.2.673","DOIUrl":"https://doi.org/10.1093/ietfec/e91-a.2.673","url":null,"abstract":"In this paper, we propose a method that recovers a smooth high-resolution image from several blurred and roughly quantized low-resolution images. For compensation of the quantization effect we introduce a measurement of smoothness originally used for suppression of block noises in a JPEG compressed image [Schultz & Stevenson '94]. With a simple operator that approximates to the convex projection onto constraint set defined for each quantized image [Hasegawa et al. '05], we propose a method that minimizes these cost functions, which are smooth convex functions, over the intersection of all constraint sets, i.e. the set of all images satisfying all quantization constraints simultaneously, by using hybrid steepest descent method [Yamada & Ogura '04]. Finally in the numerical example we compare images derived by the proposed method, POCS based conventional method, and generalized proposed method minimizing smoothed total variation and energy of output of Laplacian","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126128395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Bottom-Up Image Segmentation Method Based on Region Growing, Region Competition and the Mumford Shah Functional","authors":"Yongsheng Pan, J. Birdwell, S. Djouadi","doi":"10.1109/MMSP.2006.285327","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285327","url":null,"abstract":"Curve evolution implementations of the Mumford-Shah functional are of broad interest in image segmentation. These implementations, however, have initialization problems. A mathematical analysis of the initialization problem for the bi-modal Chan-Vese model is provided in this paper. The initialization problem is a result of the non-convexity of the Mumford-Shah functional and the top-down hierarchy of the model's use of global region information in the image. An efficient image segmentation method is proposed that alleviates the initialization problem, based on region growing, region competition and the Mumford Shah functional. This algorithm is able to automatically and efficiently segment objects in complicated images. Using a bottom-up hierarchy, the method avoids the initialization problem in the Chan-Vese model and works for images with multiple junctions and color images. It can be extended to textured images. Experimental results show that the proposed method is robust to the effects of noise","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128489500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the Virtual Reed Parameter Space Using Haptic Feedback","authors":"T. Smyth, Thomas N. Smyth, A. Kirkpatrick","doi":"10.1109/MMSP.2006.285266","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285266","url":null,"abstract":"A high quality computer synthesis of an acoustic sound source does not necessarily yield a playable virtual musical instrument. A computer simulation of an acoustic musical instrument creates a disconnect between sound production and user input, and correspondingly, between hearing and feeling, in contrast to their interconnection in an acoustic instrument. This disconnect denies the user important haptic clues well known to help instrument control, impeding the user's ability to find, and remain inside, regions of playability. This research explores the addition of haptic feedback to a virtual reed model. In particular, we render the instrument's parameter space as a dynamic force field in order to support fine motor movements and, in turn, provide the user with cues regarding the instrument's oscillatory state and possible regions of playability. We then observe the effects that this additional feedback has on the user's ability to play the virtual instrument","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124692091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Compressed XML Schema Representation for Metadata Processing in Mobile Environments","authors":"Jianjun Fang, A. Martinez-Smith, B. Gandhi","doi":"10.1109/MMSP.2006.285358","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285358","url":null,"abstract":"Efficient packaging and communication of metadata are critical in multimedia communications to achieve seamless mobility. The XML schema compression proposed here comprises a method for decomposing an XML schema into a sequence of atomic elements. This representation reorganizes the given XML schema with the threefold purpose of facilitating dynamic schema switching and reconfiguration of metadata decoders, increasing the efficiency of binary metadata decoding, and storing XML schemas in binary format. As a result, schemas can be efficiently transmitted over congested networks and stored in devices with limited resources. Since each schema may represent the metadata structure for a different type of media format, this allows a system to support different profiles (schemas) and media formats, enabling dynamically reconfigurable metadata encoders and decoders","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"76 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120887057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Music Summarization Scheme using Tempo Tracking and Two Stage Clustering","authors":"Sangho Kim, Sungtak Kim, Suk-bong Kwon, Hoirin Kim","doi":"10.1109/MMSP.2006.285302","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285302","url":null,"abstract":"In this paper, we present effective methods for music summarization which automatically extract a representative portion of the music by signal processing technology. Our proposed method uses 2-dimensional similarity matrix, tempo tracking, and clustering techniques to extract several segments which have different moods or dissimilar semantic structure in the music. The segments extracted are combined to generate a complete music summary. The three main techniques used in this paper are well-known and widely used for extracting music summary. However, we use them in a different way, and experiments show the proposed method captures the main theme of the music more effectively than conventional methods. The experimental results also show that one of the proposed methods could be used for real-time application since the processing time in generating music summary is much faster than other methods","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"52 24","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120925334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shadow Removal via Flash/Noflash Illumination","authors":"Cheng Lu, M. S. Drew, G. Finlayson","doi":"10.1109/MMSP.2006.285296","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285296","url":null,"abstract":"Shadows bedevil multimedia applications, e.g. seeing into shadow regions in surveillance video. Model-based and non-model based, statistical methods, spatial, temporal, and invariant-based methods have been devised for combatting the shadow problem. Here we take a different approach, by attenuating the shadow by utilizing a second image under another illuminant to remove the effect of shadow-edges from an edge map of each frame. As a precursor step, we examine flash/noflash still image pairs. A flash image provides lessened shadows, but other shadows are produced. We can produce a flash-only (no ambient) image by subtracting the two images, but several artifacts remain. Instead, we have used the pure-flash image to detect the ambient shadows [ICME06]. However that method may fail when there are flash-shadows or specularities in the copy region. Here we manipulate the gradient field using a smoothing step including the directionality of edges near the shadow boundary, with improved results","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132544170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault-Tolerant Music Search by New Ranking Order Algorithm","authors":"W. Theimer, Andree Ross","doi":"10.1109/MMSP.2006.285303","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285303","url":null,"abstract":"Music information retrieval is an active area of research with high practical relevance. Humming a melody is one natural way to overcome the input restrictions when searching for music. In this presentation we concentrate on a new ranking order algorithm to match melody input to a music database containing polyphonic music as sequences of notes. The new algorithm achieves high melody recognition rates and shows graceful degradation in the presence of errors such as omission/insertion of notes or wrong tone heights/durations. The recognition rate is maximized by applying evolutionary strategies for parameter optimizations. The parallel implementation on a Linux-based PC cluster and the computational effort are discussed. Quantitative results are presented for melody recognition. We compare the method with related work and conclude with an outlook","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134153889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Data Partition and Rate-Distortion Optimized Mode Selection for H.264 Error-Resilient Coding","authors":"Yuan Zhang, Wen Gao, Debin Zhao","doi":"10.1109/MMSP.2006.285307","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285307","url":null,"abstract":"Data partitioning (DP) is an efficient error-resilient video coding tool. Its contribution to performance improvement in the error-prone environment arises from the superior error concealment mechanisms that are available with the help of protected data partitions. Since error-concealment in terms of DP is closely related to coding mode, it is desirable to have an optimized coding mode selection scheme. However, the existing coding mode selection techniques usually assume that the same error-concealment mechanism is used for a block when it is lost, and the associated distortion also remains the same. Obviously, this assumption is not true when DP involves. In this paper, a generalized end-to-end distortion model is proposed for the rate-distortion optimized coding mode selection, which fully utilizes the superior error-concealment mechanism in terms of DP. The proposed distortion model is also advantageous in the suppression of approximation errors caused by pixel average operations such as sub-pixel interpolation and deblocking filter. Therefore, it can lead to a low-complexity solution for real-time applications such as live streaming","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115240556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Music Genres Classification using Text Categorization Method","authors":"Kai Chen, Sheng Gao, Yongwei Zhu, Qibin Sun","doi":"10.1109/MMSP.2006.285301","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285301","url":null,"abstract":"Automatic music genre classification is one of the most challenging problems in music information retrieval and management of digital music database. In this paper, we propose a new framework using text category methods to classify music genres. This framework is different from current methods for music genre classification. In our framework, we consider music as text-like semantic music document, which is represented by a set of music symbol lexicons with a HMM (hidden Markov models) cluster. Music symbols can be seemed as high-level features or semantic features like beats or rhythms. We use latent semantic indexing (LSI) technique that is widely adopted in text categorization for music genre classification. From the experimental results, we could achieve an average recall over 70% for ten musical genres","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"656 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116485610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting-Based Multimodal Speaker Detection for Distributed Meetings","authors":"Cha Zhang, Pei Yin, Y. Rui, Ross Cutler, Paul A. Viola","doi":"10.1109/MMSP.2006.285274","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285274","url":null,"abstract":"Speaker detection is a very important task in distributed meeting applications. This paper discusses a number of challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and proposes a boosting-based multimodal speaker detection (BMSD) algorithm. Instead of performing sound source localization (SSL) and multi-person detection (MPD) separately and subsequently fusing their individual results, the proposed algorithm uses boosting to select features from a combined pool of both audio and visual features simultaneously. The result is a very accurate speaker detector with extremely high efficiency. The algorithm reduces the error rate of SSL-only approach by 47%, and the SSL and MPD fusion approach by 27%","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127352353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}