{"title":"Learning Indirect Acquisition of Instrumental Gestures using Direct Sensors","authors":"G. Tzanetakis, A. Kapur, A. Tindale","doi":"10.1109/MMSP.2006.285264","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285264","url":null,"abstract":"Sensing instrumental gestures is a common task in interactive electroacoustic music performances. The sensed gestures can then be mapped to sounds, synthesis algorithms, visuals, etc. The two most common approaches for acquiring these gestures are: 1) hybrid instruments, which are \"traditional\" musical instruments enhanced with sensors that directly detect gestures, and 2) indirect acquisition, in which the only measurement is the acoustic signal and signal processing techniques are used to acquire the gestures. Hybrid instruments require modification of existing instruments, which is frequently undesirable. However, they provide relatively straightforward and reliable measuring capability. On the other hand, indirect acquisition approaches typically require sophisticated signal processing and possibly machine learning algorithms in order to extract the relevant information from the audio signals. In this paper the idea of using direct sensors to train a machine learning model for indirect acquisition is explored. This approach has two main advantages: 1) large amounts of training data can be collected with minimal effort, and 2) once the indirect acquisition system is trained, no sensors or modifications to the playing instrument are required. Case studies described in the paper include 1) strike position on a snare drum and 2) strum direction on a sitar","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130099713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TwinFaces: Seamless Textures for Rendering Head Models","authors":"S. Ferradal, J. Gómez","doi":"10.1109/MMSP.2006.285292","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285292","url":null,"abstract":"TwinFaces, a new technique for the automatic generation of facial textures, is presented in this paper. Multiple views of the individual, taken under moderately controlled illumination conditions, are combined to create the seamless texture using a wavelet-based technique. Since the method relies on a set of particular feature points of the MPEG-4 facial and body animation (FBA) standard, it can be easily integrated with applications using the same standard. A post-processing technique for pictures taken outdoors has also been integrated into TwinFaces to compensate for non-uniform illumination conditions. The wavelet-based approach has been shown to be more flexible than previous approaches at removing the artificial seam lines produced by the blending of the different views involved in the texture generation","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125136094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Index Assignment for MDSQ Encoder Over Noisy Channels","authors":"Rui Ma, F. Labeau","doi":"10.1109/MMSP.2006.285315","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285315","url":null,"abstract":"Multiple description coding (MDC) can achieve acceptable performance with only a single correct description, and lower distortion of the source when more descriptions are received correctly. In our previous work, we assumed that, because of the noisy channel, bit errors were introduced into one description while the other was received correctly. Based on this assumption, an enhanced central decoder was proposed to utilize the residual information between the corrupted and the correct description to reduce the distortion of the reconstructed signals. In this paper, a novel index assignment algorithm for the multiple description scalar quantizer (MDSQ) encoder is developed to improve the error detection capability of the central decoder. A genetic algorithm (GA) is used to search for a suboptimal solution. The experimental results show that, within a range of bit error rates (BERs), the proposed algorithm provides lower reconstruction distortion than the conventional MDSQ","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125618218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Fast Error-resilient Video Coding Scheme for H.264","authors":"Jiajun Bu, Linjian Mo, Genfu Shao, Zhi Yang, Chun Chen","doi":"10.1109/MMSP.2006.285308","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285308","url":null,"abstract":"Both perceptually and statistically, compressed video with large or disordered motion is sensitive to errors. In this paper, we propose a novel fast error-resilient video coding scheme based on significant macroblock (MB) determination and protection. The scheme uses three impact factors (inter-block mode, motion vector difference, and SAD value) to build a statistical model. The model takes error concealment (EC) into consideration in advance and generates several parameters for further significant degree (SD) evaluation of MBs. During encoding, we build an SD table for each frame based on these parameters and select the MBs with the largest SD values as significant MBs (SMBs). SMB determination introduces few additional computations, making our scheme practical in real-time video coding scenarios. Simulations show that the scheme achieves acceptable SMB determination accuracy and that the corresponding protection method prevents errors effectively","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116973400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Segmentation and Temporal Structure Inference for Partially-Observed Event Sequences","authors":"H. Thornburg, Dilip Swaminathan, T. Ingalls, R. Leistikow","doi":"10.1109/MMSP.2006.285265","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285265","url":null,"abstract":"Many events of interest in human activity-based multimedia applications exhibit a high degree of temporal structure. This structure generates expectancies regarding the occurrence and location of subsequent events. In the context of switching state-space models, we develop a general Bayesian framework for representing temporal expectancies and fusing them with raw sense-data to improve both event segmentation and temporal structure identification. Furthermore, we develop a new cognitive model for event anticipation which adapts to incoming sense-data in real time. Comparative advantages of the proposed framework are realized in controlled experiments involving partially-observed, quasi-periodic event streams","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122100538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global Geometric Distortion Correction in Images","authors":"M. Awrangjeb, Manzur Murshed, Guojun Lu","doi":"10.1109/MMSP.2006.285346","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285346","url":null,"abstract":"The performance of existing copyright protection schemes is questionable due to their vulnerability to geometric transformations. Though a few of them can resist global geometric transformations like rotation and scaling attacks, most of them are vulnerable to rotation-scale and cropping attacks. This paper presents a novel geometric distortion correction scheme robust to global geometric transformations. It restores an attacked image to its approximate original by reversing the attack using the invariant centroid and geometric moments of the image. Experimental results show the effectiveness of the proposed scheme","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122430528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-Bi-Temporal Error Concealment in Block-Based Video Decoding Systems","authors":"M. Friebe, André Kaup","doi":"10.1109/MMSP.2006.285317","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285317","url":null,"abstract":"In this paper we present a spatio-bi-temporal fading scheme for block loss recovery in block-based video decoding systems. In the first part of the algorithm, based on two different boundary error criteria obtained from bi-temporal error concealment, either the previous frame, the future frame, or fading between both temporal methods is used for bi-temporal macroblock estimation. A weighted absolute difference between motion-compensated image samples and macroblock boundary samples of the current frame represents one boundary error. In the second part of the algorithm, based on a boundary error criterion obtained from bi-temporal concealment, spatial concealment, bi-temporal concealment, or fading between both methods is used to recover a lost macroblock. The advantage of this method is that a lost macroblock can be recovered pel-wise, spatially from the current frame or bi-temporally from the previous and future frames, by weighted averaging of both error concealment results. Simulation results show that this method outperforms the reference methods in both subjective and objective video quality when recovering a lost macroblock","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125713752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Models for Activity Recognition","authors":"A. Subramanya, A. Raj, J. Bilmes, D. Fox","doi":"10.1109/MMSP.2006.285304","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285304","url":null,"abstract":"In this paper we propose a hierarchical dynamic Bayesian network to jointly recognize the activity and environment of a person. The hierarchical nature of the model allows us to implicitly learn data-driven decompositions of complex activities into simpler sub-activities. Our experiments show that the hierarchical model better explains the observed data, leading to better performance. We also show that joint estimation of both the activity and the environment of a person outperforms systems in which they are estimated independently. The proposed model yields about 10% absolute improvement in accuracy over existing systems","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126528124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bandwidth-Efficient Mixed Pseudo Analogue-Digital Speech and Audio Transmission","authors":"Carsten Hoelper, P. Vary","doi":"10.1109/MMSP.2006.285285","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285285","url":null,"abstract":"Today's speech and audio coding and transmission systems are either analogue or digital, with a strong shift from analogue to digital systems over recent decades. In this paper, both digital and analogue schemes are combined to save transmission bandwidth and complexity and to improve the achievable quality at any given signal-to-noise ratio (SNR) on the channel. The combination is achieved by transmitting pseudo-analogue samples of the unquantized residual signal of a linear predictive digital filter. The new system, mixed pseudo analogue-digital (MAD) transmission, is applied to narrowband speech as well as to wideband speech and audio. MAD transmission over a channel modeled by additive white Gaussian noise (AWGN) is compared to the GSM adaptive multi-rate speech codec mode 12.2 kbit/s (enhanced full-rate codec), which uses a comparable transmission bandwidth when channel coding is included, and to PCM transmission in the case of audio signals","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127880655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Redirecting Documents with a Mobile Camera","authors":"Qiong Liu, P. McEvoy, Don Kimber, Patrick Chiu, Hanning Zhou","doi":"10.1109/MMSP.2006.285352","DOIUrl":"https://doi.org/10.1109/MMSP.2006.285352","url":null,"abstract":"This paper presents a method for facilitating document redirection in a physical environment via a mobile camera. With this method, a user is able to move documents among electronic devices, post a paper document to a selected public display, or make a printout of a whiteboard with simple point-and-capture operations. More specifically, the user can move a document from its source to a destination by capturing a source image and a destination image in consecutive order. The system uses SIFT (scale invariant feature transform) features of captured images to identify the devices a user is pointing to, and issues the corresponding commands associated with the identified devices. Unlike RF/IR-based remote controls, this method uses the visual features of an object as an always-available identifier for many tasks, and is therefore easy to deploy. For evaluation, we present experiments on identifying three public displays and a document scanner in a conference room","PeriodicalId":267577,"journal":{"name":"2006 IEEE Workshop on Multimedia Signal Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132353190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}