{"title":"Quality enhancement of procam system by radiometric compensation","authors":"Tai-Hsiang Huang, Chen-Tai Kao, Homer H. Chen","doi":"10.1109/MMSP.2012.6343439","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343439","url":null,"abstract":"A procam system consists of a projector and a camera. In this paper, a radiometric compensation scheme is proposed for improving the projection quality of a procam system that uses a nearby non-white wall as the projection screen. The compensation scheme is capable of correcting the radiometric errors such as chroma and brightness distortions caused by the projection surface and the ambient light. It is also capable of compensating the effects of nonlinear spectral response (including vignetting) of the procam system. These important functions are achieved through a computational framework that optimizes the tradeoff between chroma and brightness distortions of the projected image and accelerates the computation of the penalty in the optimization process. Experimental results are shown to demonstrate the performance of the proposed radiometric compensation scheme.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115393771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistent spatio-temporal filling of disocclusions in the multiview-video-plus-depth format","authors":"Martin Köppel, Xi Wang, D. Doshkov, T. Wiegand, P. Ndjiki-Nya","doi":"10.1109/MMSP.2012.6343410","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343410","url":null,"abstract":"Depth image-based rendering (DIBR) techniques allow for a wide variety of 3-D applications, including synthesizing additional virtual views in a multiview-video-plus-depth (MVD) representation. The MVD format consists of scene texture and depth information for a limited number of original views of the same scene. One of the main obstacles in the DIBR technique lies in the disocclusion problem which results from the fact that a scene can only be observed from a set of original views. This can lead to missing information in the generated virtual views, especially in extrapolation scenarios. Our work describes a novel algorithm that synthesizes such disoccluded textures. The proposed synthesizer enhances the visual experience by taking spatial and temporal video information into account. In order to compensate for global motion in sequences, image registration is incorporated into the framework. Objective and subjective gains are shown compared to three state-of-the-art approaches.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122128159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Affect recognition using EEG signal","authors":"Haiyan Xu, K. Plataniotis","doi":"10.1109/MMSP.2012.6343458","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343458","url":null,"abstract":"Emotion states greatly influence many areas in our daily lives, such as learning, decision making and interaction with others. Therefore, the ability to detect and recognize one's emotional states is essential in intelligent Human Machine Interaction (HMI). The aim of this study was to develop a new system that can sense and communicate emotion changes expressed by the Central Nervous System (CNS) through the use of EEG signals. More specifically, this study was carried out to develop an EEG-based subject-dependent affect recognition system to quantitatively measure and categorize three affect states: positively excited, neutral and negatively excited. In this paper, we discuss implementation issues associated with each key stage of a fully automated affect recognition system: emotion elicitation protocol, feature extraction and classification. EEG recordings from 5 subjects with IAPS images as stimuli from the eNTERFACE06 database were used for simulation purposes. Discriminating features were extracted in both time and frequency domains (statistical, narrow-band, HOC, and wavelet entropy) to better understand the oscillatory nature of the brain waves. Through the use of a k Nearest Neighbor classifier (kNN), we obtained mean correct classification rates of 90.77% on the three emotion classes when K equals 5. This demonstrated the feasibility of brain waves as a means to categorize a user's emotion state. Secondly, we also assessed the suitability of commercially available EEG headsets such as Emotive Epoc for emotion recognition applications. This study was carried out by comparing the sensor location and signal integrity with those of Biosemi Active II. A new set of recognition performance results was presented with a reduced number of channels.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121253717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Features for comparing tune similarity of songs across different languages","authors":"Naveen Kumar, A. Tsiartas, Shrikanth S. Narayanan","doi":"10.1109/MMSP.2012.6343464","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343464","url":null,"abstract":"Finding tunes that are similar across languages and cultures offers new ways to study global musical influences and similarities. From a signal processing point of view, we find that the availability of vocal music tracks provides us a means for computing tune similarity even in the presence of language differences. While the different acoustic characteristics of each language add to the inherent ambiguity in this kind of problem, the guarantee that a vocal track exists can be a boon in disguise. For this purpose we use the Multi Band Autocorrelation Peak (MBAP) features, extracted in multiple bands providing complementary information which helps to improve the accuracy. Results obtained on a classification task suggest that these features can outperform traditional features like Chroma which capture information from the entire spectrum. Alignment cost using the dynamic time warping algorithm was used as a classification metric on a dataset of songs obtained from YouTube.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121358928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sharing the trees among random forests for effective and efficient concept detection","authors":"Tzu-Hsuan Chiu, Guan-Long Wu, Yu-Chuan Su, Winston H. Hsu","doi":"10.1109/MMSP.2012.6343445","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343445","url":null,"abstract":"In this paper, we focus on the random forest based concept detection system, and we intend to improve the efficiency of the system in the testing phase and to save memory and storage usage by reducing the total number of trees (classifiers). However, reducing the tree number often results in poor performance. In this article, we propose a method called tree-sharing to cope with this issue. Unlike the traditional method that treats each concept independently, our work shares the trees among concepts, and keeps the most important ones from the viewpoint of the whole system. Experiments on different concept sets show tree-sharing can greatly reduce the number of total trees while the performance decreases slightly. Even in the worst case, we achieve 80% of original performance with only 5% of trees.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128972625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporation of the ASR output in speaker segmentation and clustering within the task of speaker diarization of broadcast streams","authors":"J. Silovský, J. Zdánský, J. Nouza, P. Cerva, J. Prazak","doi":"10.1109/MMSP.2012.6343426","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343426","url":null,"abstract":"In this paper we study the effect of incorporation of automatic transcriptions in the speaker diarization process. We aim to improve both the diarization accuracy as evaluated by standard objective measures and the quality of the diarization output from the user's perspective. Although the presented approach relies on output of an automatic speech recognizer, it makes no use of lexical information. Instead, we use information about word boundaries and classification of non-speech events occurring in the processed stream. The former information is used as a constraining condition for speaker change-point candidates and the latter makes it possible to disregard various vocal noise sounds that carry no speaker-specific information (considering representation of the signal by cepstral features) and thus harm the speaker's representation. The experimental evaluation of the presented approach was carried out using the COST278 multilingual broadcast news database. We demonstrate that the approach yields improvement in terms of both speaker diarization and segmentation performance measures. Furthermore, we show that the number of change-points detected within words (and not at their boundaries) is significantly reduced.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132141863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ConCor+: Robust and confident video synchronization using consensus-based Cross-correlation","authors":"Anas Al-Nuaimi, Burak Cizmeci, F. Schweiger, Roman Katz, S. Taifour, E. Steinbach, M. Fahrmair","doi":"10.1109/MMSP.2012.6343420","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343420","url":null,"abstract":"Consensus-based Cross-correlation (ConCor) is a recently presented algorithm for robust synchronization of noisy and corrupted signals. ConCor has a number of interdependent parameters that need to be set correctly to guarantee good performance. In this paper we analyse the effects of the individual parameters on ConCor's behaviour and performance. As a second contribution, we show that a parameter sweep with subsequent majority voting can be used to boost ConCor's performance and produce a trustworthy confidence measure. As a final contribution we show how the proposed extension also allows performing multi-modal (joint audio-video) synchronization of casual multi-perspective video recordings enabling superior matching performance.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131735061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low bitrate coding schemes for local image descriptors","authors":"A. Redondi, M. Cesana, M. Tagliasacchi","doi":"10.1109/MMSP.2012.6343427","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343427","url":null,"abstract":"Efficient coding of local image descriptors is of paramount importance when they need to be transmitted to a remote destination on bandwidth constrained networks. This is a case that arises, e.g., in mobile visual search and visual wireless sensor networks. In this work we consider SURF, a popular descriptor suitable for low-complexity devices, and we provide a comparative study of lossy coding schemes operating at low bitrate (e.g., less than 128 bits / descriptor). Our investigation covers schemes that address both intra- and inter-descriptor redundancy, including methods that have not been tested before in this context, e.g., sparse coding, lifting-based coding on trees, and hybrid intra and inter-descriptor coding. The experimental evaluation is carried out on two publicly available datasets, in terms of both rate-distortion and rate-accuracy, for the specific task of object recognition. Our results show that a rate saving of 15-30% can be achieved by exploiting intra-descriptor redundancy. On the other hand, addressing inter-descriptor redundancy does not lead to substantial gains when applied alone, whereas it leads to marginal gains (up to 3%) when used in hybrid schemes jointly with intra-descriptor coding.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131668947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting social values and group identities from social media text data","authors":"David A. Broniatowski","doi":"10.1109/MMSP.2012.6343446","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343446","url":null,"abstract":"This paper presents preliminary results on the extraction of group identities from social media data using topic models and a rich form of sentiment analysis that is designed to correspond to psychologically-validated emotional states. Our approach is based upon the sociological notion that group identity forms the basis for behavioral change [1]. We begin by inferring social values from social media text data by combining information regarding topic content and sentiment. Next, groups are inferred as a latent variable mediating between individual social media authors and social values. A topic model is proposed, extending the Ailment Topic Aspect Model (ATAM) used by Paul and Dredze [2], and applied to a large set of blog data extracted from the Media Cloud [3] daily updates. We also provide a qualitative and quantitative analysis of model outputs.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130578137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blob detection and filtering for character segmentation of license plates","authors":"Youngwoo Yoon, Kyu-Dae Ban, H. Yoon, Jaehong Kim","doi":"10.1109/MMSP.2012.6343467","DOIUrl":"https://doi.org/10.1109/MMSP.2012.6343467","url":null,"abstract":"This paper presents a character segmentation method to address the automatic number plate recognition problem. The method considers pixel intensity, character appearance, and arrangement of characters altogether to segment character regions. The method first discovers candidate blobs of characters by using connected component analysis and appearance-based character detection. A character recognizer is used for removing redundant and noisy blobs. Then, a trained classifier selects character blobs among the candidates by examining the arrangement of the blobs. Experimental results show a segmentation rate of 98.3%, which demonstrates the effectiveness of our method.","PeriodicalId":325274,"journal":{"name":"2012 IEEE 14th International Workshop on Multimedia Signal Processing (MMSP)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121309615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}