{"title":"A weighted approach of missing data technique in cepstra domain based on S-function","authors":"Pei Yi, Yubo Ge","doi":"10.1109/MMSP.2010.5661987","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661987","url":null,"abstract":"The application of Missing Data Technique (MDT) has been shown to improve the performance of speech recognition. To apply MDT to the cepstral domain, this paper presents a weighted approach to compute the reliability of cepstral features based on a sigmoid function and introduces a weighted distance algorithm. It is deduced that the reliability compensates the Gaussian variance in the hidden Markov model (HMM) frame by frame to reduce the mismatch between the clean-trained model and corrupted speech. Experimental evaluation using the Aurora2 database demonstrates a distinct digit error rate reduction. The main advantages of the approach are simple system implementation, low computation cost, and ease of plugging into other robust recognition algorithms.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129984735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving multiple-F0 estimation by onset detection for polyphonic music transcription","authors":"F. Canadas-Quesada, F. J. Rodríguez-Serrano, P. Vera-Candeas, N. Ruiz-Reyes, J. Carabias-Orti","doi":"10.1109/MMSP.2010.5661985","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661985","url":null,"abstract":"In a monaural polyphonic context, music transcription and specifically, multiple-F0 estimation systems have achieved promising results in the last decade. However, most of these systems present intermittent misses of pitch within a note or inaccurate definitions of onsets and offsets due to frame-by-frame analysis. In this paper, we propose a multiple-F0 estimation system which extracts a set of active pitches at each frame (analysis frame) but performs note tracking over temporal intervals defined by an accurate onset detector. Our system shows promising results, in terms of onset and multiple-F0 estimation, when evaluated on real-world and synthesized polyphonic music recordings taken from the MAPS music database.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132606504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object tracking under illumination variations using 2D-cepstrum characteristics of the target","authors":"Fuat Çogun, A. Cetin","doi":"10.1109/MMSP.2010.5662076","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662076","url":null,"abstract":"Most video processing applications require object tracking as it is the base operation for real-time implementations such as surveillance, monitoring and video compression. Therefore, accurate tracking of an object under varying scene conditions is crucial for robustness. It is well known that illumination variations on the observed scene and target are an obstacle against robust object tracking, causing the tracker to lose the target. In this paper, a 2D-cepstrum based approach is proposed to overcome this problem. Cepstral domain features extracted from the target region are introduced into the covariance tracking algorithm and it is experimentally observed that 2D-cepstrum analysis of the target object provides robustness to varying illumination conditions. Another contribution of the paper is the development of the co-difference matrix based object tracking instead of the recently introduced covariance matrix based method.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133184666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"H.264-based multiple description coding using motion compensated temporal interpolation","authors":"C. Greco, Marco Cagnazzo, B. Pesquet-Popescu","doi":"10.1109/MMSP.2010.5662026","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662026","url":null,"abstract":"Multiple description coding is a framework adapted to noisy transmission environments. In this work, we use H.264 to create two descriptions of a video sequence, each of them assuring a minimum quality level. If both of them are received, a suitable algorithm is used to produce an improved quality sequence. The key technique is a temporal image interpolation using motion compensation, inspired by the distributed video coding context. The interpolated image blocks are weighted with the received blocks obtained from the other description. The optimal weights are computed at the encoder and efficiently sent to the decoder as side information. The proposed technique shows a remarkable gain for central decoding with respect to similar methods available in the state of the art.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123224669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human emotion recognition using real 3D visual features from Gabor library","authors":"Tie Yun, L. Guan","doi":"10.1109/MMSP.2010.5662073","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662073","url":null,"abstract":"Emotional state recognition is an important component for efficient human-computer interaction. Most existing works address this problem using 2D features, but they are sensitive to head pose, clutter, and variations in lighting conditions. The general 3D based methods only consider geometric information for feature extraction. In this paper, we present a real 3D visual features based method for human emotion recognition. 3D geometric information plus colour/density information of the facial expressions are extracted by 3D Gabor library to construct visual feature vectors. The filter's scale, orientation, and shape of the library are specified according to the appearance patterns of the 3D facial expressions. An improved kernel canonical correlation analysis (IKCCA) algorithm is proposed for final decision. From training samples, the semantic ratings that describe the different facial expressions are computed by IKCCA to generate a seven dimensional semantic expression vector. It is applied for learning the correlation with different testing samples. According to this correlation, we estimate the associated expression vector and perform expression classification. From experiment results, our proposed method demonstrates impressive performance.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122068802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial intra-prediction based on mixtures of sparse representations","authors":"Angélique Dremeau, Mehmet Türkan, C. Herzet, C. Guillemot, J. Fuchs","doi":"10.1109/MMSP.2010.5662044","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662044","url":null,"abstract":"In this paper, we consider the problem of spatial prediction based on sparse representations. Several algorithms dealing with this problem can be found in the literature. We propose a novel method involving a mixture of sparse representations. We first place this approach into a probabilistic framework and then derive a practical procedure to solve it. Comparisons of the rate-distortion performance show the superiority of the proposed algorithm with regard to other state-of-the-art algorithms.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130742787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Side information refinement for long duration GOPs in DVC","authors":"G. Petrazzuoli, Thomas Maugey, Marco Cagnazzo, B. Pesquet-Popescu","doi":"10.1109/MMSP.2010.5662038","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662038","url":null,"abstract":"Side information generation is a critical step in distributed video coding systems. This is performed by using motion compensated temporal interpolation between two or more key frames (KFs). However, when the temporal distance between key frames increases (i.e. when the GOP size becomes large), the linear interpolation becomes less effective. In a previous work we showed that this problem can be mitigated by using high order interpolation. Now, in the case of long duration GOP, state-of-the-art algorithms propose a hierarchical algorithm for side information generation. By using this procedure, the quality of the central interpolated image in a GOP is consistently worse than images closer to the KFs. In this paper we propose a refinement of the central WZFs by higher order interpolation of the already decoded WZFs that are closer to the WZF to be estimated. So we reduce the fluctuation of side information quality, with a beneficial impact on final rate-distortion characteristics of the system. The experimental results show an improvement on the SI up to 2.71 dB with respect to the state of the art, a global improvement of the PSNR on the decoded frames up to 0.71 dB, and a bit rate reduction up to 15%.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123137701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fitting pinna-related transfer functions to anthropometry for binaural sound rendering","authors":"Simone Spagnol, M. Geronazzo, F. Avanzini","doi":"10.1109/MMSP.2010.5662018","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662018","url":null,"abstract":"This paper addresses the general problem of modeling pinna-related transfer functions (PRTFs) for 3-D sound rendering. Following a structural approach, we aim at constructing a model for PRTF synthesis that allows the evolution of ear resonances and spectral notches to be controlled separately through the design of two distinct filter blocks. Taking such a model as the endpoint, we propose a method based on the McAulay-Quatieri partial tracking algorithm to extract the frequencies of the most important spectral notches. Ray-tracing analysis performed on the resulting tracks reveals a convincing correspondence between extracted frequencies and pinna geometry for a set of subjects.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126326549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint source-channel coding/decoding of 3D-ESCOT bitstreams","authors":"M. Abid, M. Kieffer, B. Pesquet-Popescu","doi":"10.1109/MMSP.2010.5662034","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662034","url":null,"abstract":"Joint source-channel decoding (JSCD) exploits residual redundancy in compressed bitstreams to improve the robustness to transmission errors of multimedia coding schemes. This paper proposes an architecture to introduce some additional side information in compressed streams to help JSCD. This architecture exploits a reference decoder already present or introduced at the encoder side. An application to the robust decoding of 3D-ESCOT encoded bitstreams generated within the Vidwav video coder is presented. The layered bitstream generated by this encoder allows SNR scalability, and moreover, when processed by a JSCD, provides increased robustness to transmission errors compared with a single layered bitstream.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122200392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parametric stereo extension of ITU-T G.722 based on a new downmixing scheme","authors":"Thi Minh Nguyet Hoang, S. Ragot, Balázs Kövesi, P. Scalart","doi":"10.1109/MMSP.2010.5662017","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662017","url":null,"abstract":"In this paper, we present a novel, frequency-domain stereo to mono downmixing, which preserves the energy of spectral components and avoids setting the left or right channel as a phase reference. Based on this downmixing technique, a parametric stereo analysis-synthesis model is described in which subband stereo parameters consist of interchannel level differences and phase differences between the mono signal and one of the stereo channels (left or right). This model is applied to the stereo extension of ITU-T G.722 at 56+8 and 64+16 kbit/s with a frame length of 5 ms. AB test results are provided to assess the quality of the proposed downmixing technique. In addition, the quality of the proposed G.722-based stereo coder is compared against reference coders (G.722.1 at 24 and 32 kbit/s dual mono and G.722 at 64 kbit/s dual mono) for clean speech, noisy speech and music.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120909957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}