{"title":"A review of the acoustic and linguistic properties of children's speech","authors":"A. Potamianos, Shrikanth S. Narayanan","doi":"10.1109/MMSP.2007.4412809","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412809","url":null,"abstract":"In this paper, we review the acoustic and linguistic properties of children's speech for both read and spontaneous speech. First, the effect of developmental changes on the absolute values and variability of acoustic correlates is presented for read speech for children ages 6 and up. Then, verbal child-machine spontaneous interaction is reviewed and results from recent studies are presented. Age trends of acoustic, linguistic and interaction parameters are discussed, such as sentence duration, filled pauses, politeness and frustration markers, and modality usage. Some differences between child-machine and human-human interaction are pointed out. The implications for acoustic modeling, linguistic modeling and spoken dialogue systems design for children are discussed.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127717022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Component Estimation Framework for Information Forensics","authors":"A. Swaminathan, Min Wu, K. J. R. Liu","doi":"10.1109/MMSP.2007.4412900","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412900","url":null,"abstract":"With the rapid growth of imaging technologies and the increasingly widespread use of digital images and videos in a large number of high-security and forensic applications, there is a strong need for techniques to verify the source and integrity of digital data. Component forensics is a new approach to forensic analysis that aims to estimate the algorithms and parameters in each component of a digital device. In this paper, we develop a novel theoretical foundation for understanding the fundamental performance limits of component forensics. We define formal notions of identifiability of components in the information processing chain, and present methods to quantify the accuracy with which the component parameters can be estimated. Building upon the proposed theoretical framework, we devise methods to improve the accuracy of component parameter estimation for a wide range of forensic applications.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126309279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Modeling and Retrieval of Polyphonic Music","authors":"E. Ünal, P. Georgiou, Shrikanth S. Narayanan, E. Chew","doi":"10.1109/MMSP.2007.4412902","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412902","url":null,"abstract":"In this article, we propose a solution to the problem of query by example for polyphonic music audio. We first present a generic mid-level representation for audio queries. Unlike previous efforts in the literature, the proposed representation does not depend on the different spectral characteristics of different musical instruments or on the accurate location of note onsets and offsets. This is achieved by first mapping the short-term frequency spectrum of consecutive audio frames to the musical space (the spiral array) and defining a tonal identity with respect to the center of effect generated by the spectral weights of the musical notes. We then use the resulting one-dimensional text representations of the audio to create n-gram statistical sequence models that track the tonal characteristics and behavior of the pieces. After performing appropriate smoothing, we build a collection of melodic n-gram models for testing. Using perplexity-based scoring, we test the likelihood of a sequence of lexical chords (an audio query) given each model in the database collection. Initial results show that variations of the input piece appear in the top 5 results 81% of the time for whole-melody inputs within a database of 500 polyphonic melodies. We also tested the retrieval engine on short audio clips: using 25-second segments, variations of the input piece are among the top 5 results 75% of the time.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126472052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adaptive synthesis filter bank for image decoding with fractional scalability","authors":"N. Tizon, B. Pesquet-Popescu","doi":"10.1109/MMSP.2007.4412878","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412878","url":null,"abstract":"Transform image coding and more particularly the subclass of block based transformations are widely used to compress images. The JPEG standard for still images and MPEG codec specifications for video are very efficient implementations, but these algorithms perform image reconstruction without taking into account the quantization operations performed on the transform coefficients. In this paper, we propose an adaptive algorithm to tune the inverse transformation matrix as a function of the quantization level in order to minimize the reconstruction error. The developed algorithm provides quality scalability features and also integrates resizing operations into the inverse transformation process leading to a spatial scalability of fractional factors.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122734606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Features Tracking for Gross Head Movement analysis and Expression Recognition","authors":"Dimitris N. Metaxas","doi":"10.1109/MMSP.2007.4412803","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412803","url":null,"abstract":"Summary form only given. The tracking and recognition of facial expressions from a single camera is an important and challenging problem. We present a real-time framework for Action Unit (AU)/expression recognition based on facial feature tracking and Adaboost. Accurate facial feature tracking is challenging due to changes in illumination, skin color variations, possibly large head rotations, partial occlusions and fast head movements. We use models based on Active Shapes to localize facial features on the face in a generic pose. Shapes of facial features undergo non-linear transformations as the head rotates from frontal view to profile view. We learn the non-linear shape manifold as multiple overlapping subspaces, with different subspaces representing different head poses. Face alignment is done by searching over the non-linear shape manifold and aligning the landmark points to the features' boundaries. The recognized features are tracked across multiple frames using the KLT tracker by constraining the shape to lie on the non-linear manifold. Our tracking framework has been successfully used for detecting gross head movements, such as nodding and shaking, and for head pose prediction. Further, we use the tracked features to accurately extract bounded faces in a video sequence and use them for recognizing facial expressions. Our approach is based on coded dynamic features. In order to capture the dynamic characteristics of facial events, we design dynamic Haar-like features to represent the temporal variations of facial events. Inspired by binary pattern coding, we further encode the dynamic Haar-like features into binary pattern features, which are useful for constructing weak classifiers for boosted learning. Finally, Adaboost is used to learn a set of discriminating coded dynamic features for facial action unit and expression recognition. We have achieved a detection rate of approximately 97% for gross head movements like shaking and nodding. The recognition rate for facial expressions averages approximately 95% for the most important action units.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"71 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131451220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft-Decision Color Demosaicking with Direction Vector Selection","authors":"Carman K. M. Yuk, O. Au, Richard Y. M. Li, Sui-Yuk Lam","doi":"10.1109/MMSP.2007.4412913","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412913","url":null,"abstract":"We propose a soft-decision color demosaicking algorithm with direction vector selection which effectively minimizes color artifacts. Since our interpolation uses soft decisions, and decisions are based on direction vectors that consist of the three primary colors along the same direction, it not only maintains directional consistency but also significantly reduces color artifacts by largely avoiding interpolation across edges. Experimental results show that our proposed algorithm outperforms state-of-the-art methods and that the visual quality of the reconstructed images is clearly improved.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132062044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent advances in brain-computer interfaces","authors":"T. Ebrahimi","doi":"10.1109/MMSP.2007.4412807","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412807","url":null,"abstract":"A brain-computer interface (BCI) is a communication system that translates brain activity into commands for a computer or other devices. In other words, a BCI allows users to act on their environment by using only brain activity, without using peripheral nerves and muscles. The major goal of BCI research is to develop systems that allow disabled users to communicate with other persons, to control artificial limbs, or to control their environment. To achieve this goal, many aspects of BCI systems are currently being investigated. Research areas include evaluation of invasive and noninvasive technologies to measure brain activity, evaluation of control signals (i.e. patterns of brain activity that can be used for communication), development of algorithms for translation of brain signals into computer commands, and the development of new BCI applications. In this paper we give an overview of the aspects of BCI research mentioned above and highlight recent developments and open problems.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130992316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Long-term Trajectory Extraction for Moving Vehicles","authors":"Jie Xu, G. Ye, Jian Zhang","doi":"10.1109/MMSP.2007.4412858","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412858","url":null,"abstract":"In recent years, trajectory analysis of moving vehicles in video-based traffic monitoring systems has drawn the attention of many researchers. Trajectory extraction is a fundamental step required prior to trajectory analysis. Much previous work has focused on trajectory extraction via tracking; however, such methods often fail to achieve long-term consistent trajectories. In this paper, we propose a robust approach for extracting long-term trajectories of moving vehicles in traffic monitoring using SIFT descriptors. Experimental results show that the proposed method outperforms tracking-based techniques.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128139294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Digital Watermarking for Wavelet-based Compression","authors":"Syed Ali Raza Jafri, Shahab Baqai","doi":"10.1109/MMSP.2007.4412895","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412895","url":null,"abstract":"A digital watermark is an undetectable mark placed on a host medium. There are various applications for digital watermarks, including authentication, fingerprinting and digital rights enforcement. This implies that the watermark should be tolerant to image processing and lossy compression operations. Most standard watermarking techniques do not survive wavelet-based compression and may also be incompatible with the scalability feature of wavelet-based compression. We present a novel digital watermarking scheme which successfully withstands wavelet-based compression as well as standard watermark attacks. Our technique is designed to work alongside the SNR-scalable transmission feature provided with most wavelet compression suites, so that the watermark can be authenticated at any level of SNR transmission. Experimental results show that our proposed watermarking method performs better than existing techniques when the host data is compressed using wavelet transforms.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124255564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"R-Flow: An Extensible XML Based Multimodal Dialog System Architecture","authors":"Li Li, Quanzhi Li, W. Chou, Feng Liu","doi":"10.1109/MMSP.2007.4412824","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412824","url":null,"abstract":"This paper presents an approach to an extensible multimodal interaction dialogue system, R-Flow, based on a recursive application of the Model-View-Controller (MVC) design pattern to derive system components and interfaces. This approach leads to a clear separation of three self-contained functional layers in a multimodal dialogue system: modality-independent dialog control, synchronization of logical modalities, and physical presentation. These layers are codified and woven together through standards-based XML languages. In particular, the system utilizes the standard State Chart XML (SCXML) for dialog control, SMIL- and EMMA-based XM-Flow for modality synchronization and interpretation, and a generic XML-based binding mechanism to map logical modalities to physical presentations. A prototype system has been implemented for multimodal (e.g. speech, text, and mouse) manipulation of Google Maps. Our experimental results indicate that such a layered, component-based XML MMI system is feasible, and the performance of the system is studied and measured.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"62 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116601639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}