2007 IEEE 9th Workshop on Multimedia Signal Processing: Latest Publications

A review of the acoustic and linguistic properties of children's speech
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412809
A. Potamianos, Shrikanth S. Narayanan
Abstract: In this paper, we review the acoustic and linguistic properties of children's speech for both read and spontaneous speech. First, the effect of developmental changes on the absolute values and variability of acoustic correlates is presented for read speech for children ages 6 and up. Then, verbal child-machine spontaneous interaction is reviewed and results from recent studies are presented. Age trends of acoustic, linguistic and interaction parameters are discussed, such as sentence duration, filled pauses, politeness and frustration markers, and modality usage. Some differences between child-machine and human-human interaction are pointed out. The implications for acoustic modeling, linguistic modeling and spoken dialogue system design for children are discussed.
Citations: 33
A Component Estimation Framework for Information Forensics
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412900
A. Swaminathan, Min Wu, K. J. R. Liu
Abstract: With the rapid growth of imaging technologies and the increasingly widespread use of digital images and videos in high-security and forensic applications, there is a strong need for techniques to verify the source and integrity of digital data. Component forensics is a new approach to forensic analysis that aims to estimate the algorithms and parameters in each component of a digital device. In this paper, we develop a novel theoretical foundation for understanding the fundamental performance limits of component forensics. We define formal notions of identifiability of components in the information processing chain, and present methods to quantify the accuracy with which the component parameters can be estimated. Building upon the proposed theoretical framework, we devise methods to improve the accuracy of component parameter estimation for a wide range of forensic applications.
Citations: 12
Statistical Modeling and Retrieval of Polyphonic Music
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412902
E. Ünal, P. Georgiou, Shrikanth S. Narayanan, E. Chew
Abstract: In this article, we propose a solution to the problem of query by example for polyphonic music audio. We first present a generic mid-level representation for audio queries. Unlike previous efforts in the literature, the proposed representation does not depend on the different spectral characteristics of different musical instruments or on the accurate location of note onsets and offsets. This is achieved by first mapping the short-term frequency spectrum of consecutive audio frames to the musical space (the spiral array) and defining a tonal identity with respect to the center of effect generated by the spectral weights of the musical notes. We then use the resulting one-dimensional text representations of the audio to create n-gram statistical sequence models that track the tonal characteristics and behavior of the pieces. After appropriate smoothing, we build a collection of melodic n-gram models for testing. Using perplexity-based scoring, we test the likelihood of a sequence of lexical chords (an audio query) given each model in the database collection. Initial results show that variations of the input piece appear in the top 5 results 81% of the time for whole-melody inputs within a database of 500 polyphonic melodies. We also tested the retrieval engine on short audio clips: using 25 s segments, variations of the input piece are among the top 5 results 75% of the time.
Citations: 14
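The perplexity-based scoring described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the chord labels, the bigram order, and the add-one smoothing are illustrative assumptions standing in for their melodic n-gram models.

```python
import math
from collections import defaultdict

def train_bigram(sequence, vocab):
    """Build an add-one-smoothed bigram model over chord labels."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(sequence, sequence[1:]):
        counts[prev][cur] += 1
    def prob(prev, cur):
        total = sum(counts[prev].values())
        return (counts[prev][cur] + 1) / (total + len(vocab))  # add-one smoothing
    return prob

def perplexity(query, prob):
    """Perplexity of a chord-label query under a bigram model; lower = better match."""
    log_sum = sum(math.log(prob(p, c)) for p, c in zip(query, query[1:]))
    return math.exp(-log_sum / (len(query) - 1))

# Toy database: each "melody" is a sequence of lexical chord labels.
vocab = ["C", "F", "G", "Am"]
db = {
    "piece_a": ["C", "F", "G", "C", "F", "G", "C"],
    "piece_b": ["Am", "G", "Am", "G", "Am", "G", "Am"],
}
models = {name: train_bigram(seq, vocab) for name, seq in db.items()}

query = ["C", "F", "G", "C"]  # a fragment of piece_a
ranked = sorted(models, key=lambda name: perplexity(query, models[name]))
print(ranked[0])  # piece_a: the query's bigrams are frequent under its model
```

Ranking candidates by ascending perplexity is what makes retrieval possible: the model trained on the matching piece assigns the query's chord transitions higher probability, hence lower perplexity.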
An adaptive synthesis filter bank for image decoding with fractional scalability
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412878
N. Tizon, B. Pesquet-Popescu
Abstract: Transform image coding, and more particularly the subclass of block-based transforms, is widely used to compress images. The JPEG standard for still images and the MPEG codec specifications for video are very efficient implementations, but these algorithms perform image reconstruction without taking into account the quantization applied to the transform coefficients. In this paper, we propose an adaptive algorithm that tunes the inverse transformation matrix as a function of the quantization level in order to minimize the reconstruction error. The algorithm provides quality scalability and also integrates resizing operations into the inverse transformation process, leading to spatial scalability by fractional factors.
Citations: 2
Facial Features Tracking for Gross Head Movement Analysis and Expression Recognition
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412803
Dimitris N. Metaxas
Abstract: Summary form only given. The tracking and recognition of facial expressions from a single camera is an important and challenging problem. We present a real-time framework for Action Unit (AU)/expression recognition based on facial feature tracking and Adaboost. Accurate facial feature tracking is challenging due to changes in illumination, skin color variations, possible large head rotations, partial occlusions and fast head movements. We use models based on Active Shapes to localize facial features on the face in a generic pose. The shapes of facial features undergo non-linear transformations as the head rotates from frontal view to profile view. We learn the non-linear shape manifold as multiple overlapping subspaces, with different subspaces representing different head poses. Face alignment is done by searching over the non-linear shape manifold and aligning the landmark points to the features' boundaries. The recognized features are tracked across multiple frames using the KLT tracker by constraining the shape to lie on the non-linear manifold. Our tracking framework has been successfully used for detecting gross head movements, such as nodding and shaking, and for head pose prediction. Further, we use the tracked features to accurately extract bounded faces in a video sequence and use them for recognizing facial expressions. Our approach is based on coded dynamic features. To capture the dynamic characteristics of facial events, we design dynamic Haar-like features to represent their temporal variations. Inspired by binary pattern coding, we further encode the dynamic Haar-like features into binary pattern features, which are useful for constructing weak classifiers for boosting. Finally, Adaboost is used to learn a set of discriminating coded dynamic features for facial action unit and expression recognition. We have achieved approximately 97% detection rate for gross head movements such as shaking and nodding. The recognition rate for facial expressions averages approximately 95% for the most important action units.
Citations: 1
Soft-Decision Color Demosaicking with Direction Vector Selection
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412913
Carman K. M. Yuk, O. Au, Richard Y. M. Li, Sui-Yuk Lam
Abstract: We propose a soft-decision color demosaicking algorithm with direction vector selection that effectively minimizes color artifacts. Since our interpolation uses soft decisions, and those decisions are based on direction vectors consisting of the three primary colors along the same direction, it not only maintains direction consistency but also significantly reduces color artifacts by largely avoiding interpolation across edges. Experimental results show that our proposed algorithm outperforms state-of-the-art methods, and the visual quality of the reconstructed images is also clearly improved.
Citations: 4
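The core idea of direction-selective demosaicking — interpolate along an edge rather than across it — can be sketched in a few lines. This is a generic edge-directed green-channel interpolation under assumed gradient rules, not the authors' soft-decision algorithm:

```python
def interpolate_green(bayer, y, x):
    """Estimate the green value at a non-green Bayer site (y, x).

    Compare the horizontal and vertical gradients of the known neighbors and
    average along the axis with the SMALLER gradient, so interpolation does
    not cross an edge; fall back to a 4-neighbor average on a tie.
    """
    h_grad = abs(bayer[y][x - 1] - bayer[y][x + 1])
    v_grad = abs(bayer[y - 1][x] - bayer[y + 1][x])
    if h_grad < v_grad:
        return (bayer[y][x - 1] + bayer[y][x + 1]) / 2
    if v_grad < h_grad:
        return (bayer[y - 1][x] + bayer[y + 1][x]) / 2
    return (bayer[y][x - 1] + bayer[y][x + 1]
            + bayer[y - 1][x] + bayer[y + 1][x]) / 4

# Toy patch with a strong horizontal edge: the vertical neighbors agree
# (both 100), so the vertical direction is chosen.
bayer = [[0, 100, 0],
         [40,  0, 60],
         [0, 100, 0]]
print(interpolate_green(bayer, 1, 1))  # -> 100.0
```

The paper's contribution goes further — a soft (weighted) decision over direction vectors spanning all three color planes — but the hard gradient test above is the baseline that such schemes refine.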
Recent advances in brain-computer interfaces
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412807
T. Ebrahimi
Abstract: A brain-computer interface (BCI) is a communication system that translates brain activity into commands for a computer or other devices. In other words, a BCI allows users to act on their environment by using only brain activity, without using peripheral nerves and muscles. The major goal of BCI research is to develop systems that allow disabled users to communicate with other persons, to control artificial limbs, or to control their environment. To achieve this goal, many aspects of BCI systems are currently being investigated. Research areas include evaluation of invasive and noninvasive technologies to measure brain activity, evaluation of control signals (i.e. patterns of brain activity that can be used for communication), development of algorithms for translation of brain signals into computer commands, and the development of new BCI applications. In this paper we give an overview of the aspects of BCI research mentioned above and highlight recent developments and open problems.
Citations: 70
Long-term Trajectory Extraction for Moving Vehicles
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412858
Jie Xu, G. Ye, Jian Zhang
Abstract: In recent years, trajectory analysis of moving vehicles in video-based traffic monitoring systems has drawn the attention of many researchers. Trajectory extraction is a fundamental step required prior to trajectory analysis. Much previous work has focused on trajectory extraction via tracking; however, such methods often fail to achieve long-term consistent trajectories. In this paper, we propose a robust approach for extracting long-term trajectories of moving vehicles in traffic monitoring using SIFT descriptors. Experimental results show that the proposed method outperforms tracking-based techniques.
Citations: 3
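Descriptor-based trajectory extraction of the kind summarized above can be illustrated with a toy sketch: chain each frame's descriptors to their nearest match in the next frame. This is a generic nearest-neighbor linker under an assumed distance threshold, not the authors' SIFT pipeline:

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def link_frames(frames, max_dist=1.0):
    """Greedily chain descriptors across frames into long-term trajectories.

    frames: list of frames; each frame is a list of descriptor vectors.
    Returns a list of trajectories, each a list of (frame_index, descriptor_index).
    """
    trajectories = [[(0, i)] for i in range(len(frames[0]))]
    for t in range(1, len(frames)):
        used = set()  # each descriptor extends at most one trajectory
        for traj in trajectories:
            ft, fi = traj[-1]
            if ft != t - 1:
                continue  # trajectory already lost; do not extend
            prev_desc = frames[t - 1][fi]
            # Nearest unused descriptor in the current frame, within max_dist.
            best, best_d = None, max_dist
            for j, desc in enumerate(frames[t]):
                d = euclidean(prev_desc, desc)
                if j not in used and d < best_d:
                    best, best_d = j, d
            if best is not None:
                traj.append((t, best))
                used.add(best)
    return trajectories

# Toy example: two "vehicles" whose 2-D descriptors drift slowly over 4 frames.
frames = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(0.1, 0.1), (5.1, 5.0)],
    [(0.2, 0.1), (5.2, 5.1)],
    [(0.3, 0.2), (5.3, 5.1)],
]
trajs = link_frames(frames)
print([len(t) for t in trajs])  # -> [4, 4]: both tracks span all frames
```

The advantage over frame-to-frame trackers that the abstract alludes to comes from matching on appearance descriptors, which stay stable across many frames, whereas a lost tracker cannot recover; real SIFT descriptors are 128-dimensional, but the linking logic is the same.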
Robust Digital Watermarking for Wavelet-based Compression
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412895
Syed Ali Raza Jafri, Shahab Baqai
Abstract: A digital watermark is an undetectable mark placed on a host medium. Applications of digital watermarks include authentication, fingerprinting and digital rights enforcement, which implies that the watermark should be tolerant of image processing and lossy compression operations. Most standard watermarking techniques do not survive wavelet-based compression and may also be incompatible with its scalability features. We present a novel digital watermarking scheme that successfully withstands wavelet-based compression as well as standard watermark attacks. Our technique is designed to work alongside the SNR-scalable transmission feature provided with most wavelet compression suites, so that the watermark can be authenticated at any level of SNR transmission. Experimental results show that our proposed watermarking method performs better than existing techniques when the host data is compressed using wavelet transforms.
Citations: 4
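Watermarking in the wavelet domain, as the abstract describes, generally means transforming the host, modifying some coefficients, and inverting the transform. The sketch below uses a one-level Haar transform and quantization index modulation (QIM) for the embedding rule; both choices are illustrative assumptions, not the authors' scheme:

```python
def haar_1d(signal):
    """One level of the Haar wavelet transform: (averages, details)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def inverse_haar_1d(approx, detail):
    out = []
    for s, d in zip(approx, detail):
        out += [s + d, s - d]
    return out

def embed(detail, bits, delta=0.5):
    """Quantization index modulation: snap each detail coefficient onto the
    lattice for bit 0 (multiples of delta) or bit 1 (offset by delta/2)."""
    return [delta * round(c / delta - b / 2) + delta * b / 2
            for c, b in zip(detail, bits)]

def extract(detail, delta=0.5):
    """Pick, per coefficient, the bit whose lattice lies closest."""
    bits = []
    for c in detail:
        d0 = abs(c - delta * round(c / delta))
        d1 = abs(c - (delta * round(c / delta - 0.5) + delta / 2))
        bits.append(0 if d0 <= d1 else 1)
    return bits

host = [10.0, 12.0, 9.0, 7.0, 14.0, 13.0, 8.0, 11.0]
bits = [1, 0, 1, 1]
approx, detail = haar_1d(host)
marked = inverse_haar_1d(approx, embed(detail, bits))  # watermarked signal
_, detail2 = haar_1d(marked)                           # receiver re-transforms
print(extract(detail2))  # -> [1, 0, 1, 1]
```

Embedding in the same domain the codec quantizes is what buys robustness to that codec: coefficients pushed onto a coarse lattice survive moderate re-quantization, which is the intuition behind compression-aware schemes like the one in the paper.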
R-Flow: An Extensible XML Based Multimodal Dialog System Architecture
2007 IEEE 9th Workshop on Multimedia Signal Processing Pub Date : 2007-10-01 DOI: 10.1109/MMSP.2007.4412824
Li Li, Quanzhi Li, W. Chou, Feng Liu
Abstract: This paper presents an approach to an extensible multimodal interaction dialogue system, R-Flow, based on a recursive application of the Model-View-Controller (MVC) design pattern to derive system components and interfaces. This approach leads to a clear separation of three self-contained functional layers in a multimodal dialogue system: modality-independent dialog control, synchronization of logical modalities, and physical presentation. These layers are codified and woven together through standards-based XML languages. In particular, the system uses the standard State Chart XML (SCXML) for dialog control, SMIL- and EMMA-based XM-Flow for modality synchronization and interpretation, and a generic XML-based binding mechanism to map logical modalities to physical presentations. A prototype system has been implemented for multimodal (e.g. speech, text, and mouse) manipulation of Google Maps. Our experimental results indicate that such a layered, component-based XML MMI system is feasible, and its performance is studied and measured.
Citations: 7