{"title":"A review of the acoustic and linguistic properties of children's speech","authors":"A. Potamianos, Shrikanth S. Narayanan","doi":"10.1109/MMSP.2007.4412809","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412809","url":null,"abstract":"In this paper, we review the acoustic and linguistic properties of children's speech for both read and spontaneous speech. First, the effect of developmental changes on the absolute values and variability of acoustic correlates is presented for read speech for children ages 6 and up. Then, verbal child-machine spontaneous interaction is reviewed and results from recent studies are presented. Age trends of acoustic, linguistic and interaction parameters are discussed, such as sentence duration, filled pauses, politeness and frustration markers, and modality usage. Some differences between child-machine and human-human interaction are pointed out. The implications for acoustic modeling, linguistic modeling and spoken dialogue systems design for children are discussed.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127717022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Component Estimation Framework for Information Forensics","authors":"A. Swaminathan, Min Wu, K. J. R. Liu","doi":"10.1109/MMSP.2007.4412900","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412900","url":null,"abstract":"With the rapid growth of imaging technologies and the increasingly widespread use of digital images and videos in a large number of high-security and forensic applications, there is a strong need for techniques to verify the source and integrity of digital data. Component forensics is a new approach to forensic analysis that aims to estimate the algorithms and parameters in each component of a digital device. In this paper, we develop a novel theoretical foundation for understanding the fundamental performance limits of component forensics. We define formal notions of identifiability of components in the information processing chain, and present methods to quantify the accuracy with which the component parameters can be estimated. Building upon the proposed theoretical framework, we devise methods to improve the accuracy of component parameter estimation for a wide range of forensic applications.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126309279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Modeling and Retrieval of Polyphonic Music","authors":"E. Ünal, P. Georgiou, Shrikanth S. Narayanan, E. Chew","doi":"10.1109/MMSP.2007.4412902","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412902","url":null,"abstract":"In this article, we propose a solution to the problem of query by example for polyphonic music audio. We first present a generic mid-level representation for audio queries. Unlike previous efforts in the literature, the proposed representation does not depend on the different spectral characteristics of different musical instruments or on the accurate location of note onsets and offsets. This is achieved by first mapping the short-term frequency spectrum of consecutive audio frames to the musical space (the spiral array) and defining a tonal identity with respect to the center of effect generated by the spectral weights of the musical notes. We then use the resulting one-dimensional text representations of the audio to create n-gram statistical sequence models that track the tonal characteristics and behavior of the pieces. After performing appropriate smoothing, we build a collection of melodic n-gram models for testing. Using perplexity-based scoring, we test the likelihood of a sequence of lexical chords (an audio query) given each model in the database collection. Initial results show that variations of the input piece appear in the top 5 results 81% of the time for whole-melody inputs within a database of 500 polyphonic melodies. We also tested the retrieval engine on short audio clips: using 25-second segments, variations of the input piece are among the top 5 results 75% of the time.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126472052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adaptive synthesis filter bank for image decoding with fractional scalability","authors":"N. Tizon, B. Pesquet-Popescu","doi":"10.1109/MMSP.2007.4412878","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412878","url":null,"abstract":"Transform image coding and more particularly the subclass of block based transformations are widely used to compress images. The JPEG standard for still images and MPEG codec specifications for video are very efficient implementations, but these algorithms perform image reconstruction without taking into account the quantization operations performed on the transform coefficients. In this paper, we propose an adaptive algorithm to tune the inverse transformation matrix as a function of the quantization level in order to minimize the reconstruction error. The developed algorithm provides quality scalability features and also integrates resizing operations into the inverse transformation process leading to a spatial scalability of fractional factors.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122734606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Features Tracking for Gross Head Movement analysis and Expression Recognition","authors":"Dimitris N. Metaxas","doi":"10.1109/MMSP.2007.4412803","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412803","url":null,"abstract":"Summary form only given. The tracking and recognition of facial expressions from a single camera is an important and challenging problem. We present a real-time framework for Action Unit (AU)/expression recognition based on facial feature tracking and Adaboost. Accurate facial feature tracking is challenging due to changes in illumination, skin color variations, possibly large head rotations, partial occlusions and fast head movements. We use models based on Active Shapes to localize facial features on the face in a generic pose. Shapes of facial features undergo non-linear transformations as the head rotates from frontal view to profile view. We learn the non-linear shape manifold as multiple overlapping subspaces, with different subspaces representing different head poses. Face alignment is done by searching over the non-linear shape manifold and aligning the landmark points to the features' boundaries. The recognized features are tracked across multiple frames using the KLT tracker by constraining the shape to lie on the non-linear manifold. Our tracking framework has been successfully used for detecting gross head movements, such as nodding and shaking, and for head pose prediction. Further, we use the tracked features to accurately extract bounded faces in a video sequence and use them for recognizing facial expressions. Our approach is based on coded dynamic features. In order to capture the dynamic characteristics of facial events, we design dynamic Haar-like features to represent the temporal variations of facial events. Inspired by binary pattern coding, we further encode the dynamic Haar-like features into binary pattern features, which are useful for constructing weak classifiers for boosted learning. Finally, Adaboost is used to learn a set of discriminating coded dynamic features for facial action unit and expression recognition. We have achieved a detection rate of approximately 97% for gross head movements like shaking and nodding. The recognition rate for facial expressions averages approximately 95% for the most important action units.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"71 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131451220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft-Decision Color Demosaicking with Direction Vector Selection","authors":"Carman K. M. Yuk, O. Au, Richard Y. M. Li, Sui-Yuk Lam","doi":"10.1109/MMSP.2007.4412913","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412913","url":null,"abstract":"We propose a soft-decision color demosaicking algorithm with direction vector selection which effectively minimizes color artifacts. Since our interpolation uses soft decisions, and decisions are based on direction vectors that consist of the three primary colors along the same direction, it not only maintains directional consistency but also significantly reduces color artifacts by largely avoiding interpolation across edges. Experimental results show that our proposed algorithm outperforms state-of-the-art methods and that the visual quality of the reconstructed images is clearly improved.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132062044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent advances in brain-computer interfaces","authors":"T. Ebrahimi","doi":"10.1109/MMSP.2007.4412807","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412807","url":null,"abstract":"A brain-computer interface (BCI) is a communication system that translates brain activity into commands for a computer or other devices. In other words, a BCI allows users to act on their environment by using only brain activity, without using peripheral nerves and muscles. The major goal of BCI research is to develop systems that allow disabled users to communicate with other persons, to control artificial limbs, or to control their environment. To achieve this goal, many aspects of BCI systems are currently being investigated. Research areas include evaluation of invasive and noninvasive technologies to measure brain activity, evaluation of control signals (i.e. patterns of brain activity that can be used for communication), development of algorithms for translation of brain signals into computer commands, and the development of new BCI applications. In this paper we give an overview of the aspects of BCI research mentioned above and highlight recent developments and open problems.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130992316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Long-term Trajectory Extraction for Moving Vehicles","authors":"Jie Xu, G. Ye, Jian Zhang","doi":"10.1109/MMSP.2007.4412858","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412858","url":null,"abstract":"In recent years, trajectory analysis of moving vehicles in video-based traffic monitoring systems has drawn the attention of many researchers. Trajectory extraction is a fundamental step required prior to trajectory analysis. Much previous work has focused on trajectory extraction via tracking; however, such methods often fail to achieve long-term consistent trajectories. In this paper, we propose a robust approach for extracting long-term trajectories of moving vehicles in traffic monitoring using SIFT descriptors. Experimental results show that the proposed method outperforms tracking-based techniques.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128139294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Digital Watermarking for Wavelet-based Compression","authors":"Syed Ali Raza Jafri, Shahab Baqai","doi":"10.1109/MMSP.2007.4412895","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412895","url":null,"abstract":"A digital watermark is an undetectable mark placed on a host medium. There are various applications for digital watermarks, including authentication, fingerprinting and digital rights enforcement. This implies that the watermark should be tolerant to image processing and lossy compression operations. Most standard watermarking techniques do not survive wavelet-based compression and may also be incompatible with the scalability feature of wavelet-based compression. We present a novel digital watermarking scheme which successfully withstands wavelet-based compression as well as standard watermark attacks. Our technique is designed to work alongside the SNR-scalable transmission feature provided with most wavelet compression suites, so that the watermark can be authenticated at any level of SNR transmission. Experimental results show that our proposed watermarking method performs better than existing techniques when the host data is compressed using wavelet transforms.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124255564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"R-Flow: An Extensible XML Based Multimodal Dialog System Architecture","authors":"Li Li, Quanzhi Li, W. Chou, Feng Liu","doi":"10.1109/MMSP.2007.4412824","DOIUrl":"https://doi.org/10.1109/MMSP.2007.4412824","url":null,"abstract":"This paper presents an approach to an extensible multimodal interaction dialogue system, R-Flow, based on a recursive application of the Model-View-Controller (MVC) design pattern to derive system components and interfaces. This approach leads to a clear separation of three self-contained functional layers in a multimodal dialogue system: modality-independent dialog control, synchronization of logical modalities, and physical presentation. These layers are codified and woven together through standards-based XML languages. In particular, the system utilizes the standard State Chart XML (SCXML) for dialog control, SMIL- and EMMA-based XM-Flow for modality synchronization and interpretation, and a generic XML-based binding mechanism to map logical modalities to physical presentations. A prototype system has been implemented for multimodal (e.g. speech, text, and mouse) manipulation of Google Maps. Our experimental results indicate that such a layered, component-based XML MMI system is feasible, and the performance of the system is studied and measured.","PeriodicalId":225295,"journal":{"name":"2007 IEEE 9th Workshop on Multimedia Signal Processing","volume":"62 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116601639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}