{"title":"Glottal features for speech-based cognitive load classification","authors":"T. Yap, J. Epps, E. Choi, E. Ambikairajah","doi":"10.1109/ICASSP.2010.5494987","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5494987","url":null,"abstract":"Cognitive load measurement is important when designing adaptive interfaces that optimize the performance of users working on high mental load tasks. Recent research on automatic speech-based measurement systems indicates that cognitive load information is more prominent in the frequency region below 1 kHz. This study investigates the effects of cognitive load on glottal parameters (open quotient, normalized amplitude quotient and speed quotient), and proposes a system employing these parameters as features for cognitive load classification. Analysis of the glottal parameter distributions suggests that an increase in cognitive load can be related to a more creaky voice quality. Additionally, three-class classification results show that score-level fusion of systems based on the glottal features and baseline features (MFCCs, pitch, intensity and shifted delta cepstra) improves the baseline accuracy from 79% to 84%.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123263353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Particle filtering based recovery of noisy GARCH processes","authors":"T. Michaeli, I. Cohen","doi":"10.1109/ICASSP.2010.5495789","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495789","url":null,"abstract":"In this paper, we address the problem of enhancement of a noisy GARCH process using a particle filter. We compare our approach experimentally to a previously developed recursive estimation scheme. Simulations indicate that a significant gain in performance is obtained, at the cost of higher sensitivity to errors in the GARCH parameters. The proposed method allows tackling arbitrary driving noise distributions as well as arbitrary fidelity criteria.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123451612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image retargeting using a bandelet-based similarity measure","authors":"A. Maalouf, M. Larabi","doi":"10.1109/ICASSP.2010.5495291","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495291","url":null,"abstract":"Media content retargeting aims to adapt images/videos to displays of large or small sizes. In this work, we propose a bandelet-based image retargeting algorithm for summarizing image data into smaller sizes. First, we define a multi-scale bandelet-based perceptual similarity measure which measures the geometric and perceptual similarities between two images at different bandelet scales. Two images are said to be geometrically similar if they have approximately the same geometric flow and quadtree structure. After determining the geometric similarity, a perceptual similarity measure based on the properties of the human visual system is defined to assess the perceptual difference between the original image and the retargeted one. Then, the problem of image retargeting is considered as a geometric optimization problem based on the bandelet-based geometric and perceptual similarity measures. That is, for an image S we search for a retargeted image T that contains as much as possible of geometric and perceptual information from S and, consequently, preserves visual coherence. The proposed retargeting algorithm outperforms the state-of-the-art methods in terms of the visual quality of the retargeted image.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123550466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A small dodecahedral microphone array for blind source separation","authors":"Motoki Ogasawara, Takanori Nishino, K. Takeda","doi":"10.1109/ICASSP.2010.5496003","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496003","url":null,"abstract":"A sound source separation method based on frequency-domain independent component analysis (FD-ICA) is proposed. This method fully utilizes the dodecahedral microphone array (DHMA), which has several merits: 1) the size of the array is very small and thus easy to handle; 2) the amplitude difference among microphones on the different surfaces is large; and 3) it is less affected by spatial aliasing in the higher frequency region. In the proposed method, in order to solve the permutation problem in FD-ICA through clustering acoustic transfer functions, amplitude and phase differences are optimally combined as a function of frequency. A DHMA of 8 cm in diameter with 60 microphones is used for the experiment, where up to twelve sound sources (speech/musical instruments) are separated using the proposed algorithm. The separation performance of the proposed method attains 24 dB in the signal-to-interference ratio (SIR) improvement score for the case of twelve sources. Since the performance is better by up to 10 dB in comparison to the conventional method, our results confirm the effectiveness of the proposed method.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125279815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Circulant space-time codes for integration with beamforming","authors":"Yiyue Wu, A. Calderbank","doi":"10.1109/ICASSP.2010.5496288","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496288","url":null,"abstract":"This paper provides a framework for designing space-time codes to take advantage of a small number of feedback bits from the receiver. The new codes are based on circulant matrices and simple conditions are derived that guarantee full rate and full diversity. In the absence of feedback, Symbol Error Rate (SER) performance is shown to be similar to that of Diagonal Algebraic Space-Time (DAST) codes, both for Maximum Likelihood (ML) decoding and for suboptimal linear decoding. Decoding complexity of circulant codes is similar to the DAST codes and encoding is slightly less complex. In the presence of a small number of feedback bits from the receiver the circulant construction is shown to permit integration of space-time coding with a fixed set of beams by simply advancing the phase on one of the antennas. This integration is not possible within the DAST framework. Integration of space-time codes with beamforming makes it possible to achieve ML decoding performance with only linear decoding complexity or to improve upon ML performance of the original code.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125432388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new Fractional Fourier Transform based monopulse tracking radar processor","authors":"S. Elgamel, J. Soraghan","doi":"10.1109/ICASSP.2010.5496208","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496208","url":null,"abstract":"Conventional monopulse radar processors are used to track a target that appears in the look direction beam width. The distortion produced when additional targets appear in the look direction beam width can cause severe erroneous outcomes from the monopulse processor. This leads to errors in the target tracking angles that may cause the target tracker to fail. A new signal processing algorithm is presented in this paper that is based on the use of optimal Fractional Fourier Transform (FrFT) filtering to solve this problem. The relative performance of the new filtering method over traditional methods is assessed using standard deviation angle estimation error (STDAE) for a range of simulated environments. The proposed system configurations with the optimum FrFT filters succeed in effectively cancelling additional target signals appearing in the look direction beam width.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126946480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of digital watermarking application technologies for newspapers","authors":"R. Ebisawa, Takaaki Yamada","doi":"10.1109/ICASSP.2010.5495431","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495431","url":null,"abstract":"Newspapers, no different from other copyrighted works, are protected by the copyright laws. However, as the duplication itself is technically executable without any license agreement, in cases where users are ignorant of the copyright laws or have no intent to honor them, copies of newspapers end up being illegally produced. In this paper, we report experiments for applying digital watermarking to newspapers. Digital watermarking is utilized to realize a copyright management system that prevents illegal duplications while offering great convenience. A watermarking application is achieved in which the embedding artifacts are hardly noticeable and the embedded information extraction is stable.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115013598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Independent subspace analysis with prior information for fMRI data","authors":"Sai Ma, Xi-Lin Li, N. Correa, T. Adalı, V. Calhoun","doi":"10.1109/ICASSP.2010.5495320","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495320","url":null,"abstract":"Independent component analysis (ICA) has been successfully applied for the analysis of functional magnetic resonance imaging (fMRI) data. However, independence might be too strong a constraint for certain sources. In this paper, we present an independent subspace analysis (ISA) framework that forms independent subspaces among the estimated sources having dependencies by a hierarchical clustering approach and subsequently separates the dependent sources in the task-related subspace using prior information. We study the incorporation of two types of prior information to transform the sources within the task-related subspace: sparsity and task-related time courses. We demonstrate the effectiveness of our proposed method for source separation of multi-subject fMRI data from a visuomotor task. Our results show that physiologically meaningful dependencies among sources can be identified using our subspace approach and the dependent estimated components can be further separated effectively using a subsequent transformation.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116027927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new approach to cross-layer optimization of multimedia systems","authors":"Nicholas Mastronarde, M. Schaar","doi":"10.1109/ICASSP.2010.5495995","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495995","url":null,"abstract":"In recent years, cross-layer multimedia system design and optimization has garnered significant attention; however, there is no existing rigorous methodology for optimizing two or more system layers (e.g. the application, operating system, and hardware layers) jointly while maintaining a separation among the decision processes of each layer. Moreover, existing work often relies on myopic optimizations, which ignore the impact of decisions made at the current time on the system's future performance. In this paper, we propose a novel systematic framework for jointly optimizing the different system layers to improve the performance of one multimedia application. In particular, we model the system as a layered Markov Decision Process (MDP), which enables each layer to make autonomous and foresighted decisions that optimize the system's long-term performance.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116386824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using duration and pitch for mandarin digit string recognition","authors":"Rui Zhao, Yusuke Kida, X. Yan, P. Ding, Lei He","doi":"10.1109/ICASSP.2010.5495128","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495128","url":null,"abstract":"Mandarin digit string recognition (MDSR) is a challenge because there exist many difficulties in acoustic discrimination for such a small vocabulary speech recognition task. In this paper, we propose to improve MDSR performance by using duration and pitch information. Speech rate dispersion is used to involve duration knowledge and is incorporated in the MDSR system by rescoring the N-best candidates in a two-pass framework. Pitch estimated with a robust pitch extraction method is also adopted to improve the acoustic discrimination among Mandarin digits. The experimental results show both duration and pitch significantly improve the performance, and the combination of them gives further improvement. Moreover, our methods are robust to background noise. In the evaluation, the sentence error rate is reduced by 50.43% on average over different SNR conditions.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122695931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}