Yusuke Hioka, K. Niwa, Sumitaka Sakauchi, K. Furuya, Y. Haneda
{"title":"Estimating direct-to-reverberant energy ratio based on spatial correlation model segregating direct sound and reverberation","authors":"Yusuke Hioka, K. Niwa, Sumitaka Sakauchi, K. Furuya, Y. Haneda","doi":"10.1109/ICASSP.2010.5496103","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496103","url":null,"abstract":"A new approach for estimating the direct-to-reverberant energy ratio (DRR) using a microphone array is proposed. The method is based on a model of a spatial correlation matrix that segregates direct sound and reverberation. It estimates DRR from the power spectra of both components, which are derived from the correlation matrix of the observed signal. In experiments performed in simulated and actual reverberant environments, the proposed method mostly succeeded in estimating DRR accurately. We also present speech enhancement using binary masking as an example of an application of the estimated DRR. By utilization of the DRR as a factor to discriminate the distances of speakers, separation of speech signals whose sources were located in the same direction but at different distances was achieved.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129627755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Eiwen, G. Tauböck, F. Hlawatsch, H. Rauhut, N. Czink
{"title":"Multichannel-compressive estimation of doubly selective channels in MIMO-OFDM systems: Exploiting and enhancing joint sparsity","authors":"Daniel Eiwen, G. Tauböck, F. Hlawatsch, H. Rauhut, N. Czink","doi":"10.1109/ICASSP.2010.5496098","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496098","url":null,"abstract":"We propose a compressive estimator of doubly selective channels within pulse-shaping multicarrier MIMO systems (including MIMO-OFDM as a special case). The use of multichannel compressed sensing exploits the joint sparsity of the MIMO channel for improved performance. We also propose a multichannel basis optimization for enhancing joint sparsity. Simulation results demonstrate significant advantages over channel-by-channel compressive estimation.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128260049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tone injection with aggressive clipping projection for OFDM PAPR reduction","authors":"Cagdas Tuna, Douglas L. Jones","doi":"10.1109/ICASSP.2010.5496028","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496028","url":null,"abstract":"The main drawback of Orthogonal Frequency Division Multiplexing (OFDM) systems is the high peak-to-average power ratio (PAPR), which leads to a significant reduction in performance and power efficiency. Tone injection (TI) is a promising PAPR reduction technique that cyclically extends QAM constellations to allow an alternative encoding with lower PAPR at the transmitter. We present a new efficient complex-baseband algorithm which performs TI by using aggressive clipping to combat large peaks. An approximated-analog PAPR reduction of up to 5.1 dB at a 10^{-5} symbol-clip probability is obtained for 64-channel 16-QAM OFDM. This is a very fast and practical peak-power reduction method for OFDM systems that essentially achieves the same PAPR as single-carrier modulation for large constellations.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128961774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable H.264/AVC deblocking filter architecture using dynamic partial reconfiguration","authors":"Rakan Khraisha, Jooheung Lee","doi":"10.1109/ICASSP.2010.5495525","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495525","url":null,"abstract":"This paper presents a scalable H.264/AVC deblocking filter architecture based on FPGA using dynamic partial reconfiguration. This desirable feature of FPGAs makes it possible for different hardware configurations to be implemented during run-time. Architectural scalability to adapt to different users' requirements intelligently is demonstrated through dynamic self-reconfiguration on the reconfigurable hardware fabric. When exploiting the full capability of the proposed design, filtering operations on up to four different edges can be performed at the same time, resulting in a significant reduction of total processing time. The architecture can easily support the required computing capability for different resolutions and frame rates of video sequences. The implemented architecture has been evaluated using a Xilinx Virtex-4 ML410 FPGA board. The design can operate at a maximum frequency of 103 MHz. The reconfiguration is done through the Internal Configuration Access Port (ICAP) to achieve the maximum performance needed by real-time applications.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129248583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable block cipher design using filter banks over finite fields","authors":"S. Saraireh, M. Benaissa","doi":"10.1109/ICASSP.2010.5495404","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495404","url":null,"abstract":"A scalable block cipher based on a filter bank structure over GF(2^8) is proposed. The filter bank structure is used to introduce the diffusion during the circular convolution process between the filter coefficients (which are generated from the key) and the plaintext. The confusion is achieved by the mixing between the analysis filter bank and a novel addition mod 2^n and XOR scheme. The proposed cipher is scalable in both block and key lengths. The cipher is shown to be secure against differential and linear cryptanalysis and of lesser complexity than the AES. The proposed cipher structure enables security versus complexity versus performance trade-offs to be made, an increasingly important aspect of security in communications systems.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124598979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reduced-rank DOA estimation based on joint iterative subspace recursive optimization and grid search","authors":"Lei Wang, R. D. Lamare, M. Haardt","doi":"10.1109/ICASSP.2010.5496256","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496256","url":null,"abstract":"In this paper, we propose a reduced-rank direction of arrival (DOA) estimation algorithm based on joint and iterative subspace optimization (JISO) with grid search. The reduced-rank scheme includes a rank reduction matrix and an auxiliary reduced-rank parameter vector. They are jointly and iteratively optimized with a recursive least squares (RLS) algorithm to calculate the output power spectrum. The proposed JISO-RLS DOA estimation algorithm provides an efficient way to iteratively estimate the rank reduction matrix and the auxiliary reduced-rank vector. It is suitable for DOA estimation with large arrays and can be extended to arbitrary array geometries. It exhibits an advantage over MUSIC and ESPRIT when many sources exist in the system. A spatial smoothing (SS) technique is employed for dealing with highly correlated sources. Simulation results show that the JISO-RLS has a better performance than existing Capon and subspace-based DOA estimation methods.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124669641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parametric emotional singing voice synthesis","authors":"Younsung Park, Sungrack Yun, C. Yoo","doi":"10.1109/ICASSP.2010.5495137","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495137","url":null,"abstract":"This paper describes an algorithm to control the expressed emotion of a synthesized song. Based on a database of various melodies sung neutrally with a restricted set of words, hidden semi-Markov models (HSMMs) of notes ranging from E3 to G5 are constructed for synthesizing singing voice. Three steps are taken in the synthesis: (1) Pitch and duration are determined according to the notes indicated by the musical score; (2) Features are sampled from appropriate HSMMs with the duration set to the maximum probability; (3) Singing voice is synthesized by the mel-log spectrum approximation (MLSA) filter using the sampled features as parameters of the filter. The emotion of a synthesized song is controlled by varying the duration and the vibrato parameters according to Thayer's mood model. A perception test is performed to evaluate the synthesized song. The results show that the algorithm can control the expressed emotion of a singing voice given a neutral singing voice database.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124698608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kotaro Ogino, T. Jitsuhiro, C. Miyajima, K. Takeda
{"title":"Analyzing grasping for inferring cognitive states of users","authors":"Kotaro Ogino, T. Jitsuhiro, C. Miyajima, K. Takeda","doi":"10.1109/ICASSP.2010.5495795","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495795","url":null,"abstract":"We study the effect of cognitive states, that is, feelings about tasks, on grasping behavior in order to estimate users' feelings from their motion. Since people solve the inverse kinematics problem of grasping based on their cognition of the task, the way they grasp an object reflects their cognitive states. We analyze the way of grasping a cup depending on whether a user is stressed. The physical properties of grasping, namely the volume and entropy of Grasp Jacobian ellipsoids, are analyzed. The volume of Grasp Jacobian ellipsoids, which indicates the possible size of object movement, shrank after learning the grasp motion. Also, the volumes under the relaxed and the stressed cognitive conditions were significantly different. These results show that the user's cognition of tasks is reflected in the grasp forms and the possible size of object movement.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124757642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive score fusion using Weighted Logistic Linear Regression for spoken language recognition","authors":"K. Sim, Kong-Aik Lee","doi":"10.1109/ICASSP.2010.5495069","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495069","url":null,"abstract":"State-of-the-art spoken language recognition systems typically consist of a combination of sub-systems. These sub-systems generate language detection scores for each speech segment, which will be fused (combined) to yield the overall detection scores. Typically, score fusion is achieved using a linear model and Logistic Linear Regression (LLR) is commonly used to estimate the model parameters. This paper proposes an extension to the LLR model, known as the Weighted LLR (WLLR). WLLR is obtained using a weighted combination of multiple LLRs where the weights are obtained as a nonlinear function of the speech segments. Although the resultant score is still linear with respect to the scores of the individual sub-systems, the linear function depends on the speech segment. Hence, the overall score fusion model can be regarded as an adaptive model. Experimental results show that WLLR outperforms LLR by approximately 10% relative for PPRLM system fusion on the NIST 2003 and 2005 language recognition evaluation sets.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129481316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Youssef Souissi, S. Guilley, J. Danger, S. Mekki, Guillaume Duc
{"title":"Improvement of power analysis attacks using Kalman filter","authors":"Youssef Souissi, S. Guilley, J. Danger, S. Mekki, Guillaume Duc","doi":"10.1109/ICASSP.2010.5495428","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495428","url":null,"abstract":"Power analysis attacks are non-intrusive and easily mounted. As a consequence, there is a growing interest in efficient implementation of these attacks against block cipher algorithms such as the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES). In this paper, we propose a new technique based on Kalman theory. We show how this technique could be useful for the cryptographic domain by making power analysis attacks faster. Moreover, we prove that the Kalman filter is more powerful than the High Order Statistics technique.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129488941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}