{"title":"Analysis of singing voice for epoch extraction using Zero Frequency Filtering method","authors":"Sudarsana Reddy Kadiri, B. Yegnanarayana","doi":"10.1109/ICASSP.2015.7178774","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178774","url":null,"abstract":"Epoch is the instant of significant excitation of the vocal tract system during the production of voiced speech. Estimation of epochs or Glottal closure instants (GCIs) is a well studied topic in the speech analysis. From the recent studies on GCI detection from singing voice with state-of-art methods proposed for speech, there exist a clear gap in accuracy between speech and singing voice. This is because of source-filter interaction in singing voice compared to speech. Performance of existing algorithms deteriorates as most of the techniques depends on the ability to model the vocal tract system in order to emphasize the excitation characteristics in the residual. The objective of this paper is to analyze the singing voice for the estimation of epochs by studying the characteristics of the source-filter interaction and the effect of wider range of pitch using the Zero Frequency Filtering (ZFF) method. It is observed that high source-filter interaction can be captured in the form of the impulse-like excitation by passing the signal through three ideal digital resonators having poles at zero frequency, and the effect of wider range of pitch can be controlled by processing short segment (0.4-0.5 sec) signal.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121468375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information extraction from large multi-layer social networks","authors":"Brandon Oselio, Alex Kulesza, A. Hero","doi":"10.1109/ICASSP.2015.7179013","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7179013","url":null,"abstract":"Social networks often encode community structure using multiple distinct types of links between nodes. In this paper we introduce a novel method to extract information from such multi-layer networks, where each type of link forms its own layer. Using the concept of Pareto optimality, community detection in this multi-layer setting is formulated as a multiple criterion optimization problem. We propose an algorithm for finding an approximate Pareto frontier containing a family of solutions. The power of this approach is demonstrated on a Twitter dataset, where the nodes are hashtags and the layers correspond to (1) behavioral edges connecting pairs of hashtags whose temporal profiles are similar and (2) relational edges connecting pairs of hashtags that appear in the same tweets.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121594268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Objective quality prediction for haptic texture signal compression","authors":"R. Chaudhari, Yongjae Yoo, Clemens Schuwerk, Seungmoon Choi, E. Steinbach","doi":"10.1109/ICASSP.2015.7178366","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178366","url":null,"abstract":"Perceptual quality for media compression algorithms is traditionally evaluated through user studies. Such studies are time consuming, laborious and expensive, slowing down the development of new signal processing algorithms. To address this problem, a number of algorithmic quality prediction methodologies have been developed in the audio and video fields, something that is currently lacking in haptics research. In this paper, we present a novel method for predicting the perceptual quality degradation of compressed haptic texture signals. For this purpose, abstract perceptual features like Roughness, Brightness, etc. that capture the subjective experience of textures are exploited, in addition to low-level psychophysical models from the literature. As compared to the state-of-the-art, the presented prediction methodology shows an approximately 30% improvement in explaining the variance in the perceptual data.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114736968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and efficient intra coding techniques for smooth regions in screen content coding based on boundary prediction samples","authors":"Sik-Ho Tsang, Yui-Lam Chan, W. Siu","doi":"10.1109/ICASSP.2015.7178202","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178202","url":null,"abstract":"This paper presents fast and efficient intra prediction algorithms for screen content coding (SCC). The proposed algorithms focus on smooth regions frequently appeared in screen content videos, which have the characteristics of noiselessness. All the samples in a noiseless smooth region exhibit exactly the same pixel value. We then propose two intra coding techniques for noiseless smooth regions in SCC based on the smoothness of the boundary samples which are used for intra prediction. Our proposed algorithm can reduce computational complexity by at most 26.7% while keeping nearly the same video quality. Moreover, by removing the redundant coding bits for intra prediction modes, computational complexity can be further reduced to at most 53.3% in terms of encoding time with bitrate reduction up to 1.2%.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124381255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optically visualized sound field reconstruction based on sparse selection of point sound sources","authors":"K. Yatabe, Yasuhiro Oikawa","doi":"10.1109/ICASSP.2015.7178020","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178020","url":null,"abstract":"Visualization is an effective way to understand the behavior of a sound field. There are several methods for such observation including optical measurement technique which enables a non-destructive acoustical observation by detecting density variation of the medium. For audible sound propagating through the air, however, smallness of the variation requires high sensitivity of the measuring system that causes problematic noise contamination. In this paper, a method for reconstructing two-dimensional audible sound fields from noisy optical observation is proposed.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124388839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate lattices for encrypted image processing","authors":"A. Pedrouzo-Ulloa, J. Troncoso-Pastoriza, F. Pérez-González","doi":"10.1109/ICASSP.2015.7178262","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178262","url":null,"abstract":"Images are inherently sensitive signals that require privacy-preserving solutions when processed in an untrusted environment, but their efficient encrypted processing is particularly challenging due to their structure and size. This work introduces a new cryptographic hard problem called m-RLWE (multivariate Ring Learning with Errors) extending RLWE. It gives support to lattice cryptosystems that allow for encrypted processing of multidimensional signals. We show an example cryptosystem and prove that it outperforms its RLWE counterpart in terms of security against basis-reduction attacks, efficiency and cipher expansion for encrypted image processing.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"509 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127603827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple target track-before-detect in compound Gaussian clutter","authors":"S. P. Ebenezer, A. Papandreou-Suppappola","doi":"10.1109/ICASSP.2015.7178429","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178429","url":null,"abstract":"In this paper, we extend the multiple transition mode track-before-detect (TBD) algorithm to track multiple low observable targets in compound Gaussian sea clutter. The proposed TBD framework uses the un-thresholded fast time radar measurements to track multiple targets in low signal-to-clutter ratios (SCRs). The TBD is implemented using particle filtering (PF), and we derive the generalized likelihood ratio needed to update the particle weights. The maximum likelihood estimate of the texture and the covariance matrix of the speckle are also derived and implemented using a fixed point algorithm. The tracking performance of the proposed algorithm is investigated using three low observable targets that enter and leave the field of view (FOV) at different time steps and under varying environmental conditions.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127717346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-task deep neural network acoustic models with model adaptation using discriminative speaker identity for whisper recognition","authors":"Jingjie Li, I. Mcloughlin, Cong Liu, Shaofei Xue, Si Wei","doi":"10.1109/ICASSP.2015.7178916","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178916","url":null,"abstract":"This paper presents a study on large vocabulary continuous whisper automatic recognition (wLVCSR). wLVCSR provides the ability to use ASR equipment in public places without concern for disturbing others or leaking private information. However the task of wLVCSR is much more challenging than normal LVCSR due to the absence of pitch which not only causes the signal to noise ratio (SNR) of whispers to be much lower than normal speech but also leads to flatness and formant shifts in whisper spectra. Furthermore, the amount of whisper data available for training is much less than for normal speech. In this paper, multi-task deep neural network (DNN) acoustic models are deployed to solve these problems. Moreover, model adaptation is performed on the multi-task DNN to normalize speaker and environmental variability in whispers based on discriminative speaker identity information. On a Mandarin whisper dictation task, with 55 hours of whisper data, the proposed SI multi-task DNN model can achieve 56.7% character error rate (CER) improvement over a baseline Gaussian Mixture Model (GMM), discriminatively trained only using the whisper data. Besides, the CER of the proposed model for normal speech can reach 15.2%, which is close to the performance of a state-of-the-art DNN trained with one thousand hours of speech data. From this baseline, the model-adapted DNN gains a further 10.9% CER reduction over the generic model.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127737163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Doa estimation by covariance matrix sparse reconstruction of coprime array","authors":"Chengwei Zhou, Zhiguo Shi, Yujie Gu, N. Goodman","doi":"10.1109/ICASSP.2015.7178395","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178395","url":null,"abstract":"In this paper, we propose a direction-of-arrival estimation method by covariance matrix sparse reconstruction of coprime array. Specifically, source locations are estimated by solving a newly formulated convex optimization problem, where the difference between the spatially smoothed covariance matrix and the sparsely reconstructed one is minimized. Then, a sliding window scheme is designed for source enumeration. Finally, the power of each source is re-estimated as a least squares problem. Compared with existing methods, the proposed method achieves more accurate source localization and power estimation performance with full utilization of increased degrees of freedom provided by coprime array.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127754622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assistive listening headsets for high noise environments: Protection and communication","authors":"S. Nordholm, A. Davis, Pei Chee Yong, H. H. Dam","doi":"10.1109/ICASSP.2015.7179074","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7179074","url":null,"abstract":"In industrial noise environments, the use of assistive listening headsets is a means to provide adequate access to voice communication while wearing hearing protection. This paper presents a performance evaluation and comparison of two different methods to provide the binaural speech enhancement in real industrial noise scenarios. The investigated binaural methods based on differential beamforming and multichannel Wiener filter show different strengths and weaknesses. A transient noise suppression algorithm is also proposed and evaluated. Performance evaluation shows that this algorithm, together with the binaural multi-channel Wiener filter approach, can successfully reduce the hammering noise. This can be observed from the PESQ scores and the signal characteristics.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126277381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}