{"title":"Multi-task learning in deep neural networks for improved phoneme recognition","authors":"M. Seltzer, J. Droppo","doi":"10.1109/ICASSP.2013.6639012","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639012","url":null,"abstract":"In this paper we demonstrate how to improve the performance of deep neural network (DNN) acoustic models using multi-task learning. In multi-task learning, the network is trained to perform both the primary classification task and one or more secondary tasks using a shared representation. The additional model parameters associated with the secondary tasks represent a very small increase in the number of trained parameters, and can be discarded at runtime. In this paper, we explore three natural choices for the secondary task: the phone label, the phone context, and the state context. We demonstrate that, even on a strong baseline, multi-task learning can provide a significant decrease in error rate. Using phone context, the phonetic error rate (PER) on TIMIT is reduced from 21.63% to 20.25% on the core test set, and surpassing the best performance in the literature for a DNN that uses a standard feed-forward network architecture.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124679777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale malware classification using random projections and neural networks","authors":"George E. Dahl, J. W. Stokes, L. Deng, Dong Yu","doi":"10.1109/ICASSP.2013.6638293","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638293","url":null,"abstract":"Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome this problem, we used random projections to further reduce the dimensionality of the original input space. Using this architecture, we train several very large-scale neural network systems with over 2.6 million labeled samples thereby achieving classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124707348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved estimation of EEG evoked potentials by jitter compensation and enhancing spatial filters","authors":"A. Souloumiac, B. Rivet","doi":"10.1109/ICASSP.2013.6637845","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6637845","url":null,"abstract":"We propose in this paper a new technique to investigate the Event-Related Potentials, or Evoked-Response Potentials, in the electroencephalographic signal. The multidimensional electroencephalographic signal is first spatially filtered to enhance the Evoked-Response Potentials using the xDAWN algorithm and, second, the single trial latencies (whatever their origins: physiological or electronical) are estimated by maximizing a cross correlation without any a priori model. The performance of this approach is illustrated on two classical P300-Speller electroencephalographic databases (BCI Competition II and III). The single-trial distribution of P300 Evoked-Response Potential is deblurred using the proposed resynchronization algorithm for applications in particular to Brain Computer Interfaces.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124716163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cooperative spectrum sharing with joint receiver decoding","authors":"Songze Li, U. Mitra, A. Pandharipande","doi":"10.1109/ICASSP.2013.6638674","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638674","url":null,"abstract":"We consider a spectrum sharing protocol wherein the primary and secondary transmitters cooperatively relay each other's message. Transmission is done in two phases, with each transmitter attempting to decode messages from the other system transmission in a first phase. The second phase transmission consists of the decoded message superposed onto its own message. Priority is given to the primary system transmissions by having the primary message always transmitted over the two phases, while the secondary message is transmitted depending on successful decoding. We consider the scenario where the primary and secondary receivers are co-located, forming a virtual two-antenna receiver. We assess the performance of the system in terms of outage probability and characterize performance corresponding to each state of the Markov chain that governs the proposed transmission protocol. We show that joint decoding offers a 20 dB performance improvement over separate decoding for the primary user and 1.8 dB for the secondary user.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"178 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124744432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Backwards-compatible error propagation recovery for the amr codec over erasure channels","authors":"A. Gómez, J. L. Pérez-Córdoba, B. Geiser","doi":"10.1109/ICASSP.2013.6639258","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639258","url":null,"abstract":"This paper presents a recovery scheme for the error-propagation distortion which frequently appears after a frame erasure in CELP-based speech coders, in particular the AMR codec. The extensive use of predictive filters and parameter encoding allow a high-quality speech synthesis in these codecs, but makes them more vulnerable to frame erasures. Thus, when a frame is lost, an additional distortion appears in the subsequent frame, although that was correctly received, further degrading the speech quality. This degradation can also propagate over several frames, being even more damaging than the loss itself. This well known fact has motivated the development of techniques which prevent or mitigate the error propagation. Nevertheless, the previously proposed methods in some respect modify the transmission scheme (by including additional frames, FEC codes, etc.) making them incompatible with the original decoder. In this work, we apply a steganographic technique to embed recovery data to assist the decoder after a frame loss. This data mainly consist of resynchronization pulses and correction vectors for the excitation signal and the spectral envelope, respectively. PESQ results confirm that our proposal achieves a higher robustness against error propagation while the full backwards-compatibility with the AMR standard is retained.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124785327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent advances in deep learning for speech research at Microsoft","authors":"L. Deng, Jinyu Li, J. Huang, K. Yao, Dong Yu, F. Seide, M. Seltzer, G. Zweig, Xiaodong He, J. Williams, Y. Gong, A. Acero","doi":"10.1109/ICASSP.2013.6639345","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639345","url":null,"abstract":"Deep learning is becoming a mainstream technology for speech recognition at industrial scale. In this paper, we provide an overview of the work by Microsoft speech researchers since 2009 in this area, focusing on more recent advances which shed light to the basic capabilities and limitations of the current deep learning technology. We organize this overview along the feature-domain and model-domain dimensions according to the conventional approach to analyzing speech systems. Selected experimental results, including speech recognition and related applications such as spoken dialogue and language modeling, are presented to demonstrate and analyze the strengths and weaknesses of the techniques described in the paper. Potential improvement of these techniques and future research directions are discussed.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124793502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“Wow!” Bayesian surprise for salient acoustic event detection","authors":"Boris Schauerte, R. Stiefelhagen","doi":"10.1109/ICASSP.2013.6638898","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638898","url":null,"abstract":"We extend our previous work and present how Bayesian surprise can be applied to detect salient acoustic events. Therefore, we use the Gamma distribution to model each frequency's spectrogram distribution. Then, we use the Kullback-Leibler divergence of the posterior and prior distribution to calculate how “unexpected” and thus surprising newly observed audio samples are. This way, we are able to efficiently detect arbitrary, unexpected and thus surprising acoustic events. Complementing our qualitative system evaluations for (humanoid) robots, we demonstrate the effectiveness and practical applicability of the approach on the CLEAR 2007 acoustic event detection data.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124973361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint source localization and sensor position refinement for sensor networks","authors":"Ming Sun, Zhenhua Ma, K. C. Ho","doi":"10.1109/ICASSP.2013.6638415","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638415","url":null,"abstract":"Modern localization systems/platforms such as sensor networks often experience uncertainty in the sensor positions. Improving the sensor positions is necessary in order to achieve better localization performance. This paper proposes a joint estimator for locating multiple unknown sources and refining the sensor positions using TOA measurements. Rather than resorting to the traditional iterative nonlinear least-squares approach that requires careful initializations, the proposed estimator is algebraic and computationally attractive. The small noise analysis shows that the proposed estimator is able to attain the CRLB performance for both the unknown sources and the sensor positions. Simulations support the efficiency of the proposed estimator.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129439871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive fusion in distributed detection: Architecture and performance analysis","authors":"E. Akofor, Biao Chen","doi":"10.1109/ICASSP.2013.6638463","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638463","url":null,"abstract":"Within the Neyman-Pearson framework we investigate the effect of feedback in two-sensor tandem fusion networks with conditionally independent observations. While there is noticeable improvement in performance of the fixed sample size Neyman-Pearson (NP) test, it is shown that feedback has no effect on the asymptotic performance characterized by the Kullback-Leibler (KL) distance. The result can be extended to an interactive fusion system where the fusion center and the sensor may undergo multiple steps of interactions.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129466619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal counterforensics for histogram-based forensics","authors":"Pedro Comesaña Alfaro, F. Pérez-González","doi":"10.1109/ICASSP.2013.6638218","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638218","url":null,"abstract":"There has been a recent interest in counterforensics as an adversarial approach to forensic detectors. Most of the existing counterforensics strategies, although successful, are based on heuristic criteria, and their optimality is not proven. In this paper the optimal modification strategy of a content in order to fool a histogram-based forensics detector is derived. The proposed attack relies on the assumption of a convex cost function; special attention is paid to the Euclidean norm, obtaining the optimal attack in the MSE sense. In order to prove the usefulness of the proposed strategy, we employ it to successfully attack a well-known algorithm for detecting double JPEG compression.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"336 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129486588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}