Exploiting Convolutional Neural Networks for Phonotactic Based Dialect Identification
M. Najafian, Sameer Khurana, Suwon Shon, Ahmed Ali, James R. Glass
ICASSP 2018, pp. 5174–5178. doi:10.1109/ICASSP.2018.8461486
Abstract: In this paper, we investigate different approaches to Dialect Identification (DID) in Arabic broadcast speech. Dialects differ in their inventories of phonological segments. This paper proposes a new phonotactic feature representation that enables discrimination among different occurrences of the same phone n-gram with different phone-duration and probability statistics. To achieve a further gain in accuracy, we used multilingual phone recognizers trained separately on Arabic, English, Czech, Hungarian, and Russian. We use Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. The final system fusion yields 24.7% and 19.0% relative error-rate reductions compared to a conventional phonotactic DID system and to i-vectors with bottleneck features, respectively.

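As a minimal illustration of the phonotactic idea, the sketch below builds phone n-gram count vectors from (hypothetical) phone-recognizer output and scores dialects by cosine similarity to per-dialect centroid profiles. The paper's actual backends are SVMs and CNNs; the phone labels and dialect names here are invented:

```python
import math
from collections import Counter

def phone_ngrams(phones, n=2):
    """Count phone n-grams in a decoded phone sequence."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

def classify(phones, centroids, n=2):
    """Pick the dialect whose n-gram profile is most similar."""
    feats = phone_ngrams(phones, n)
    return max(centroids, key=lambda d: cosine(feats, centroids[d]))

# Toy phone decodings standing in for recognizer output (hypothetical labels).
centroids = {
    "EGY": phone_ngrams("b a b a b a".split()),
    "GLF": phone_ngrams("k u k u k u".split()),
}
print(classify("b a b a".split(), centroids))   # -> EGY
```

A real system would replace the cosine scorer with a trained SVM or CNN over these count features.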
A STEM REU Site on the Integrated Design of Sensor Devices and Signal Processing Algorithms
A. Spanias, J. Christen
ICASSP 2018, pp. 6991–6995. doi:10.1109/ICASSP.2018.8462483
Abstract: Arizona State University (ASU) established an NSF Research Experiences for Undergraduates (REU) site to embed students in research projects related to integrated sensor and signal processing systems. The program includes both sensor hardware and algorithm/software design for a variety of applications, including health monitoring. The site was funded in February 2017, and the co-PIs recruited nine students from different universities and community colleges to spend the summer of 2017 in research laboratories at ASU. The program included structured training with modules in sensor design, signal processing, and machine learning. Cross-cutting training included research ethics, IEEE manuscript development, and presentation skills. Nine undergraduate research projects were launched, and the program was assessed by an independent evaluator. This paper describes the REU activities, modules, training, projects, and their assessment.

An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech
Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, J. Hershey
ICASSP 2018, pp. 4919–4923. doi:10.1109/ICASSP.2018.8462180
Abstract: End-to-end automatic speech recognition (ASR) can significantly reduce the burden of developing ASR systems for new languages by eliminating the need for linguistic information such as pronunciation dictionaries. This also creates an opportunity to build a monolithic multilingual ASR system with a language-independent neural network architecture. In our previous work, we proposed a monolithic neural network architecture that can recognize multiple languages and showed its effectiveness compared with conventional language-dependent models. However, that model is not guaranteed to properly handle switches in language within an utterance, and thus lacks the flexibility to recognize mixed-language speech such as code-switching. In this paper, we extend the model to enable dynamic tracking of the language within an utterance, and propose a training procedure that takes advantage of a newly created mixed-language speech corpus. Experimental results show that the extended model outperforms both language-dependent models and our previous model, without the performance degradation that could be associated with language switching.

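One ingredient such a system needs is training targets that mark language switches. The sketch below inserts a language token into the transcript whenever the language changes; the segment format and token style are assumptions for illustration, not the paper's exact scheme:

```python
def add_language_tokens(segments):
    """Insert a language token whenever the language changes, so an
    end-to-end model can learn to emit language labels at switch points.
    `segments` is a list of (language, text) pairs; labels are hypothetical."""
    out, prev = [], None
    for lang, text in segments:
        if lang != prev:
            out.append(f"[{lang}]")
            prev = lang
        out.append(text)
    return " ".join(out)

target = add_language_tokens(
    [("en", "hello"), ("ja", "konnichiwa"), ("ja", "sayonara"), ("en", "bye")])
print(target)   # -> [en] hello [ja] konnichiwa sayonara [en] bye
```

Training on such targets lets the decoder track language dynamically while still producing an ordinary transcript.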
Parametric Approximation of Piano Sound Based on Kautz Model with Sparse Linear Prediction
Kenji Kobayashi, Daiki Takeuchi, Mio Iwamoto, K. Yatabe, Yasuhiro Oikawa
ICASSP 2018, pp. 626–630. doi:10.1109/ICASSP.2018.8461547
Abstract: The piano is one of the most popular and attractive musical instruments and has attracted a great deal of research. To synthesize piano sound on a computer, many modeling methods have been proposed, from full physical models to approximate models. The focus of this paper is on the latter: approximating piano sound with an IIR filter. For stable parameter estimation, the Kautz model is chosen as the filter structure. The selection of the poles and of the excitation signal then arise as questions typical of the Kautz model that must be solved. In this paper, a sparsity-based construction of the Kautz model is proposed for approximating piano sound.

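The sparse-selection step can be illustrated with plain sparse linear prediction solved by ISTA (iterative soft-thresholding). This is a generic stand-in to show the l1-regularized estimation machinery, not the paper's exact Kautz-model construction:

```python
import numpy as np

def soft(v, t):
    # soft-thresholding: proximal operator of the l1 norm
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_lp(x, order=8, lam=0.05, iters=500):
    """Sparse linear prediction: min ||X a - y||^2 + lam*||a||_1,
    solved with ISTA. Row n of X holds x[n-1], ..., x[n-order]."""
    X = np.column_stack([x[order - 1 - k: len(x) - 1 - k] for k in range(order)])
    y = x[order:]
    t = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # step <= 1/Lipschitz constant
    a = np.zeros(order)
    for _ in range(iters):
        a = soft(a - 2.0 * t * (X.T @ (X @ a - y)), t * lam)
    return a, X, y

n = np.arange(200)
x = 0.995 ** n * np.sin(0.3 * n)     # toy damped partial (an AR(2) signal)
a, X, y = sparse_lp(x)
```

The l1 penalty drives most prediction coefficients to zero, concentrating the model on a few effective taps.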
Variational Deep Learning for Low-Dose Computed Tomography
Erich Kobler, Matthew Muckley, Baiyu Chen, F. Knoll, K. Hammernik, T. Pock, D. Sodickson, R. Otazo
ICASSP 2018, pp. 6687–6691. doi:10.1109/ICASSP.2018.8462312
Abstract: In this work, we propose a learning-based variational network (VN) approach for reconstruction of low-dose 3D computed tomography data. We focus on two methods to decrease the radiation dose: (1) x-ray tube current reduction, which reduces the signal-to-noise ratio, and (2) x-ray beam interruption, which undersamples the data and produces images with aliasing artifacts. While the learned VN denoises the current-reduced images in the first case, it reconstructs the undersampled data in the second case. Different VNs for denoising and reconstruction are trained on a single clinical 3D abdominal data set. The VNs are compared against state-of-the-art model-based denoising and sparse reconstruction techniques on a different clinical abdominal 3D data set with 4-fold dose reduction. Our results suggest that the proposed VNs enable higher radiation dose reductions and/or increase the image quality for a given dose.

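A variational network unrolls a fixed number of gradient steps in which the regularizer is learned from data. The toy 1-D sketch below keeps the unrolled structure but substitutes a fixed smoothness prior for the learned filters, and assumes the denoising case where the forward operator is the identity:

```python
import numpy as np

def laplacian(x):
    # discrete second derivative with replicated end conditions
    return np.concatenate(([x[1] - x[0]],
                           x[2:] - 2 * x[1:-1] + x[:-2],
                           [x[-2] - x[-1]]))

def vn_denoise(y, steps=20, alpha=0.25, beta=1.0):
    """Unrolled gradient scheme x <- x - alpha*((x - y) - beta*Lap(x)).
    In a real variational network, the regulariser (here a fixed smoothness
    prior) and the step sizes are learned per iteration from training data."""
    x = y.copy()
    for _ in range(steps):
        x = x - alpha * ((x - y) - beta * laplacian(x))
    return x

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * np.arange(200) / 50)
noisy = clean + 0.3 * rng.standard_normal(200)
denoised = vn_denoise(noisy)
```

The step size is chosen small enough that the explicit diffusion update stays stable; learned VNs instead fit these parameters end-to-end.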
Envelope Estimation by Tangentially Constrained Spline
Tsubasa Kusano, K. Yatabe, Yasuhiro Oikawa
ICASSP 2018, pp. 4374–4378. doi:10.1109/ICASSP.2018.8462203
Abstract: Estimating the envelope of a signal has various applications, including empirical mode decomposition (EMD), in which cubic $C^{2}$-spline envelope estimation is generally used. While such a functional approach can easily control the smoothness of the estimated envelope, the so-called undershoot problem, which violates the basic requirement of an envelope, often occurs. In this paper, a tangentially constrained spline with tangential-point optimization is proposed to avoid the undershoot problem while maintaining smoothness. It is defined as a quartic $C^{2}$-spline function constrained by first derivatives at tangential points, which effectively avoids undershoot. The tangential-point optimization method is proposed in combination with this spline to attain optimal smoothness of the estimated envelope.

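For contrast with the paper's constrained quartic spline, here is the crudest envelope baseline: interpolation through the signal's local maxima. A linear interpolant is used purely for illustration; a smooth $C^{2}$ spline through the same points can undershoot, which is exactly the failure mode the tangential constraint is designed to prevent:

```python
import numpy as np

def upper_envelope(x):
    """Upper envelope via interpolation through local maxima (endpoints kept).
    Linear interpolation is a crude baseline; spline variants are smoother
    but may dip below the signal (the undershoot problem)."""
    idx = [0] + [i for i in range(1, len(x) - 1)
                 if x[i] >= x[i - 1] and x[i] >= x[i + 1]] + [len(x) - 1]
    idx = np.array(idx)
    return np.interp(np.arange(len(x)), idx, x[idx]), idx

n = np.arange(400)
sig = np.exp(-n / 200) * np.sin(0.25 * n)   # toy amplitude-modulated signal
env, idx = upper_envelope(sig)
```

By construction the estimate passes exactly through every local maximum; the paper's contribution is making the interpolant smooth while keeping it tangent to (never below) the signal.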
Automatic Music Transcription Leveraging Generalized Cepstral Features and Deep Learning
Yu-Te Wu, Berlin Chen, Li Su
ICASSP 2018, pp. 401–405. doi:10.1109/ICASSP.2018.8462079
Abstract: Spectral features are limited in modeling musical signals with multiple concurrent pitches, owing to the difficulty of suppressing the interference of the harmonic peaks of one pitch with another. In this paper, we show that deep learning over multiple features represented in both the frequency and time domains can reduce such interference. These features are derived systematically from conventional pitch detection functions that relate to one another through the discrete Fourier transform and a nonlinear scaling function. Neural networks trained on these features outperform state-of-the-art methods while using less training data.

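One common way to generate such a feature family (assumed here for illustration; not necessarily the paper's exact definition) is to apply a generalized logarithm to the power spectrum before an inverse DFT, yielding a generalized cepstrum parameterized by the scaling exponent:

```python
import numpy as np

def gen_log(s, gamma):
    # generalized logarithm: (s**gamma - 1)/gamma, -> log(s) as gamma -> 0
    return np.log(s) if gamma == 0 else (s ** gamma - 1.0) / gamma

def gen_cepstrum(frame, gamma=0.5, eps=1e-8):
    """Generalized cepstrum of one frame: inverse DFT of a nonlinearly
    scaled power spectrum; gamma = 0 recovers the ordinary real cepstrum."""
    spec = np.abs(np.fft.rfft(frame)) ** 2 + eps   # eps guards log(0)
    return np.fft.irfft(gen_log(spec, gamma), n=len(frame))

t = np.arange(256)
frame = np.sin(2 * np.pi * 5 * t / 256) + 0.5 * np.sin(2 * np.pi * 12 * t / 256)
cep = gen_cepstrum(frame)
```

Sweeping gamma interpolates between spectrum-like (gamma = 1) and cepstrum-like (gamma → 0) representations, which is the sense in which the features relate through the DFT and a nonlinear scaling.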
Joint Probabilistic Forecasts of Temperature and Solar Irradiance
Raksha Ramakrishna, A. Bernstein, E. Dall'Anese, A. Scaglione
ICASSP 2018, pp. 3819–3823. doi:10.1109/ICASSP.2018.8462496
Abstract: In this paper, a mathematical relationship between temperature and solar irradiance is established in order to reduce the sample space and provide joint probabilistic forecasts. These forecasts can then be used for stochastic optimization in power systems. A Volterra-system type of model is derived to characterize the dependence of temperature on solar irradiance. A dataset from a NOAA weather station in California is used to validate the fit of the model. Using the model, probabilistic forecasts of both temperature and irradiance are provided, and the performance of the forecasting technique highlights the efficacy of the proposed approach. The results indicate that the underlying correlation between temperature and irradiance is well captured, and the model is therefore useful for producing future scenarios of temperature and irradiance while approximating the underlying sample space appropriately.

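A truncated second-order (Volterra-type) model of temperature as a function of lagged irradiance can be fit by ordinary least squares. The sketch below uses synthetic data generated from a known second-order relation, so the fit recovers it exactly; the memory length and coefficients are illustrative assumptions:

```python
import numpy as np

def volterra_design(s, memory=2):
    """Design matrix of a truncated second-order Volterra model of input s:
    a constant, the lagged linear terms, and all pairwise lag products."""
    lags = np.column_stack([s[memory - 1 - k: len(s) - k] for k in range(memory)])
    cols = [np.ones(len(lags))] + [lags[:, k] for k in range(memory)]
    for k in range(memory):
        for l in range(k, memory):
            cols.append(lags[:, k] * lags[:, l])
    return np.column_stack(cols)

rng = np.random.default_rng(1)
irr = rng.uniform(0.2, 1.0, 300)   # toy normalized irradiance series
# synthetic temperature generated by a known 2nd-order Volterra relation
temp = 15 + 5 * irr[1:] + 2 * irr[:-1] + 3 * irr[1:] * irr[:-1]

H = volterra_design(irr, memory=2)
coef, *_ = np.linalg.lstsq(H, temp, rcond=None)
```

With the model identified, residuals around the fit give the conditional spread needed for joint probabilistic scenarios of temperature given irradiance.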
Adaptive Bayesian Channel Gain Cartography
Donghoon Lee, Dimitris Berberidis, G. Giannakis
ICASSP 2018, pp. 3554–3558. doi:10.1109/ICASSP.2018.8461412
Abstract: Channel gain cartography relies on sensor measurements to construct maps providing the attenuation profile between arbitrary transmitter-receiver locations. Existing approaches capitalize on tomographic models, where shadowing is the weighted integral of a spatial loss field (SLF) that depends on the propagation environment. Currently, the SLF is learned via regularization methods tailored to the propagation environment. However, the effectiveness of existing approaches remains unclear, especially when the propagation environment involves heterogeneous characteristics. To cope with this, the present work considers a piecewise-homogeneous SLF with a hidden Markov random field (MRF) model under the Bayesian framework. Efficient field estimators are obtained using samples from Markov chain Monte Carlo (MCMC). Furthermore, an uncertainty sampling algorithm is developed to adaptively collect measurements. Real-data tests demonstrate the capabilities of the novel approach.

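The tomographic forward model can be sketched as a weighted sum of SLF grid values inside an ellipse around the Tx-Rx link. The normalized-ellipse weights below are the classic choice from the radio tomography literature and are an assumption here, not necessarily the paper's exact weight function:

```python
import numpy as np

def shadowing(slf, tx, rx, lam=0.5):
    """Tomographic shadowing: weighted sum of the spatial loss field (SLF)
    over grid points inside an ellipse around the Tx-Rx link
    (normalized ellipse model: weight 1/sqrt(d) inside, 0 outside)."""
    ys, xs = np.indices(slf.shape)
    pts = np.stack([xs, ys], axis=-1).astype(float)   # grid point (x, y) coords
    tx, rx = np.asarray(tx, float), np.asarray(rx, float)
    d = float(np.linalg.norm(rx - tx))                # link length
    d1 = np.linalg.norm(pts - tx, axis=-1)
    d2 = np.linalg.norm(pts - rx, axis=-1)
    w = np.where(d1 + d2 < d + lam / 2.0, 1.0 / np.sqrt(max(d, 1e-9)), 0.0)
    return float(np.sum(w * slf))

slf = np.zeros((20, 20))
slf[8:12, 8:12] = 1.0    # a lossy obstruction in the middle of the map
blocked = shadowing(slf, (0.0, 10.0), (19.0, 10.0))  # link through the obstruction
clear = shadowing(slf, (0.0, 0.0), (19.0, 0.0))      # link along a clear edge
```

Inverting this linear map from many measured links, under an MRF prior on the SLF, is the estimation problem the paper addresses.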
Modal Decomposition of Musical Instrument Sound Via Alternating Direction Method of Multipliers
Yoshiki Masuyama, Tsubasa Kusano, K. Yatabe, Yasuhiro Oikawa
ICASSP 2018, pp. 631–635. doi:10.1109/ICASSP.2018.8462350
Abstract: For a musical instrument sound containing partials, or modes, the behavior of the modes around the attack time is particularly important. However, accurately decomposing the sound around the attack time is not easy, especially when the onset is sharp. This is because the spectra of the modes are peaky, whereas a sharp onset requires a broad spectrum. In this paper, an optimization-based method of modal decomposition is proposed to achieve accurate decomposition around the attack time. The proposed method is formulated as a constrained optimization problem that enforces the perfect reconstruction property, which is important for accurate decomposition. For optimization, the alternating direction method of multipliers (ADMM) is utilized, with every variable update computed in closed form. The proposed method achieves accurate modal decomposition on both simulated and real piano sounds.

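The algorithmic structure the paper relies on — ADMM with every update in closed form — can be shown on a toy l1 problem. This mirrors only the splitting-plus-proximal-step mechanics, not the paper's modal-decomposition objective or its perfect-reconstruction constraint:

```python
import numpy as np

def soft(v, t):
    # proximal operator of t*||.||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1(a, tau=0.5, rho=1.0, iters=200):
    """ADMM for min_x 0.5*||x - a||^2 + tau*||x||_1, split as x = z.
    Both primal updates are in closed form, as in the paper's scheme."""
    x = np.zeros_like(a)
    z = np.zeros_like(a)
    u = np.zeros_like(a)     # scaled dual variable
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)   # quadratic term: closed form
        z = soft(x + u, tau / rho)              # l1 term: soft-thresholding
        u = u + x - z                           # dual ascent
    return z

a = np.array([2.0, -0.3, 1.0, -1.5, 0.1])
print(admm_l1(a))   # converges to soft(a, tau) = [1.5, 0, 0.5, -1.0, 0]
```

Splitting lets each term of the objective keep its own cheap proximal update; the same pattern carries over when the quadratic term encodes mode smoothness and the constraint enforces that the modes sum to the observed signal.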