{"title":"Learning from the Best: A Teacher-student Multilingual Framework for Low-resource Languages","authors":"Deblin Bagchi, William Hartmann","doi":"10.1109/ICASSP.2019.8683491","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683491","url":null,"abstract":"The traditional method of pretraining neural acoustic models in low-resource languages consists of initializing the acoustic model parameters with a large, annotated multilingual corpus and can be a drain on time and resources. In an attempt to reuse TDNN-LSTMs already pre-trained using multilingual training, we have applied Teacher-Student (TS) learning as a method of pretraining to transfer knowledge from a multilingual TDNN-LSTM to a TDNN. The pretraining time is reduced by an order of magnitude with the use of language-specific data during the teacher-student training. Additionally, the TS architecture allows us to leverage untranscribed data, previously untouched during supervised training. The best student TDNN achieves a WER within 1% of the teacher TDNN-LSTM performance and shows consistent improvement in recognition over TDNNs trained using the traditional pipeline over all the evaluation languages. Switching to TDNN from TDNN-LSTM also allows sub-real time decoding.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"6051-6055"},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81681920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Radar Privacy in Shared Spectrum Scenarios","authors":"Anastasios Dimas, Matthew A. Clark, Bo Li, K. Psounis, A. Petropulu","doi":"10.1109/ICASSP.2019.8682745","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682745","url":null,"abstract":"To satisfy the increasing demand for additional bandwidth from the wireless sector, regulatory bodies are considering to allow commercial wireless systems to operate on spectrum bands that until recently were reserved exclusively for military radar. Such co-existence would require mechanisms for controlling interference. One such mechanism is to assign a precoder to the communication system, which is designed to minimize the communication system’s interference to the radar. This paper looks into whether the implicit radar information contained in such a precoder can be exploited by an adversary to infer the radar’s location. For two specific precoder schemes, we simulate a machine learning based location inference attack. We show that the system information leaked through the precoder can indeed pose various degrees of risk to the radar’s privacy, and further confirm this by computing the mutual information between the respective precoder and the radar location.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"108 1","pages":"7790-7794"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74658721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Propagation Models over Irregular Terrain","authors":"Mónica Ribero, R. Heath, H. Vikalo, D. Chizhik, R. Valenzuela","doi":"10.1109/ICASSP.2019.8682491","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682491","url":null,"abstract":"Accurate path gain models are critical for coverage prediction and radio frequency (RF) planning in wireless communications. In many settings irregular terrain induces blockages and scattering making it difficult to predict the path gain. Current solutions are either computationally expensive or slope-intercept fits that do not capture local deviations due to terrain variation, leading to large prediction errors. We propose to use machine learning to learn path gain based on terrain elevation as features. We implement different neural network architectures with dense and convolutional layers that could include effects difficult to describe with traditional models (e.g. back scatter). We test our framework on an extensive set of measured path gain data and consistently predict with 5 dB Root Mean Squared Error, an 8 dB improvement over traditional slope-intercept solutions.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"94 1","pages":"4519-4523"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72784805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvements to the Matching Projection Decoding Method for Ambisonic System with Irregular Loudspeaker Layouts","authors":"Zhongshu Ge, Xihong Wu, T. Qu","doi":"10.1109/ICASSP.2019.8683105","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683105","url":null,"abstract":"The Ambisonic technique has been widely used for sound field recording and reproduction recently. However, the basic Ambisonic decoding method will break down when the playback loudspeakers distribute unevenly. Various methods have been proposed to solve this problem. This paper introduces several improvements to a recently proposed Ambisonic decoding method, the matching projection method, for uneven loudspeaker layouts. The first improvement is energy preserving; the second is introducing the \"in-phase\" weight, and the third is introducing partial projection coefficients. To evaluate the improved method, we compared it with the original one and the all-round Ambisonic decoding method with a 2-dimension unevenly arranged loudspeaker array. The result shows our method greatly improves the original method where the loudspeaker arranges very sparsely or densely.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"20 1","pages":"121-125"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74784199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-mean Convolutional Network with Data Augmentation for Sound Level Invariant Singing Voice Separation","authors":"Kin Wah Edward Lin, Masataka Goto","doi":"10.1109/ICASSP.2019.8682958","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682958","url":null,"abstract":"We address an issue of separating singing voices from polyphonic music signals regardless of sound level variance of the mixture input. Using a standard separation quality assessment tool BSS Eval 4.0, we found that the separation quality of a singing voice separation (SVS) system based on a dilatable Convolutional Neural Network (CNN) decreases under different sound levels. Even if this SVS system is comparable to state-of-the-art SVS systems, it is vulnerable to the issue of sound level variance. We therefore investigate four methods of making the CNN-based SVS system invariant to different sound levels — two types of data augmentation, frame normalization, and zero-mean convolution. By testing all 15 combinations of the four methods, we found that all combinations can improve the sound level invariance and analyzed the best combinations. To the best of our knowledge, this is the first SVS work systematically investigating sound level variance.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"251-255"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79169324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor-based Estimation of mmWave MIMO Channels with Carrier Frequency Offset","authors":"Lucas N. Ribeiro, A. Almeida, Nitin Jonathan Myers, R. Heath","doi":"10.1109/ICASSP.2019.8683496","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683496","url":null,"abstract":"Millimeter wave multiple-input-multiple-output (MIMO) achieves the best performance when reliable channel state information is used to design the beams. Most channel estimation methods proposed in the literature, however, ignore practical hardware impairments such as carrier frequency offset (CFO) and may fail under such impairment. In this paper, we present a joint CFO and channel estimation method based on tensor modeling and compressed sensing. Simulation results indicate that the proposed method yields better channel recovery performance than the benchmark and that it is more robust to a small number of channel measurements.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"4155-4159"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81519370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Auralization of Omnidirectional Room Impulse Responses Based on the Spatial Decomposition Method and Synthetic Spatial Data","authors":"J. Ahrens","doi":"10.1109/ICASSP.2019.8683661","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683661","url":null,"abstract":"The spatial decomposition method decomposes acoustic room impulse responses into a pressure signal and a direction of arrival for each time instant of the pressure signal. An acoustic space can be auralized by distributing the pressure signal over the available loudspeakers or head-related transfer functions so that the required instantaneous propagation direction is recreated. We present a user study that demonstrates based on binaural auralization that the arrival directions can be synthesized from random data such that the auralization is nearly indistinguishable from the auralization of the original data. The presented concept constitutes the fundament of a highly scalable spatialization method for omnidirectional room impulse responses.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"146-150"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85608365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Factors Affecting Enf Based Time-of-recording Estimation for Video","authors":"Saffet Vatansever, A. Dirik, N. Memon","doi":"10.1109/ICASSP.2019.8682419","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682419","url":null,"abstract":"ENF (Electric Network Frequency) oscillates around a nominal value (50/60 Hz) due to imbalance between consumed and generated power. The intensity of a light source powered by mains electricity varies depending on the ENF fluctuations. These fluctuations can be extracted from videos recorded in the presence of mains-powered source illumination. This work investigates how the quality of the ENF signal estimated from video is affected by different light source illumination, compression ratios, and by social media encoding. Also explored is the effect of the length of the ENF ground-truth database on time of recording detection and verification.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"117 1","pages":"2497-2501"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77040920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Time-frequency Based Multivariate Phase-amplitude Coupling Measure","authors":"T. T. Munia, Selin Aviyente","doi":"10.1109/ICASSP.2019.8682966","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682966","url":null,"abstract":"Interaction of neuronal oscillations across different frequency bands plays an important role in perception, attention, and memory. One particular form of interaction is the modulation of the amplitude of high-frequency oscillations by the phase of low-frequency oscillations, known as phase-amplitude coupling (PAC). Current methods for quantifying PAC mostly rely on Hilbert transform which assumes that brain activity is stationary and narrowband. Moreover, these methods are limited to quantifying bivariate PAC and cannot capture multivariate cross-frequency coupling between different brain regions. This paper presents a new complex time-frequency based high resolution PAC measure and its extension to the multivariate case using PARAFAC (Parallel Factor) model. The proposed approach is evaluated on both simulated and real electroencephalogram (EEG) data.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"48 1","pages":"1095-1099"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80857820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Linear Spatial Filtering and Non-linear Parametric Processing for High-quality Spatial Sound Capturing","authors":"O. Thiergart, G. Milano, Emanuël Habets","doi":"10.1109/ICASSP.2019.8683515","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683515","url":null,"abstract":"Flexible spatial sound capturing and reproduction can be achieved with multiple microphones by using linear spatial filtering or non-linear parametric processing. The non-linear approaches usually provide a superior spatial resolution compared to the linear approaches but can result in artifacts due to violations of the sound field model. In this paper, we combine both approaches to achieve a high robustness against model violations and a high spatial resolution. We assume linear spatial filters that approximate the spatial responses of the desired output format and compensate remaining deviations with an optimal post filter. The post filter is computed such that the proposed approach behaves like a linear system when the spatial filters achieve the desired spatial response, and scales towards a non-linear system otherwise. Experimental results show that the proposed approach can significantly reduce distortions of existing parametric processing schemes especially when a sufficiently high number of microphones is available.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"14 1","pages":"571-575"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78566629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}