{"title":"Vocal Tract Articulatory Contour Detection in Real-Time Magnetic Resonance Images Using Spatio-Temporal Context","authors":"Ashwin Hebbar, Rahul Sharma, Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan","doi":"10.1109/ICASSP40776.2020.9053111","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053111","url":null,"abstract":"Due to its ability to visualize and measure the dynamics of vocal tract shaping during speech production, real-time magnetic resonance imaging (rtMRI) has emerged as one of the prominent research tools. The ability to track different articulators such as the tongue, lips, velum, and the pharynx is a crucial step toward automating further scientific and clinical analysis. Recently, various researchers have addressed the problem of detecting articulatory boundaries, but those are primarily limited to static-image based methods. In this work, we propose to use information from temporal dynamics together with the spatial structure to detect the articulatory boundaries in rtMRI videos. We train a convolutional LSTM network to detect and label the articulatory contours. We compare the produced contours against reference labels generated by iteratively fitting a manually created subject-specific template. We observe that the proposed method outperforms solely image-based methods, especially for the difficult-to-track articulators involved in airway constriction formation during speech.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"104 1","pages":"7354-7358"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73471078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretability-Guided Convolutional Neural Networks for Seismic Fault Segmentation","authors":"Zhining Liu, Cheng Zhou, Guangmin Hu, Chengyun Song","doi":"10.1109/ICASSP40776.2020.9053472","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053472","url":null,"abstract":"Delineating seismic faults, an important type of geologic structure in seismic images, is a key step for seismic interpretation. Compared with conventional methods that design a number of hand-crafted features based on the observed characteristics of the seismic fault, convolutional neural networks (CNNs) have proven to be more powerful for automatically learning effective representations. However, the CNN usually serves as a black box in the process of training and inference, which can lead to trust issues. The inability of humans to understand the CNN is especially problematic in critical areas like seismic exploration, medicine and financial markets. To include domain knowledge to improve the interpretability of the CNN, we propose to jointly optimize the prediction accuracy and the consistency between explanations of the neural network and domain knowledge. Taking the seismic fault segmentation as an example, we show that the proposed method not only gives reasonable explanations for its predictions, but also more accurately predicts faults than the baseline model.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"4312-4316"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73474098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Proper Noun Recognition in End-To-End ASR by Customization of the MWER Loss Criterion","authors":"Cal Peyser, Tara N. Sainath, G. Pundak","doi":"10.1109/ICASSP40776.2020.9054235","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054235","url":null,"abstract":"Proper nouns present a challenge for end-to-end (E2E) automatic speech recognition (ASR) systems in that a particular name may appear only rarely during training, and may have a pronunciation similar to that of a more common word. Unlike conventional ASR models, E2E systems lack an explicit pronunciation model that can be specifically trained with proper noun pronunciations and a language model that can be trained on a large text-only corpus. Past work has addressed this issue by incorporating additional training data or additional models. In this paper, we instead build on recent advances in minimum word error rate (MWER) training to develop two new loss criteria that specifically emphasize proper noun recognition. Unlike past work on this problem, this method requires no new data during training or external models during inference. We see improvements ranging from 2% to 7% relative on several relevant benchmarks.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"7789-7793"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74076425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coded Illumination and Multiplexing for Lensless Imaging","authors":"Yucheng Zheng, Rongjia Zhang, M. Salman Asif","doi":"10.1109/ICASSP40776.2020.9052955","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052955","url":null,"abstract":"Mask-based lensless cameras offer an alternative option to conventional cameras. Compared to conventional cameras, lensless cameras can be extremely thin, flexible, and lightweight. Despite these advantages, the quality of images recovered from the lensless cameras is often poor because of the ill-conditioning of the underlying linear system. In this paper, we propose a new method to address the problem of ill-conditioning by combining coded illumination patterns with mask-based lensless imaging. We assume that the object is illuminated with multiple binary patterns and the camera acquires a sequence of images for different illumination patterns. We propose a low-complexity, recursive algorithm that avoids storing all the images or creating a large system matrix. We present simulation results on standard test images under various extreme conditions and demonstrate that the quality of the image improves significantly with a small number of illumination patterns.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"74 1","pages":"9250-9253"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75278631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Channel Locality for Adaptive Massive MIMO Signal Detection","authors":"Mehrdad Khani Shirkoohi, Mohammad Alizadeh, J. Hoydis, Phil Fleming","doi":"10.1109/ICASSP40776.2020.9052971","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052971","url":null,"abstract":"We propose MMNet, a deep learning MIMO detection scheme that significantly outperforms existing approaches on realistic channels with the same or lower computational complexity. MMNet’s design builds on the theory of iterative soft-thresholding algorithms and uses a novel training algorithm that leverages temporal and spectral correlation in real channels to accelerate training. These innovations make it practical to train MMNet online for every realization of the channel. On spatially-correlated channels, MMNet achieves the same error rate as the next-best learning scheme (OAMPNet) at 2.5dB lower signal-to-noise ratio (SNR), and with at least 10× less computational complexity. MMNet is also 4–8dB better overall than the linear minimum mean square error (MMSE) detector.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"8565-8568"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75642564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Active Contour Driven by Double-Weighted Signed Pressure Force for Image Segmentation","authors":"Xingyu Fu, Bin Fang, Mingliang Zhou, Jiajun Li","doi":"10.1109/ICASSP40776.2020.9054627","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054627","url":null,"abstract":"In this paper, we propose a novel hybrid active contour method driven by a double-weighted signed pressure force for image segmentation. First, the Legendre polynomials and global information are integrated into the signed pressure force (SPF) function, and a coefficient is applied to weight the relative influence of the Legendre term and the global term. Second, by introducing a weighted factor as the coefficient of the inside and outside region fitting centers, the curve can be optimally evolved to the interior and branches of the region of interest (ROI). Third, a new edge stopping function is adopted to robustly capture the edge of the ROI and speed up multi-object image segmentation. Experiments show that the proposed method achieves better accuracy on images with noise, inhomogeneous intensity, blurred edges and complex branches; it also keeps the computation time under control and is insensitive to the initial contour position.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"4 1","pages":"2463-2467"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75749810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Action Tubelet Detector for Spatio-Temporal Video Action Detection","authors":"Yutang Wu, Hanli Wang, Shuheng Wang, Qinyu Li","doi":"10.1109/ICASSP40776.2020.9054394","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054394","url":null,"abstract":"Current spatio-temporal action detection methods usually employ a two-stream architecture: an RGB stream for raw images and an auxiliary motion stream for optical flow. Each stream must be trained individually, and more effort is needed to improve the precision of the RGB stream. To this end, a single-stream network named enhanced action tubelet (EAT) detector is proposed in this work based on the RGB stream. A modulation layer is designed to modulate RGB features with conditional information from the visual clues of optical flow and human pose. This network is end-to-end and the proposed layer can easily be applied to other action detectors. Experiments show that the EAT detector outperforms the traditional RGB stream and is competitive with existing two-stream methods while avoiding separate training of the streams. By being embedded in a new three-stream architecture, the resulting three-stream EAT detector achieves impressive performance among the best competitors on UCF-Sports, JHMDB and UCF-101.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"240 1","pages":"2388-2392"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74682503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rate-Invariant Autoencoding of Time-Series","authors":"K. Koneripalli, Suhas Lohit, Rushil Anirudh, P. Turaga","doi":"10.1109/ICASSP40776.2020.9053983","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053983","url":null,"abstract":"For time-series classification and retrieval applications, an important requirement is to develop representations/metrics that are robust to re-parametrization of the time-axis. Temporal re-parametrization as a model can account for variability in the underlying generative process, sampling rate variations, or plain temporal misalignment. In this paper, we extend prior work in disentangling latent spaces of autoencoding models to design a novel architecture that learns rate-invariant latent codes in a completely unsupervised fashion. Unlike conventional neural network architectures, this method allows temporal parameters to be explicitly disentangled in the form of order-preserving diffeomorphisms with respect to a learnable template. This makes the latent space more easily interpretable. We show the efficacy of our approach on a synthetic dataset and a real dataset for hand action-recognition.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"7 1","pages":"3732-3736"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73095209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SED-MDD: Towards Sentence Dependent End-To-End Mispronunciation Detection and Diagnosis","authors":"Yiqing Feng, Guanyu Fu, Qingcai Chen, Kai Chen","doi":"10.1109/ICASSP40776.2020.9052975","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052975","url":null,"abstract":"A mispronunciation detection and diagnosis (MD&D) system typically consists of multiple stages, such as an acoustic model, a language model and a Viterbi decoder. In order to integrate these stages, we propose SED-MDD, an end-to-end model for sentence dependent MD&D. Our proposed model takes mel-spectrogram and characters as inputs and outputs the corresponding phone sequence. Our experiments prove that SED-MDD can implicitly learn the phonological rules in both acoustic and linguistic features directly from the phonological annotation and transcription in the training data. To the best of our knowledge, SED-MDD is the first model of its kind and it achieves an accuracy of 86.35% and a correctness of 88.61% on L2-ARCTIC, which significantly outperforms the existing end-to-end MD&D model CNN-RNN-CTC.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"77 1","pages":"3492-3496"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74199794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Theoretical Analysis of Multi-Carrier Agile Phased Array Radar","authors":"Tianyao Huang, Nir Shlezinger, Xingyu Xu, Dingyou Ma, Yimin Liu, Yonina C. Eldar","doi":"10.1109/ICASSP40776.2020.9054035","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054035","url":null,"abstract":"Modern radar systems are expected to operate reliably in congested environments under cost and power constraints. A recent technology for realizing such systems is frequency agile radar (FAR), which transmits narrowband pulses in a frequency hopping manner. To enhance the target recovery performance of FAR in complex electromagnetic environments, and particularly, its range-Doppler recovery performance, multi-Carrier AgilE phaSed Array Radar (CAESAR) was proposed. CAESAR extends FAR to multi-carrier waveforms while introducing the notion of spatial agility. In this paper, we theoretically analyze the range-Doppler recovery capabilities of CAESAR. Particularly, we derive conditions which guarantee accurate reconstruction of these range-Doppler parameters. These conditions indicate that by increasing the number of frequencies transmitted in each pulse, CAESAR improves performance over conventional FAR, especially in complex environments where some radar measurements are severely corrupted by interference.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"39 4","pages":"4702-4706"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72573206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}