ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Vocal Tract Articulatory Contour Detection in Real-Time Magnetic Resonance Images Using Spatio-Temporal Context
Ashwin Hebbar, Rahul Sharma, Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan
{"title":"Vocal Tract Articulatory Contour Detection in Real-Time Magnetic Resonance Images Using Spatio-Temporal Context","authors":"Ashwin Hebbar, Rahul Sharma, Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan","doi":"10.1109/ICASSP40776.2020.9053111","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053111","url":null,"abstract":"Due to its ability to visualize and measure the dynamics of vocal tract shaping during speech production, real-time magnetic resonance imaging (rtMRI) has emerged as one of the prominent research tools. The ability to track different articulators such as the tongue, lips, velum, and the pharynx is a crucial step toward automating further scientific and clinical analysis. Recently, various researchers have addressed the problem of detecting articulatory boundaries, but those are primarily limited to static-image based methods. In this work, we propose to use information from temporal dynamics together with the spatial structure to detect the articulatory boundaries in rtMRI videos. We train a convolutional LSTM network to detect and label the articulatory contours. We compare the produced contours against reference labels generated by iteratively fitting a manually created subject-specific template. We observe that the proposed method outperforms solely image-based methods, especially for the difficult-to-track articulators involved in airway constriction formation during speech.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"104 1","pages":"7354-7358"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73471078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
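The core idea above is to replace per-image segmentation with a recurrent model that carries spatial context across frames. Below is a minimal convolutional-LSTM sketch in PyTorch to make that concrete; the layer sizes, single-cell depth, and the ContourNet name are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell: gates are computed with
    convolutions, so the hidden state keeps its spatial layout."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class ContourNet(nn.Module):
    """Per-frame articulator label maps from an rtMRI clip (toy sizes)."""
    def __init__(self, hid_ch=16, n_articulators=4):
        super().__init__()
        self.cell = ConvLSTMCell(1, hid_ch)
        self.head = nn.Conv2d(hid_ch, n_articulators, 1)

    def forward(self, clip):                    # clip: (B, T, 1, H, W)
        b, t, _, hgt, wid = clip.shape
        h = clip.new_zeros(b, self.cell.hid_ch, hgt, wid)
        c = torch.zeros_like(h)
        outs = []
        for step in range(t):
            h, c = self.cell(clip[:, step], (h, c))
            outs.append(self.head(h))           # per-pixel contour logits
        return torch.stack(outs, dim=1)         # (B, T, n_articulators, H, W)

logits = ContourNet()(torch.randn(2, 5, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 5, 4, 64, 64])
```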
Interpretability-Guided Convolutional Neural Networks for Seismic Fault Segmentation
Zhining Liu, Cheng Zhou, Guangmin Hu, Chengyun Song
{"title":"Interpretability-Guided Convolutional Neural Networks for Seismic Fault Segmentation","authors":"Zhining Liu, Cheng Zhou, Guangmin Hu, Chengyun Song","doi":"10.1109/ICASSP40776.2020.9053472","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053472","url":null,"abstract":"Delineating the seismic fault, which is an important type of geologic structures in seismic images, is a key step for seismic interpretation. Comparing with conventional methods that design a number of hand-crafted features based on the observed characteristics of the seismic fault, convolutional neural networks (CNNs) have proven to be more powerful for automatically learning effective representations. However, the CNN usually serves as a black box in the process of training and inference, which would lead to trust issues. The inability of humans to understand the CNN would be more problematic, especially in critical areas like seismic exploration, medicine and financial markets. To include domain knowledge to improve the interpretability of the CNN, we propose to jointly optimize the prediction accuracy and consistency between explanations of the neural network and domain knowledge. Taking the seismic fault segmentation as an example, we show that the proposed method not only gives reasonable explanations for its predictions, but also more accurately predicts faults than the baseline model.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"4312-4316"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73474098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
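The joint objective described above can be made concrete with a simple explanation-consistency penalty. The sketch below uses input-gradient saliency as the "explanation" and a binary mask as the domain knowledge; the saliency choice, the penalty form, and the weight lam are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def joint_loss(model, x, y, knowledge_mask, lam=0.1):
    """Prediction loss plus a term aligning input-gradient saliency
    with a domain-knowledge mask (1 = geologically plausible region)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)                                 # per-pixel fault logits
    pred = F.binary_cross_entropy_with_logits(logits, y)
    sal = torch.autograd.grad(logits.sum(), x, create_graph=True)[0].abs()
    off_mask = (sal * (1 - knowledge_mask)).mean()    # saliency outside the mask
    return pred + lam * off_mask

# toy usage on a random "seismic section"
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1))
x = torch.rand(2, 1, 32, 32)
y = torch.randint(0, 2, (2, 1, 32, 32)).float()
mask = torch.randint(0, 2, (2, 1, 32, 32)).float()
loss = joint_loss(model, x, y, mask)
loss.backward()                                       # trains both terms jointly
```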
Improving Proper Noun Recognition in End-to-End ASR by Customization of the MWER Loss Criterion
Cal Peyser, Tara N. Sainath, G. Pundak
{"title":"Improving Proper Noun Recognition in End-To-End Asr by Customization of the Mwer Loss Criterion","authors":"Cal Peyser, Tara N. Sainath, G. Pundak","doi":"10.1109/ICASSP40776.2020.9054235","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054235","url":null,"abstract":"Proper nouns present a challenge for end-to-end (E2E) automatic speech recognition (ASR) systems in that a particular name may appear only rarely during training, and may have a pronunciation similar to that of a more common word. Unlike conventional ASR models, E2E systems lack an explicit pronounciation model that can be specifically trained with proper noun pronounciations and a language model that can be trained on a large text-only corpus. Past work has addressed this issue by incorporating additional training data or additional models. In this paper, we instead build on recent advances in minimum word error rate (MWER) training to develop two new loss criteria that specifically emphasize proper noun recognition. Unlike past work on this problem, this method requires no new data during training or external models during inference. We see improvements ranging from 2% to 7% relative on several relevant benchmarks.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"7789-7793"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74076425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
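MWER training minimizes the expected number of word errors over an n-best list rather than a cross-entropy objective. A hedged sketch of that idea follows, with a per-hypothesis error count that could up-weight proper-noun mistakes; the weighting scheme is an assumption, and the mean-subtraction baseline is a standard MWER ingredient rather than necessarily the paper's exact criterion.

```python
import torch

def weighted_mwer_loss(hyp_log_probs, weighted_errors):
    """Expected weighted word errors over an n-best list.
    hyp_log_probs: (B, N) model log-scores for N hypotheses.
    weighted_errors: (B, N) error counts where, e.g., a misrecognized
    proper noun counts more than an ordinary word (assumption)."""
    p = torch.softmax(hyp_log_probs, dim=-1)        # renormalize over the list
    # subtracting the mean error is the usual variance-reduction baseline
    e = weighted_errors - weighted_errors.mean(dim=-1, keepdim=True)
    return (p * e).sum(dim=-1).mean()

loss = weighted_mwer_loss(torch.randn(4, 8, requires_grad=True),
                          torch.randint(0, 5, (4, 8)).float())
loss.backward()
```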
Coded Illumination and Multiplexing for Lensless Imaging
Yucheng Zheng, Rongjia Zhang, M. Salman Asif
{"title":"Coded Illumination and Multiplexing for Lensless Imaging","authors":"Yucheng Zheng, Rongjia Zhang, M. Salman Asif","doi":"10.1109/ICASSP40776.2020.9052955","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052955","url":null,"abstract":"Mask-based lensless cameras offer an alternative option to conventional cameras. Compared to conventional cameras, lensless cameras can be extremely thin, flexible, and lightweight. Despite these advantages, the quality of images recovered from the lensless cameras is often poor because of the ill-conditioning of the underlying linear system. In this paper, we propose a new method to address the problem of illconditioning by combining coded illumination patterns with the mask-based lensless imaging. We assume that the object is illuminated with multiple binary patterns and the camera acquires a sequence of images for different illumination patterns. We propose a low-complexity, recursive algorithm that avoids storing all the images or creating a large system matrix. We present simulation results on standard test images under various extreme conditions and demonstrate that the quality of the image improves significantly with a small number of illumination patterns.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"74 1","pages":"9250-9253"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75278631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
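One way to realize a recursive algorithm that never stores the image stack is to accumulate the normal equations shot by shot. The toy sketch below does exactly that for a random mask model; the matrix sizes, noise level, and ridge regularizer are illustrative assumptions, and the paper's actual mask model and update are more structured than this.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                      # unknown scene pixels (toy size)
m = 48                      # sensor measurements per shot
x_true = rng.random(n)

AtA = np.zeros((n, n))      # running normal equations: no image stack kept
Atb = np.zeros(n)
Phi = rng.standard_normal((m, n)) * 0.1   # fixed mask transfer matrix (toy)

for _ in range(8):          # one binary illumination pattern per shot
    s = rng.integers(0, 2, n).astype(float)
    A = Phi * s             # illuminated scene seen through the mask
    y = A @ x_true + 0.01 * rng.standard_normal(m)
    AtA += A.T @ A          # recursive accumulation, O(n^2) memory
    Atb += A.T @ y

x_hat = np.linalg.solve(AtA + 1e-3 * np.eye(n), Atb)  # ridge-regularized solve
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Each new illumination pattern adds rows to the effective system, so the combined problem is better conditioned than any single-shot system, which is the mechanism the abstract exploits.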
Exploiting Channel Locality for Adaptive Massive MIMO Signal Detection
Mehrdad Khani Shirkoohi, Mohammad Alizadeh, J. Hoydis, Phil Fleming
{"title":"Exploiting Channel Locality for Adaptive Massive MIMO Signal Detection","authors":"Mehrdad Khani Shirkoohi, Mohammad Alizadeh, J. Hoydis, Phil Fleming","doi":"10.1109/ICASSP40776.2020.9052971","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052971","url":null,"abstract":"We propose MMNet, a deep learning MIMO detection scheme that significantly outperforms existing approaches on realistic channels with the same or lower computational complexity. MMNet’s design builds on the theory of iterative soft-thresholding algorithms and uses a novel training algorithm that leverages temporal and spectral correlation in real channels to accelerate training. These innovations make it practical to train MMNet online for every realization of the channel. On spatially-correlated channels, MMNet achieves the same error rate as the next-best learning scheme (OAMPNet) at 2.5dB lower signal-to-noise ratio (SNR), and with at least 10× less computational complexity. MMNet is also 4–8dB better overall than the linear minimum mean square error (MMSE) detector.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"8565-8568"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75642564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
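The iterative soft-thresholding backbone the abstract refers to alternates a gradient step on the residual with a shrinkage denoiser; MMNet's contribution is to learn the step and denoiser parameters per channel realization. A plain, unlearned ISTA sketch of that backbone follows; the real-valued model, fixed step size, and threshold are simplifying assumptions.

```python
import numpy as np

def soft(z, tau):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista_detect(H, y, steps=50, tau=0.05):
    """Unlearned ISTA iteration: gradient step on ||y - Hx||^2, then
    shrinkage. MMNet replaces the fixed step matrix and denoiser
    parameters with trained, channel-adapted ones."""
    x = np.zeros(H.shape[1])
    alpha = 1.0 / np.linalg.norm(H, 2) ** 2   # step from the spectral norm
    for _ in range(steps):
        x = soft(x + alpha * H.T @ (y - H @ x), tau)
    return x

rng = np.random.default_rng(0)
H = rng.standard_normal((64, 16)) / 8.0       # tall "channel" matrix (toy)
x_true = rng.choice([-1.0, 1.0], 16)          # BPSK-like symbols
y = H @ x_true + 0.01 * rng.standard_normal(64)
print(np.sign(ista_detect(H, y)))             # should match x_true
```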
Hybrid Active Contour Driven by Double-Weighted Signed Pressure Force for Image Segmentation
Xingyu Fu, Bin Fang, Mingliang Zhou, Jiajun Li
{"title":"Hybrid Active Contour Driven by Double-Weighted Signed Pressure Force for Image Segmentation","authors":"Xingyu Fu, Bin Fang, Mingliang Zhou, Jiajun Li","doi":"10.1109/ICASSP40776.2020.9054627","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054627","url":null,"abstract":"In this paper, we proposed a novel hybrid active contour driven by double-weighted signed pressure force method for image segmentation. First, the Legendre polynomials and global information are integrated into the signed pressure force (SPF) function and a coefficient is applied to weight the effect degrees of the Legendre term and global term. Second, by introducing a weighted factor as the coefficient of inside and outside region fitting center, the curve can be optimally evolved to the interior and branches of the region of interest (ROI). Third, a new edge stopping function is adopted to robustly capture the edge of ROI and speed up the multi-object image segmentation. Experiments show that the proposed method can achieve better accuracy for images with noise, inhomogeneous intensity, blur edge and complex branches, in the meanwhile, it also controls the time-consuming effectively and is insensitive to the initial contour position.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"4 1","pages":"2463-2467"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75749810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
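For readers unfamiliar with SPF-driven contours, the sketch below evolves a level set with a simple signed pressure force that blends local region means with the global mean under a single weight w. This is a deliberately reduced stand-in for the paper's double-weighted Legendre/global construction, and every constant in it is an assumption.

```python
import numpy as np

def spf(img, phi, w=0.5):
    """Signed pressure force in [-1, 1]: positive where a pixel looks
    like the inside region (pushes the contour out), negative where it
    looks like the outside (pulls it in). Blends the inside/outside
    fitting means with the global mean via the weight w."""
    inside, outside = phi > 0, phi <= 0
    c1, c2 = img[inside].mean(), img[outside].mean()
    f = img - (w * (c1 + c2) / 2.0 + (1.0 - w) * img.mean())
    return f / (np.abs(f).max() + 1e-8)

def evolve(img, phi, iters=200, dt=1.0):
    for _ in range(iters):
        gy, gx = np.gradient(phi)
        phi = phi + dt * spf(img, phi) * np.sqrt(gx**2 + gy**2)
        phi = np.clip(phi, -3, 3)        # crude stand-in for reinitialization
    return phi

# toy image: bright square on dark background, disk initialization
img = np.zeros((64, 64)); img[20:44, 20:44] = 1.0
yy, xx = np.mgrid[:64, :64]
phi0 = 10.0 - np.sqrt((yy - 32.0)**2 + (xx - 32.0)**2)   # phi > 0 inside a disk
seg = evolve(img, phi0) > 0
print(seg.sum())                          # roughly the bright square's area (~576)
```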
Enhanced Action Tubelet Detector for Spatio-Temporal Video Action Detection
Yutang Wu, Hanli Wang, Shuheng Wang, Qinyu Li
{"title":"Enhanced Action Tubelet Detector for Spatio-Temporal Video Action Detection","authors":"Yutang Wu, Hanli Wang, Shuheng Wang, Qinyu Li","doi":"10.1109/ICASSP40776.2020.9054394","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054394","url":null,"abstract":"Current spatio-temporal action detection methods usually employ a two-stream architecture, a RGB stream for raw images and an auxiliary motion stream for optical flow. Training is required individually for each stream and more efforts are necessary to improve the precision of RGB stream. To this end, a single stream network named enhanced action tubelet (EAT) detector is proposed in this work based on RGB stream. A modulation layer is designed to modulate RGB features with conditional information from the visual clues of optical flow and human pose. This network is end-to-end and the proposed layer can be easily applied into other action detectors. Experiments show that EAT detector outperforms traditional RGB stream and is competitive to existing two-stream methods while free from the trouble of training streams separately. By being embedded in a new three-stream architecture, the resulting three-stream EAT detector achieves impressive performances among the best competitors on UCF-Sports, JHMDB and UCF-101.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"240 1","pages":"2388-2392"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74682503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
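A modulation layer of the kind described above can be sketched as feature-wise scale-and-shift (FiLM-style) conditioning: pooled flow/pose cues produce per-channel parameters applied to the RGB feature map. The FiLM parameterization and all sizes here are assumptions; the paper's layer may differ in form.

```python
import torch
import torch.nn as nn

class ModulationLayer(nn.Module):
    """FiLM-style modulation: a conditioning vector (e.g. pooled optical
    flow / pose cues) produces a per-channel scale and shift applied to
    the RGB feature map. Sketch only; sizes are illustrative."""
    def __init__(self, cond_dim, n_ch):
        super().__init__()
        self.to_scale_shift = nn.Linear(cond_dim, 2 * n_ch)

    def forward(self, feat, cond):            # feat: (B, C, H, W), cond: (B, D)
        gamma, beta = self.to_scale_shift(cond).chunk(2, dim=-1)
        # broadcast per-channel parameters over the spatial dimensions
        return feat * (1 + gamma[..., None, None]) + beta[..., None, None]

out = ModulationLayer(8, 32)(torch.randn(2, 32, 14, 14), torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 32, 14, 14])
```

Because the conditioning enters only through this layer, the backbone stays a single RGB stream at training time, which is the property the abstract emphasizes.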
Rate-Invariant Autoencoding of Time-Series
K. Koneripalli, Suhas Lohit, Rushil Anirudh, P. Turaga
{"title":"Rate-Invariant Autoencoding of Time-Series","authors":"K. Koneripalli, Suhas Lohit, Rushil Anirudh, P. Turaga","doi":"10.1109/ICASSP40776.2020.9053983","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053983","url":null,"abstract":"For time-series classification and retrieval applications, an important requirement is to develop representations/metrics that are robust to re-parametrization of the time-axis. Temporal re-parametrization as a model can account for variability in the underlying generative process, sampling rate variations, or plain temporal mis-alignment. In this paper, we extend prior work in disentangling latent spaces of autoencoding models, to design a novel architecture to learn rate-invariant latent codes in a completely unsupervised fashion. Unlike conventional neural network architectures, this method allows to explicitly disentangle temporal parameters in the form of order-preserving diffeomorphisms with respect to a learnable template. This makes the latent space more easily interpretable. We show the efficacy of our approach on a synthetic dataset and a real dataset for hand action-recognition.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"7 1","pages":"3732-3736"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73095209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
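The order-preserving diffeomorphisms in question are monotone warps of the time axis. Below is a hedged sketch of one differentiable way to build and apply such a warp (softmax increments cumulatively summed into a monotone map, then resampling); how the paper parameterizes its warps relative to the learnable template may differ.

```python
import torch
import torch.nn.functional as F

def apply_warp(seq, logits):
    """Order-preserving reparametrization of the time axis: positive
    increments (softmax) are cumulatively summed into a monotone warp
    gamma: [0, 1] -> [0, 1], then the sequence is resampled along it."""
    gamma = torch.cumsum(F.softmax(logits, dim=-1), dim=-1)  # (B, T), monotone
    grid_x = 2 * gamma - 1                                   # to [-1, 1]
    # treat the (B, C, T) sequence as a 1-pixel-high image for grid_sample
    inp = seq.unsqueeze(2)                                   # (B, C, 1, T)
    g = torch.stack([grid_x, torch.zeros_like(grid_x)], dim=-1).unsqueeze(1)
    return F.grid_sample(inp, g, align_corners=True).squeeze(2)

warped = apply_warp(torch.randn(4, 3, 50), torch.randn(4, 50))
print(warped.shape)  # torch.Size([4, 3, 50])
```

Since the warp is built from unconstrained logits, an encoder can predict it end-to-end, and the remaining latent code is pushed toward rate-invariant content.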
SED-MDD: Towards Sentence Dependent End-to-End Mispronunciation Detection and Diagnosis
Yiqing Feng, Guanyu Fu, Qingcai Chen, Kai Chen
{"title":"SED-MDD: Towards Sentence Dependent End-To-End Mispronunciation Detection and Diagnosis","authors":"Yiqing Feng, Guanyu Fu, Qingcai Chen, Kai Chen","doi":"10.1109/ICASSP40776.2020.9052975","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052975","url":null,"abstract":"A mispronunciation detection and diagnosis (MD&D) system typically consists of multiple stages, such as an acoustic model, a language model and a Viterbi decoder. In order to integrate these stages, we propose SED-MDD, an end-to-end model for sentence dependent mispronunciation detection and diagnosis (MD&D) . Our proposed model takes mel-spectrogram and characters as inputs and outputs the corresponding phone sequence. Our experiments prove that SED-MDD can implicitly learn the phonological rules in both acoustic and linguistic features directly from the phonological annotation and transcription in the training data. To the best of our knowledge, SED-MDD is the first model of its kind and it achieves an accuracy of 86.35% and a correctness of 88.61% on L2-ARCTIC which significantly outperforms the existing end-to-end mispronunciation detection and diagnosis (MD&D) model CNN-RNN-CTC.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"77 1","pages":"3492-3496"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74199794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
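"Sentence dependent" means the model conditions its phone predictions on the prompt text as well as the audio. A hedged sketch of that shape: a recurrent audio encoder whose frames attend to character embeddings, with a per-frame phone head suitable for CTC training. Every size, the GRU/attention choice, and the class name are illustrative assumptions rather than SED-MDD's published architecture.

```python
import torch
import torch.nn as nn

class SentenceDependentMDD(nn.Module):
    """Toy sentence-dependent MD&D model: an audio encoder over mel
    frames attends to an encoding of the prompt characters, and a
    linear head emits per-frame phone logits (CTC-style)."""
    def __init__(self, n_mels=80, n_chars=30, n_phones=45, d=128):
        super().__init__()
        self.audio = nn.GRU(n_mels, d, batch_first=True)
        self.chars = nn.Embedding(n_chars, d)
        self.attn = nn.MultiheadAttention(d, 4, batch_first=True)
        self.head = nn.Linear(2 * d, n_phones)

    def forward(self, mel, char_ids):        # mel: (B, T, n_mels)
        h, _ = self.audio(mel)               # (B, T, d) acoustic features
        e = self.chars(char_ids)             # (B, L, d) prompt text encoding
        ctx, _ = self.attn(h, e, e)          # audio frames attend to the text
        return self.head(torch.cat([h, ctx], dim=-1))  # (B, T, n_phones)

logits = SentenceDependentMDD()(torch.randn(2, 100, 80),
                                torch.randint(0, 30, (2, 20)))
print(logits.shape)  # torch.Size([2, 100, 45])
```

Comparing the decoded phone sequence against the canonical phones of the prompt then yields both detection (a mismatch occurred) and diagnosis (which phone was produced instead).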
Theoretical Analysis of Multi-Carrier Agile Phased Array Radar
Tianyao Huang, Nir Shlezinger, Xingyu Xu, Dingyou Ma, Yimin Liu, Yonina C. Eldar
{"title":"Theoretical Analysis of Multi-Carrier Agile Phased Array Radar","authors":"Tianyao Huang, Nir Shlezinger, Xingyu Xu, Dingyou Ma, Yimin Liu, Yonina C. Eldar","doi":"10.1109/ICASSP40776.2020.9054035","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054035","url":null,"abstract":"Modern radar systems are expected to operate reliably in congested environments under cost and power constraints. A recent technology for realizing such systems is frequency agile radar (FAR), which transmits narrowband pulses in a frequency hopping manner. To enhance the target recovery performance of FAR in complex electromagnetic environments, and particularly, its range-Doppler recovery performance, multi-Carrier AgilE phaSed Array Radar (CAESAR) was proposed. CAESAR extends FAR to multi-carrier waveforms while introducing the notion of spatial agility. In this paper, we theoretically analyze the range-Doppler recovery capabilities of CAESAR. Particularly, we derive conditions which guarantee accurate reconstruction of these range-Doppler parameters. These conditions indicate that by increasing the number of frequencies transmitted in each pulse, CAESAR improves performance over conventional FAR, especially in complex environments where some radar measurements are severely corrupted by interference.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"39 4","pages":"4702-4706"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72573206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
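To make the underlying FAR signal model concrete, the toy sketch below simulates a single target observed through random per-pulse frequency hops and recovers its range and velocity by matched filtering on a grid. CAESAR's contribution is the theoretical guarantee for sparse recovery with multiple carriers per pulse; the single-target, single-carrier matched-filter setup and all numbers here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
c0, fc, df, Tr = 3e8, 3e9, 1e6, 1e-4     # light speed, carrier, hop step, PRI
N, K = 64, 16                            # pulses, size of the hop grid
hops = rng.integers(0, K, N)             # one random subcarrier per pulse (FAR)
r_true, v_true = 900.0, 12.0             # target range (m) and velocity (m/s)

n = np.arange(N)
f = fc + hops * df                       # per-pulse carrier frequency
phase = lambda r, v: np.exp(-1j * 4 * np.pi * f * (r + v * n * Tr) / c0)
y = phase(r_true, v_true) + 0.1 * (rng.standard_normal(N)
                                   + 1j * rng.standard_normal(N))

# matched-filter search over a coarse range-Doppler grid
ranges = np.linspace(850.0, 950.0, 101)
vels = np.linspace(0.0, 25.0, 101)
amb = np.abs([[np.vdot(phase(r, v), y) for r in ranges] for v in vels])
vi, ri = np.unravel_index(amb.argmax(), amb.shape)
print(ranges[ri], vels[vi])              # should land near (900.0, 12.0)
```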