ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Articles, Page 9

Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8682154

Runnan Li, Zhiyong Wu, Jia Jia, Sheng Zhao, H. Meng

Abstract: Speech emotion recognition (SER) plays an important role in intelligent speech interaction. One vital challenge in SER is to extract emotion-relevant features from speech signals. In state-of-the-art SER techniques, deep learning methods, e.g., Convolutional Neural Networks (CNNs), are widely employed for feature learning and have achieved significant performance. However, CNN-oriented methods have two performance limitations: 1) the loss of the temporal structure of speech during progressive resolution reduction; 2) the neglect of relative dependencies between elements in the suprasegmental feature sequence. In this paper, we propose the combined use of a Dilated Residual Network (DRN) and Multi-head Self-attention to alleviate these limitations. By employing the DRN, the network can retain a high-resolution temporal structure during feature learning, with a receptive field of similar size to that of CNN-based approaches. By employing Multi-head Self-attention, the network can model the inner dependencies between elements at different positions in the learned suprasegmental feature sequence, which enhances the import of emotion-salient information. Experiments on the emotional benchmarking dataset IEMOCAP have demonstrated the effectiveness of the proposed framework, with 11.7% to 18.6% relative improvement over state-of-the-art approaches.

Pages: 6675-6679
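
The abstract's point that dilation preserves temporal resolution while keeping a receptive field comparable to a downsampling CNN can be checked with simple arithmetic. The sketch below is illustrative only: the helper name `receptive_field` and both layer stacks (three layers, kernel size 3) are assumptions chosen for the example, not configurations taken from the paper.

```python
def receptive_field(layers):
    """Return (receptive_field, total_stride) for a stack of 1-D conv layers.

    Each layer is a (kernel_size, stride, dilation) tuple.
    """
    rf, jump = 1, 1  # jump = spacing, in input frames, between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf, jump


# Hypothetical stacks, chosen only for illustration.
strided = [(3, 2, 1)] * 3                    # downsample by 2 at every layer
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # stride 1, growing dilation

print(receptive_field(strided))  # (15, 8): one output per 8 input frames
print(receptive_field(dilated))  # (15, 1): one output per input frame
```

Both stacks see 15 input frames per output, but only the strided stack reduces the sequence to one output per 8 frames; the dilated stack keeps one output per input frame.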

Citations: 45

Baseline Wander Removal and Isoelectric Correction in Electrocardiograms Using Clustering

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683084

Kjell Le, T. Eftestøl, K. Engan, Ø. Kleiven, S. Ørn

Citations: 1

Deep Learning for Super-resolution Vascular Ultrasound Imaging

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683813

R. V. Sloun, Oren Solomon, M. Bruce, Zin Z. Khaing, Yonina C. Eldar, M. Mischi

Abstract: Based on the intravascular infusion of gas microbubbles, which act as ultrasound contrast agents, ultrasound localization microscopy has enabled super-resolution vascular imaging through precise detection of individual microbubbles across numerous imaging frames. However, analysis of high-density regions with significant overlaps among the microbubble point spread functions typically yields high localization errors, constraining the technique to low-concentration conditions. As such, long acquisition times are required for sufficient coverage of the vascular bed. Algorithms based on sparse recovery have been developed specifically to cope with the overlapping point spread functions of multiple microbubbles. While successful localization of densely spaced emitters has been demonstrated, even highly optimized fast sparse recovery techniques involve a time-consuming iterative procedure. In this work, we used deep learning to improve upon standard ultrasound localization microscopy (Deep-ULM), and obtain super-resolution vascular images from high-density contrast-enhanced ultrasound data. Deep-ULM is suitable for real-time applications, resolving about 1250 high-resolution patches (128×128 pixels) per second using GPU acceleration.

Pages: 1055-1059

Citations: 52

Towards End-to-end Speech-to-text Translation with Two-pass Decoding

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8682801

Tzu-Wei Sung, Jun-You Liu, Hung-yi Lee, Lin-Shan Lee

Abstract: Speech-to-text translation (ST) refers to transforming audio in a source language into text in a target language. Mainstream solutions for such tasks cascade automatic speech recognition with machine translation, for which transcriptions of the source language are needed in training. End-to-end approaches for ST tasks have been investigated not only because of technical interests, such as achieving a globally optimized solution, but also because of the need for ST for the many source languages worldwide that have no written form. In this paper, we propose a new end-to-end ST framework with two decoders to handle the relatively deeper relationships between the source language audio and target language text. The first-pass decoder generates some useful latent representations, and the second-pass decoder then integrates the output of both the encoder and the first-pass decoder to generate the text translation in the target language. Only paired source language audio and target language text are used in training. Preliminary experiments on several language pairs showed improved performance, and some initial analysis is offered.

Pages: 7175-7179

Citations: 25

Network Adaptation Strategies for Learning New Classes without Forgetting the Original Ones

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8682848

Hagai Taitelbaum, Gal Chechik, J. Goldberger

Citations: 2

Introducing the Orthogonal Periodic Sequences for the Identification of Functional Link Polynomial Filters

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683342

A. Carini, S. Orcioni, S. Cecchi

Citations: 3

Performance Analysis of Convex Data Detection in MIMO

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683890

Ehsan Abbasi, Fariborz Salehi, B. Hassibi

Abstract: We study the performance of a convex data detection method in large multiple-input multiple-output (MIMO) systems. The goal is to recover an n-dimensional complex signal whose entries are from an arbitrary constellation $\mathcal{D} \subset \mathbb{C}$, using m noisy linear measurements. Since Maximum Likelihood (ML) estimation involves minimizing a loss function over the discrete set $\mathcal{D}^n$, it becomes computationally intractable for large n. One approach is to relax $\mathcal{D}$ to a convex set, to use convex programming to solve the relaxed problem precisely, and then to map the answer to the closest point in the set $\mathcal{D}$. We assume an i.i.d. complex Gaussian channel matrix and derive expressions for the symbol error probability of the proposed convex method in the limit of m, n → ∞. Prior work was only able to do so for real-valued constellations such as BPSK and PAM. The main contribution of this paper is to extend the results to complex-valued constellations. In particular, we use our main theorem to calculate the performance of the complex algorithm for PSK and QAM constellations. In addition, we introduce a closed-form formula for the symbol error probability in the high-SNR regime and determine the minimum number of measurements m required for consistent signal recovery.

Pages: 4554-4558
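
The relax-solve-round recipe described in the abstract can be sketched for one concrete case. The example below assumes a QPSK constellation {±1 ± 1j}, whose convex hull is the unit square in the complex plane, and solves the relaxed least-squares problem with projected gradient descent. The function name `detect_qpsk`, the solver choice, the problem sizes, and the noise level are all assumptions made for illustration; the paper analyzes the error probability of such detectors and this sketch is not its algorithm.

```python
import numpy as np


def detect_qpsk(H, y, iters=500):
    """Box-relaxed least squares for QPSK, followed by hard decisions.

    Minimizes ||y - H x||^2 over the square |Re x_i| <= 1, |Im x_i| <= 1
    (the convex hull of {+-1 +-1j}) by projected gradient descent.
    """
    n = H.shape[1]
    step = 1.0 / np.linalg.norm(H, 2) ** 2  # 1 / (largest singular value)^2
    x = np.zeros(n, dtype=complex)
    for _ in range(iters):
        x = x - step * (H.conj().T @ (H @ x - y))                 # gradient step
        x = np.clip(x.real, -1, 1) + 1j * np.clip(x.imag, -1, 1)  # projection
    # Map each relaxed entry to the closest constellation point.
    return np.where(x.real >= 0, 1.0, -1.0) + 1j * np.where(x.imag >= 0, 1.0, -1.0)


# Toy setup (assumed sizes): i.i.d. complex Gaussian channel, low noise.
rng = np.random.default_rng(0)
m, n = 32, 8
H = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n)
noise = 0.05 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
y = H @ x + noise

x_hat = detect_qpsk(H, y)
print("symbol errors:", int(np.sum(x_hat != x)))
```

Projected gradient descent is used here only because it needs nothing beyond NumPy; any convex solver could replace the loop.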

Citations: 16

Automatic Transcription of Diatonic Harmonica Recordings

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8682334

Filipe M. Lins, M. Johann, Emmanouil Benetos, Rodrigo Schramm

Abstract: This paper presents a method for automatic transcription of the diatonic Harmonica. It estimates the multi-pitch activations through a spectrogram factorisation framework. This framework is based on Probabilistic Latent Component Analysis (PLCA) and uses a fixed 4-dimensional dictionary with spectral templates extracted from the Harmonica's timbre. Methods based on spectrogram factorisation may suffer from local-optima issues in the presence of harmonic overlap or considerable timbre variability. To alleviate this issue, we propose a set of harmonic constraints that are inherent to the Harmonica note layout or are caused by specific diatonic Harmonica playing techniques. These constraints help to guide the factorisation process until it converges to meaningful multi-pitch activations. This work also builds a new audio dataset containing solo recordings of diatonic Harmonica excerpts and the respective multi-pitch annotations. We compare our proposed approach against multiple baseline techniques for automatic music transcription on this dataset and report the results based on frame-based F-measure statistics.

Pages: 256-260

Citations: 2

Imitation Refinement for X-ray Diffraction Signal Processing

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683723

Junwen Bai, Zihang Lai, Runzhe Yang, Yexiang Xue, J. Gregoire, C. Gomes

Abstract: Many real-world tasks involve identifying signals from data satisfying background or prior knowledge. In domains like materials discovery, due to the flaws and biases in raw experimental data, the identification of X-ray diffraction (XRD) signals often requires significant (manual) expert work to find refined signals that are similar to the ideal theoretical ones. Automatically refining the raw XRD signals utilizing simulated theoretical data is thus desirable. We propose imitation refinement, a novel approach to refine imperfect input signals, guided by a pre-trained classifier incorporating prior knowledge from simulated theoretical data, such that the refined signals imitate the ideal ones. The classifier is trained on the ideal simulated data to classify signals and learns an embedding space where each class is represented by a prototype. The refiner learns to refine the imperfect signals with small modifications, such that their embeddings are closer to the corresponding prototypes. We show that the refiner can be trained in both supervised and unsupervised fashions. We further illustrate the effectiveness of the proposed approach both qualitatively and quantitatively in an X-ray diffraction signal refinement task in materials discovery.

Pages: 3337-3341
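
The two ingredients named in the abstract, class prototypes in an embedding space and small modifications that pull a signal's embedding toward its prototype, can be written as a toy objective. Everything in this sketch is assumed for illustration: prototypes are taken as per-class mean embeddings, the embedding is the identity map, and the loss form, weight `lam`, and function names are hypothetical. The abstract does not specify the paper's actual loss or architecture.

```python
import numpy as np


def class_prototypes(embeddings, labels):
    """Mean embedding per class (an assumed way to form prototypes)."""
    return {c.item(): embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}


def refinement_loss(raw, refined, embed, prototype, lam=0.1):
    """Squared distance to the prototype plus a penalty on the size of the change."""
    pull = np.sum((embed(refined) - prototype) ** 2)
    change = np.sum((refined - raw) ** 2)
    return pull + lam * change


# Toy data: two classes of "ideal" embeddings in 2-D.
embeddings = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(embeddings, labels)

embed = lambda s: s               # identity embedding, a stand-in for a trained network
raw = np.array([3.0, 0.0])        # imperfect signal belonging to class 0
refined = np.array([2.0, 0.0])    # candidate refinement, slightly modified

loss_raw = refinement_loss(raw, raw, embed, protos[0])
loss_refined = refinement_loss(raw, refined, embed, protos[0])
print(loss_raw, loss_refined)
```

Under these assumptions the refined candidate scores lower than the unmodified signal, because the gain in closeness to the prototype outweighs the penalty for the modification.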

Citations: 1

A Variational Adaptive Population Importance Sampler

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2019-05-12 DOI: 10.1109/ICASSP.2019.8683152

Yousef El-Laham, P. Djurić, M. Bugallo

Citations: 6