ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition
Runnan Li, Zhiyong Wu, Jia Jia, Sheng Zhao, H. Meng
Abstract: Speech emotion recognition (SER) plays an important role in intelligent speech interaction. One vital challenge in SER is to extract emotion-relevant features from speech signals. In state-of-the-art SER techniques, deep learning methods, e.g., Convolutional Neural Networks (CNNs), are widely employed for feature learning and have achieved significant performance. However, two performance limitations have arisen in CNN-oriented methods: 1) the loss of the temporal structure of speech during progressive resolution reduction; 2) the neglect of relative dependencies between elements in the suprasegmental feature sequence. In this paper, we propose the combined use of a Dilated Residual Network (DRN) and Multi-head Self-attention to alleviate these limitations. By employing the DRN, the network can retain a high-resolution temporal structure during feature learning, with a receptive field of similar size to that of a CNN-based approach. By employing Multi-head Self-attention, the network can model the inner dependencies between elements at different positions in the learned suprasegmental feature sequence, which enhances the importing of emotion-salient information. Experiments on the emotional benchmarking dataset IEMOCAP demonstrate the effectiveness of the proposed framework, with 11.7% to 18.6% relative improvement over state-of-the-art approaches.
DOI: https://doi.org/10.1109/ICASSP.2019.8682154
Pages: 6675-6679. Published: 2019-05-12.
Citations: 45
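The abstract above says a dilated network keeps full temporal resolution while reaching a receptive field similar to that of a conventional CNN. The following is a minimal sketch of that idea, not the authors' network: `dilated_conv1d` and `receptive_field` are hypothetical helper names, the kernel is assumed to be odd-length, and frames outside the signal are assumed to be zero.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """'Same'-padded 1-D dilated convolution (cross-correlation form).

    Returns one value per input frame, so temporal resolution is preserved.
    Assumption: odd-length kernel, zeros outside the signal.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With kernel size 3, three undilated layers see 7 frames, while dilations 1, 2, 4 see 15 frames at the same depth and without any downsampling.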
Baseline Wander Removal and Isoelectric Correction in Electrocardiograms Using Clustering
Kjell Le, T. Eftestøl, K. Engan, Ø. Kleiven, S. Ørn
Abstract: Baseline wander is a low-frequency noise that is often removed from electrocardiogram signals by a highpass filter. However, this might not be sufficient to correct the isoelectric level of the signal; an isoelectric bias may remain. The isoelectric level is used as a reference point for amplitude measurements, and it is recommended that this point be at 0 V, i.e., isoelectric adjusted. To correct the isoelectric level, a clustering method is proposed to determine the isoelectric bias, which is thereafter subtracted from a signal-averaged template. Calculation of the mean electrical axis (MEA) is used to evaluate the isoelectric correction. The MEA can be estimated from any lead pair in the frontal plane, and a low variance in the estimates over the different lead pairs would suggest that the calculations of the MEA in each lead pair are consistent. Different methods are evaluated for calculating the MEA, and the variance in the results, as well as other measures, favours the proposed isoelectric-adjusted signals in all MEA methods.
DOI: https://doi.org/10.1109/ICASSP.2019.8683084
Pages: 1274-1278. Published: 2019-05-12.
Citations: 1
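The abstract above does not specify the clustering method, so the following is only an illustrative sketch of the general idea: cluster the amplitudes of a signal-averaged template, take the centroid of the most populated cluster as the isoelectric bias, and subtract it. The use of 1-D k-means, the choice `k=3`, and all function names are assumptions made for this example, not the paper's algorithm.

```python
def kmeans_1d(values, k=3, iters=50):
    """Plain 1-D k-means with evenly spread initial centroids (assumed setup)."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Empty clusters keep their previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def isoelectric_bias(template, k=3):
    """Assumption: the flat isoelectric segments dominate the template,
    so the most populated amplitude cluster marks the isoelectric level."""
    centroids, clusters = kmeans_1d(template, k)
    largest = max(range(k), key=lambda i: len(clusters[i]))
    return centroids[largest]


def isoelectric_adjust(template, k=3):
    """Subtract the estimated bias so the isoelectric level sits near 0 V."""
    bias = isoelectric_bias(template, k)
    return [v - bias for v in template]
```

Samples from waves that fall into the dominant cluster pull the estimate slightly away from the true level, which is one reason a real method would need a more careful clustering step than this sketch.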
Deep Learning for Super-resolution Vascular Ultrasound Imaging
R. V. Sloun, Oren Solomon, M. Bruce, Zin Z. Khaing, Yonina C. Eldar, M. Mischi
Abstract: Based on the intravascular infusion of gas microbubbles, which act as ultrasound contrast agents, ultrasound localization microscopy has enabled super resolution vascular imaging through precise detection of individual microbubbles across numerous imaging frames. However, analysis of high-density regions with significant overlaps among the microbubble point spread functions typically yields high localization errors, constraining the technique to low-concentration conditions. As such, long acquisition times are required for sufficient coverage of the vascular bed. Algorithms based on sparse recovery have been developed specifically to cope with the overlapping point-spread-functions of multiple microbubbles. While successful localization of densely-spaced emitters has been demonstrated, even highly optimized fast sparse recovery techniques involve a time-consuming iterative procedure. In this work, we used deep learning to improve upon standard ultrasound localization microscopy (Deep-ULM), and obtain super-resolution vascular images from high-density contrast-enhanced ultrasound data. Deep-ULM is suitable for real-time applications, resolving about 1250 high-resolution patches (128×128 pixels) per second using GPU acceleration.
DOI: https://doi.org/10.1109/ICASSP.2019.8683813
Pages: 1055-1059. Published: 2019-05-12.
Citations: 52
Towards End-to-end Speech-to-text Translation with Two-pass Decoding
Tzu-Wei Sung, Jun-You Liu, Hung-yi Lee, Lin-Shan Lee
Abstract: Speech-to-text translation (ST) refers to transforming audio in a source language into text in a target language. Mainstream solutions for such tasks cascade automatic speech recognition with machine translation, for which transcriptions of the source language are needed in training. End-to-end approaches for ST tasks have been investigated not only because of technical interest, such as achieving a globally optimized solution, but also because of the need for ST for the many source languages worldwide that have no written form. In this paper, we propose a new end-to-end ST framework with two decoders to handle the relatively deeper relationships between the source language audio and target language text. The first-pass decoder generates some useful latent representations, and the second-pass decoder then integrates the output of both the encoder and the first-pass decoder to generate the text translation in the target language. Only paired source language audio and target language text are used in training. Preliminary experiments on several language pairs showed improved performance, and offered some initial analysis.
DOI: https://doi.org/10.1109/ICASSP.2019.8682801
Pages: 7175-7179. Published: 2019-05-12.
Citations: 25
Network Adaptation Strategies for Learning New Classes without Forgetting the Original Ones
Hagai Taitelbaum, Gal Chechik, J. Goldberger
Abstract: We address the problem of adding new classes to an existing classifier without hurting the original classes, when no access is allowed to any sample from the original classes. This problem arises frequently since models are often shared without their training data, due to privacy and data ownership concerns. We propose an easy-to-use approach that modifies the original classifier by retraining a suitable subset of layers using a linearly-tuned, knowledge-distillation regularization. The set of layers that is tuned depends on the number of newly added classes and the number of original classes. We evaluate the proposed method on two standard datasets, first in a language-identification task, then in an image classification setup. In both cases, the method achieves classification accuracy that is almost as good as that obtained by a system trained using unrestricted samples from both the original and new classes.
DOI: https://doi.org/10.1109/ICASSP.2019.8682848
Pages: 3637-3641. Published: 2019-05-12.
Citations: 2
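The knowledge-distillation regularization mentioned in the abstract above can be illustrated with a generic distillation penalty. This is a sketch under assumptions, not the paper's formulation: the KL form of the penalty, the `temperature` value, the fixed `weight` (the abstract's "linearly-tuned" schedule is not reproduced here), and all function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional softening temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard classification loss for a single example."""
    return -math.log(softmax(logits)[label])


def distillation_penalty(old_logits, new_logits, temperature=2.0):
    """KL(old || new) between softened outputs over the original classes."""
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


def adaptation_loss(old_logits, new_logits, label, n_original,
                    weight=1.0, temperature=2.0):
    """Task loss on a new-class example plus a penalty for drifting away
    from the original model's outputs on the original classes.

    old_logits: original model, original classes only.
    new_logits: adapted model, original classes first, then new classes.
    """
    task = cross_entropy(new_logits, label)
    penalty = distillation_penalty(old_logits, new_logits[:n_original], temperature)
    return task + weight * penalty
```

Because the penalty needs only the original model's outputs on the new data, it fits the setting described above, where no samples from the original classes are available.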
Introducing the Orthogonal Periodic Sequences for the Identification of Functional Link Polynomial Filters
A. Carini, S. Orcioni, S. Cecchi
Abstract: The paper introduces a novel family of deterministic signals, the orthogonal periodic sequences (OPSs), for the identification of functional link polynomial (FLiP) filters. The novel sequences share many of the characteristics of the perfect periodic sequences (PPSs). Like PPSs, they allow the perfect identification of a FLiP filter on a finite time interval with the cross-correlation method. In contrast to PPSs, OPSs can also identify non-orthogonal FLiP filters, such as Volterra filters. With OPSs, the input sequence can have any persistently exciting distribution and can also be a quantized sequence. OPSs can often identify FLiP filters with a sequence period and a computational complexity much smaller than those of PPSs. Several results are reported to show the effectiveness of the proposed sequences in identifying a real nonlinear audio system.
DOI: https://doi.org/10.1109/ICASSP.2019.8683342
Pages: 5486-5490. Published: 2019-05-12.
Citations: 3
Performance Analysis of Convex Data Detection in MIMO
Ehsan Abbasi, Fariborz Salehi, B. Hassibi
Abstract: We study the performance of a convex data detection method in large multiple-input multiple-output (MIMO) systems. The goal is to recover an n-dimensional complex signal whose entries are from an arbitrary constellation $\mathcal{D} \subset \mathbb{C}$, using m noisy linear measurements. Since the Maximum Likelihood (ML) estimation involves minimizing a loss function over the discrete set $\mathcal{D}^n$, it becomes computationally intractable for large n. One approach is to relax $\mathcal{D}$ to a convex set, to utilize convex programming to solve the problem precisely, and then to map the answer to the closest point in the set $\mathcal{D}$. We assume an i.i.d. complex Gaussian channel matrix and derive expressions for the symbol error probability of the proposed convex method in the limit of m, n → ∞. Prior work was only able to do so for real-valued constellations such as BPSK and PAM. The main contribution of this paper is to extend the results to complex-valued constellations. In particular, we use our main theorem to calculate the performance of the complex algorithm for PSK and QAM constellations. In addition, we introduce a closed-form formula for the symbol error probability in the high-SNR regime and determine the minimum number of measurements m required for consistent signal recovery.
DOI: https://doi.org/10.1109/ICASSP.2019.8683890
Pages: 4554-4558. Published: 2019-05-12.
Citations: 16
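The relax-solve-map procedure described in the abstract above can be sketched for the simplest real-valued case that the abstract attributes to prior work, BPSK with symbols in {-1, +1}. The constellation is relaxed to the box [-1, 1]^n, the least-squares problem is solved over the box, and each entry is mapped to the nearest symbol. The projected-gradient solver, the step size, the iteration count, and the function name are assumptions for this example. The paper's complex-valued analysis is not reproduced.

```python
def box_relaxation_detect(H, y, iters=500):
    """Detect BPSK symbols from y = H x + noise via a box relaxation.

    Solves min ||y - H x||^2 subject to -1 <= x_j <= 1 with projected
    gradient descent, then maps each entry to the nearest symbol.
    H is a list of m rows with n real entries each (assumed nonzero).
    """
    m, n = len(H), len(H[0])
    # The gradient is 2 H^T (H x - y); its Lipschitz constant is at most
    # 2 * ||H||_F^2, so this step size is safe (if conservative).
    step = 1.0 / (2.0 * sum(h * h for row in H for h in row))
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(H[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [2.0 * sum(H[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by projection onto the box [-1, 1]^n.
        x = [min(1.0, max(-1.0, x[j] - step * grad[j])) for j in range(n)]
    symbols = [1 if v >= 0.0 else -1 for v in x]
    return symbols, x
```

A general-purpose convex solver would work equally well for the relaxed problem. Projected gradient is used here only to keep the sketch dependency-free.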
Automatic Transcription of Diatonic Harmonica Recordings
Filipe M. Lins, M. Johann, Emmanouil Benetos, Rodrigo Schramm
Abstract: This paper presents a method for automatic transcription of the diatonic Harmonica. It estimates the multi-pitch activations through a spectrogram factorisation framework. This framework is based on Probabilistic Latent Component Analysis (PLCA) and uses a fixed 4-dimensional dictionary with spectral templates extracted from the Harmonica's timbre. Methods based on spectrogram factorisation may suffer from local-optima issues in the presence of harmonic overlap or considerable timbre variability. To alleviate this issue, we propose a set of harmonic constraints that are inherent to the Harmonica note layout or are caused by specific diatonic Harmonica playing techniques. These constraints help to guide the factorisation process until convergence into meaningful multi-pitch activations is achieved. This work also builds a new audio dataset containing solo recordings of diatonic Harmonica excerpts and the respective multi-pitch annotations. We compare our proposed approach against multiple baseline techniques for automatic music transcription on this dataset and report the results based on frame-based F-measure statistics.
DOI: https://doi.org/10.1109/ICASSP.2019.8682334
Pages: 256-260. Published: 2019-05-12.
Citations: 2
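The frame-based F-measure named in the abstract above is commonly computed from the sets of active (frame, pitch) pairs in the reference annotation and in the transcription. A minimal sketch follows, assuming that representation. The function name and the convention of returning 0.0 for empty inputs are choices made for this example, not details taken from the paper.

```python
def frame_f_measure(reference, estimate):
    """Frame-based precision, recall, and F-measure.

    reference, estimate: iterables of (frame_index, pitch) pairs marking
    which pitches are active in which frames.
    Returns (f_measure, precision, recall).
    """
    ref, est = set(reference), set(estimate)
    true_positives = len(ref & est)
    precision = true_positives / len(est) if est else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return 0.0, precision, recall
    f_measure = 2.0 * precision * recall / (precision + recall)
    return f_measure, precision, recall
```

For example, a reference of `{(0, 60), (1, 60), (1, 64)}` and an estimate of `{(0, 60), (1, 64), (2, 67)}` share two pairs, so precision, recall, and F-measure are all 2/3.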
Imitation Refinement for X-ray Diffraction Signal Processing
Junwen Bai, Zihang Lai, Runzhe Yang, Yexiang Xue, J. Gregoire, C. Gomes
Abstract: Many real-world tasks involve identifying signals from data satisfying background or prior knowledge. In domains like materials discovery, due to the flaws and biases in raw experimental data, the identification of X-ray diffraction (XRD) signals often requires significant (manual) expert work to find refined signals that are similar to the ideal theoretical ones. Automatically refining the raw XRD signals utilizing simulated theoretical data is thus desirable. We propose imitation refinement, a novel approach to refine imperfect input signals, guided by a pre-trained classifier incorporating prior knowledge from simulated theoretical data, such that the refined signals imitate the ideal ones. The classifier is trained on the ideal simulated data to classify signals and learns an embedding space where each class is represented by a prototype. The refiner learns to refine the imperfect signals with small modifications, such that their embeddings are closer to the corresponding prototypes. We show that the refiner can be trained in both supervised and unsupervised fashions. We further illustrate the effectiveness of the proposed approach both qualitatively and quantitatively in an X-ray diffraction signal refinement task in materials discovery.
DOI: https://doi.org/10.1109/ICASSP.2019.8683723
Pages: 3337-3341. Published: 2019-05-12.
Citations: 1
A Variational Adaptive Population Importance Sampler
Yousef El-Laham, P. Djurić, M. Bugallo
Abstract: Adaptive importance sampling (AIS) methods are a family of algorithms which can be used to approximate Bayesian posterior distributions. Many AIS algorithms exist in the literature, where the differences arise in the manner by which the proposal distribution is adapted at each iteration. The adaptive population importance sampler (APIS), for example, deterministically samples from a mixture distribution and uses the local information given by the samples and weights to adapt the location parameter of each proposal. The update rules by nature are heuristic, but effective, especially in the case that the target posterior is multimodal. In this work, we introduce a novel AIS scheme which incorporates modern techniques in stochastic optimization to improve the methodology for higher-dimensional posterior inference. More specifically, we derive update rules for the parameters of each proposal by means of deterministic mixture sampling and show that the method outperforms other state-of-the-art approaches in high-dimensional scenarios.
DOI: https://doi.org/10.1109/ICASSP.2019.8683152
Pages: 5052-5056. Published: 2019-05-12.
Citations: 6
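The deterministic mixture sampling mentioned in the abstract above can be sketched in one dimension: draw exactly one sample from each Gaussian proposal per iteration and weight every sample against the whole equal-weight mixture rather than against its own proposal. This sketch covers only the weighting step. It does not adapt the proposals and is not the paper's variational scheme. The function names, the shared proposal standard deviation, and the fixed seed are assumptions for this example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def dm_importance_sampling(log_target, means, std, iterations, seed=0):
    """Self-normalized importance sampling with deterministic mixture weights.

    log_target: log of the (possibly unnormalized) target density.
    means: location parameters of the proposals, kept fixed in this sketch.
    Returns (posterior_mean_estimate, samples, weights).
    """
    rng = random.Random(seed)
    samples, weights = [], []
    for _ in range(iterations):
        for mu in means:  # one draw per proposal: the "deterministic" part
            x = rng.gauss(mu, std)
            mixture = sum(gaussian_pdf(x, m, std) for m in means) / len(means)
            samples.append(x)
            weights.append(math.exp(log_target(x)) / mixture)
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, samples)) / total
    return estimate, samples, weights
```

An adaptive sampler would use these samples and weights to move each proposal's location between iterations. The update rule is exactly where the methods in the abstract differ, so it is left out here.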