ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) — Latest Publications

Shift-invariant Subspace Tracking with Missing Data
Myung Cho, Yuejie Chi
DOI: 10.1109/ICASSP.2019.8683025 (pp. 8222-8225)
Subspace tracking is an important problem in signal processing that finds applications in wireless communications, video surveillance, and source localization in radar and sonar. In recent years, it has been recognized that a low-dimensional subspace can be estimated and tracked reliably even when the data vectors are partially observed with many missing entries, which is greatly desirable when processing high-dimensional, high-rate data, since it reduces the sampling requirement. This paper is motivated by the observation that the underlying low-dimensional subspace may possess additional structural properties induced by the physical model of the data, which, if harnessed properly, can greatly improve subspace tracking performance. As a case study, this paper investigates the problem of tracking directions of arrival from subsampled observations in a uniform linear array, where the signals lie in a subspace spanned by the columns of a Vandermonde matrix. We exploit the shift-invariant structure by mapping the data vector to a latent Hankel matrix, and then perform tracking over the Hankel matrices by exploiting their low-rank properties. Numerical simulations validate the superiority of the proposed approach, in terms of tracking speed and agility, over existing subspace tracking methods that do not exploit the additional shift-invariant structure.
Citations: 1
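The core structural step — lifting a spectrally sparse data vector into a low-rank Hankel matrix — can be illustrated with a minimal sketch (this is not the authors' tracking algorithm; the window length `pencil` and the two-mode test signal are illustrative choices):

```python
import numpy as np

def hankel_lift(x, pencil):
    """Map a length-n data vector to a (pencil x (n - pencil + 1)) Hankel matrix,
    H[i, j] = x[i + j]."""
    n = len(x)
    cols = n - pencil + 1
    return np.array([x[i:i + cols] for i in range(pencil)])

# A signal spanned by r = 2 Vandermonde (complex exponential) modes.
n, pencil = 64, 16
t = np.arange(n)
x = 1.0 * np.exp(1j * 2 * np.pi * 0.11 * t) + 0.7 * np.exp(1j * 2 * np.pi * 0.27 * t)

H = hankel_lift(x, pencil)
rank = np.linalg.matrix_rank(H, tol=1e-8)
print(H.shape, rank)   # the Hankel matrix has rank 2, one per mode
```

For a sum of r complex exponentials, the lifted Hankel matrix has rank exactly r, which is the low-rank structure the tracking is performed over.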
How Transferable Are Features in Convolutional Neural Network Acoustic Models across Languages?
J. Thompson, M. Schönwiesner, Yoshua Bengio, D. Willett
DOI: 10.1109/ICASSP.2019.8683043 (pp. 2827-2831)
Characterization of the representations learned in intermediate layers of deep networks can provide valuable insight into the nature of a task and can guide the development of well-tailored learning strategies. Here we study convolutional neural network (CNN)-based acoustic models in the context of automatic speech recognition. Adapting a method proposed by [1], we measure the transferability of each layer between English, Dutch, and German to assess their language specificity. We observed three distinct regions of transferability: (1) the first two layers were entirely transferable between languages; (2) layers 2-8 were also highly transferable, but we found some evidence of language specificity; (3) the subsequent fully connected layers were more language-specific but could be successfully fine-tuned to the target language. To further probe the effect of weight freezing, we performed follow-up experiments using freeze training [2]. Our results are consistent with the observation that CNNs converge "bottom up" during training and demonstrate the benefit of freeze training, especially for transfer learning.
Citations: 13
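Freeze training can be sketched on a toy two-layer network: the lower layer stands in for the transferable early CNN layers and is never updated, while the upper layer is fine-tuned on the target task. The network sizes and random data below are arbitrary stand-ins, not the paper's acoustic models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: W1 stands in for the transferable early layers
# (kept frozen), W2 for the upper layers fine-tuned on the target language.
W1 = 0.5 * rng.normal(size=(8, 4))
W2 = 0.5 * rng.normal(size=(4, 2))

X = rng.normal(size=(32, 8))   # stand-in target-task inputs
Y = rng.normal(size=(32, 2))   # stand-in targets

def loss_and_grad(X, Y):
    h = np.tanh(X @ W1)        # frozen feature extractor
    err = h @ W2 - Y
    loss = (err ** 2).mean()
    grad_W2 = 2 * h.T @ err / err.size   # gradient w.r.t. the trainable layer only
    return loss, grad_W2

W1_before = W1.copy()
loss0, _ = loss_and_grad(X, Y)
for _ in range(200):
    _, g2 = loss_and_grad(X, Y)
    W2 -= 0.1 * g2             # freeze training: only W2 is ever updated

loss1, _ = loss_and_grad(X, Y)
print(loss0, loss1)            # loss drops while W1 stays untouched
```

The design choice being probed is exactly this asymmetry: lower layers retain their source-language weights while upper layers adapt.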
Generalized Boundary Detection Using Compression-based Analytics
Christina L. Ting, R. Field, T. Quach, Travis L. Bauer
DOI: 10.1109/ICASSP.2019.8682257 (pp. 3522-3526)
We present a new method for boundary detection within sequential data using compression-based analytics. Our approach is to approximate the information distance between two adjacent sliding windows within the sequence. Large values in the distance metric are indicative of boundary locations. A new algorithm is developed, referred to as sliding information distance (SLID), that provides a fast, accurate, and robust approximation to the normalized information distance. A modified smoothed z-score algorithm is used to locate peaks in the distance metric, indicating boundary locations. A variety of data sources are considered, including text and audio, to demonstrate the efficacy of our approach.
Citations: 2
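The idea of approximating an information distance between adjacent windows can be sketched with `zlib` as the compressor. The real SLID algorithm uses its own fast approximation and a smoothed z-score peak detector; this sketch simply takes the argmax of a zlib-based normalized compression distance:

```python
import zlib

def C(b):
    """Compressed length as a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(b, 9))

def ncd(a, b):
    """Normalized compression distance between two byte strings."""
    ca, cb, cab = C(a), C(b), C(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

# A sequence with one clear boundary: repetitive 'abc' text, then 'xyz' text.
data = b"abcabcabc" * 40 + b"xyzxyzxyz" * 40   # true boundary at index 360
w = 60                                          # sliding-window length

# Distance between adjacent windows; high values flag candidate boundaries.
positions = list(range(w, len(data) - w + 1, 10))
scores = [ncd(data[i - w:i], data[i:i + w]) for i in positions]
boundary_guess = positions[max(range(len(scores)), key=scores.__getitem__)]
print(boundary_guess)   # lands near the true boundary at index 360
```

Within the homogeneous regions both windows compress almost as well jointly as separately, so the distance stays small; it spikes where the windows straddle the change in statistics.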
Transfer Learning Using Raw Waveform Sincnet for Robust Speaker Diarization
Harishchandra Dubey, A. Sangwan, J. Hansen
DOI: 10.1109/ICASSP.2019.8683023 (pp. 6296-6300)
Speaker diarization determines who spoke and when in an audio stream. SincNet is a recently developed convolutional neural network (CNN) architecture whose first layer consists of parameterized sinc filters. Unlike conventional CNNs, SincNet takes the raw speech waveform as input. This paper leverages SincNet in a vanilla transfer learning (VTL) setup. Out-of-domain data is used to train SincNet-VTL to perform frame-level speaker classification. The trained SincNet-VTL is later used as a feature extractor for in-domain data. We investigated pooling (max, avg) strategies for deriving utterance-level embeddings from the frame-level features extracted by the trained network. These utterance/segment-level embeddings are adopted as speaker models during the clustering stage of the diarization pipeline. We compared the proposed SincNet-VTL embeddings with baseline i-vector features. We evaluated our approaches on two corpora, CRSS-PLTL and AMI. Results show the efficacy of the trained SincNet-VTL for speaker-discriminative embeddings even when trained on a small amount of data. The proposed features achieved relative DER improvements of 19.12% and 52.07% over baseline i-vectors on the CRSS-PLTL and AMI data, respectively.
Citations: 11
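The pooling step — collapsing frame-level features into a single utterance-level embedding — is simple to sketch. The random matrix below is a stand-in for activations from a trained SincNet front end, and the cosine score is the kind of similarity a clustering stage would use between speaker models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Frame-level features for one utterance: T frames of d-dim embeddings
# (random stand-ins for activations from a trained network).
T, d = 200, 32
frames = rng.normal(size=(T, d))

# The two utterance-level pooling strategies compared in the paper.
avg_embedding = frames.mean(axis=0)
max_embedding = frames.max(axis=0)

def cosine(u, v):
    """Similarity between two utterance-level speaker models."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(avg_embedding.shape, cosine(avg_embedding, max_embedding))
```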
Accurate Vehicle Detection Using Multi-camera Data Fusion and Machine Learning
Hao Wu, Xinxiang Zhang, B. Story, D. Rajan
DOI: 10.1109/ICASSP.2019.8683350 (pp. 3767-3771)
Computer-vision methods have been extensively used in intelligent transportation systems for vehicle detection. However, the detection of severely occluded or partially observed vehicles due to limited camera fields of view remains a challenge. This paper presents a multi-camera vehicle detection system that significantly improves detection performance under occlusion. The key elements of the proposed method include a novel multi-view region proposal network that localizes candidate vehicles on the ground plane; we also infer the vehicle position on the ground plane by leveraging multi-view cross-camera context. Experiments are conducted on a dataset captured from a roadway in Richardson, TX, USA; the system attains 0.7849 Average Precision (AP) and 0.7089 Multiple Object Detection Precision (MODP), approximately a 31.2% increase in AP and an 8.6% increase in MODP over single-camera methods.
Citations: 21
Multi-channel Itakura Saito Distance Minimization with Deep Neural Network
M. Togami
DOI: 10.1109/ICASSP.2019.8683410 (pp. 536-540)
We present a multi-channel speech source separation method based on a deep neural network that jointly optimizes not only the time-varying variance of each speech source but also the multi-channel spatial covariance matrix, without any iterative optimization method. Instead of a loss function that does not evaluate the spatial characteristics of the output signal, the proposed method uses a loss function based on minimization of the multi-channel Itakura-Saito distance (MISD), which does evaluate those characteristics. The MISD-based cost function is calculated from the estimated posterior probability density function (PDF) of each speech source under a time-varying Gaussian distribution model, so the loss function of the neural network and the PDF of each speech source assumed in multi-channel speech source separation are consistent with each other. As the neural-network architecture, the proposed method uses multiple bidirectional long short-term memory (BLSTM) layers. The BLSTM layers and the subsequent complex-valued signal processing are jointly optimized in the training phase. Experimental results show that more accurately separated speech is obtained with network parameters optimized by the proposed MISD minimization than with parameters optimized by loss functions that do not evaluate the spatial covariance matrix.
Citations: 12
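The scalar Itakura-Saito divergence underlying the loss can be sketched directly; the multi-channel version in the paper additionally involves spatial covariance matrices, and the spectra below are synthetic stand-ins:

```python
import numpy as np

def itakura_saito(p, q, eps=1e-12):
    """Scalar IS divergence between power spectra p (reference) and q (estimate):
    sum over bins of p/q - log(p/q) - 1; zero iff p == q."""
    r = (p + eps) / (q + eps)
    return float(np.sum(r - np.log(r) - 1.0))

rng = np.random.default_rng(2)
p = np.abs(rng.normal(size=257)) ** 2 + 0.1   # synthetic reference power spectrum
good = 1.05 * p                               # close estimate
bad = 3.0 * np.roll(p, 40)                    # mislocated, mis-scaled estimate

print(itakura_saito(p, good), itakura_saito(p, bad))
```

The divergence is scale-sensitive per bin rather than symmetric, which is why it pairs naturally with the time-varying Gaussian source model the paper assumes.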
Deep Speaker Representation Using Orthogonal Decomposition and Recombination for Speaker Verification
I. Kim, Kyu-hong Kim, Ji-Whan Kim, Changkyu Choi
DOI: 10.1109/ICASSP.2019.8683332 (pp. 6126-6130)
Speech signals contain intrinsic and extrinsic variations such as accent, emotion, dialect, phoneme, speaking manner, noise, music, and reverberation. Some of these variations are unnecessary and constitute unspecified factors of variation, which lead to increased variability in speaker representations. In this paper, we assume that such unspecified factors exist in speaker representations, and we attempt to minimize the resulting variability. The key idea is that a primal speaker representation can be decomposed into orthogonal vectors, which are then recombined by deep neural networks (DNNs) to reduce speaker representation variability, yielding performance improvement for speaker verification (SV). Experimental results show that the proposed approach produces a relative equal error rate (EER) reduction of 47.1% compared to the use of the same convolutional neural network (CNN) architecture on the VoxCeleb dataset. Furthermore, the proposed method provides significant improvement for short utterances.
Citations: 15
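The decompose-and-recombine idea can be sketched on a single vector: project the representation onto a nuisance direction, keep the orthogonal residual, and reweight. In the paper the recombination is learned by a DNN; the fixed 0.2 weight below is a hypothetical stand-in for that learned step:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 16
x = rng.normal(size=d)            # primal speaker representation
u = rng.normal(size=d)
u /= np.linalg.norm(u)            # unit direction of an unspecified variation

# Orthogonal decomposition: x = parallel + residual, with parallel ⟂ residual.
parallel = (x @ u) * u
residual = x - parallel

# Recombination: down-weight the nuisance component, keep the rest
# (a fixed weight standing in for the DNN-learned recombination).
recombined = 0.2 * parallel + residual
print(np.linalg.norm(x), np.linalg.norm(recombined))
```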
Improving Graph Trend Filtering with Non-convex Penalties
R. Varma, Harlin Lee, Yuejie Chi, J. Kovacevic
DOI: 10.1109/ICASSP.2019.8683279 (pp. 5391-5395)
In this paper, we study the denoising of piecewise-smooth graph signals that exhibit inhomogeneous levels of smoothness over a graph. We extend the graph trend filtering framework to a family of nonconvex regularizers that exhibit superior recovery performance over existing convex ones. We present theoretical results in the form of asymptotic error rates for both generic and specialized graph models. We further present an ADMM-based algorithm to solve the proposed optimization problem and analyze its convergence. Numerical performance of the proposed framework with nonconvex regularizers is presented for denoising, support recovery, and semi-supervised classification on both synthetic and real-world data.
Citations: 2
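The objects involved — the graph difference operator and a penalty on graph differences — can be sketched on a path graph. This is not the paper's ADMM solver, just an illustration of why a nonconvex penalty such as MCP (one plausible member of the family) penalizes large true jumps less than the convex ℓ1 penalty does:

```python
import numpy as np

# Oriented incidence matrix Delta of a path graph on n nodes: (Delta x)_e = x_{e+1} - x_e.
n = 10
Delta = np.zeros((n - 1, n))
for e in range(n - 1):
    Delta[e, e], Delta[e, e + 1] = -1.0, 1.0

# Piecewise-constant signal with one unit jump.
x_true = np.concatenate([np.zeros(5), np.ones(5)])

def l1_penalty(z):
    return float(np.abs(z).sum())

def mcp_penalty(z, lam=1.0, gamma=2.0):
    """Minimax concave penalty: grows like lam*|z| near zero but flattens
    out for large |z|, so it biases large (true) jumps less than l1."""
    a = np.abs(z)
    return float(np.where(a <= lam * gamma,
                          lam * a - a ** 2 / (2 * gamma),
                          gamma * lam ** 2 / 2).sum())

d = Delta @ x_true              # graph differences: a single jump of size 1
print(l1_penalty(d), mcp_penalty(d))
```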
Exact Discrete-time Realizations of the Gammatone Filter
Elizabeth Ren, Hans-Andrea Loeliger
DOI: 10.1109/ICASSP.2019.8683073 (pp. 316-320)
The paper derives an exact discrete-time state space realization of the popular gammatone filter. No such realization appears to be available in the literature. The proposed realization is computationally attractive: a gammatone filter with exponent N requires less than 6N multiplications and additions per sample. The integer coefficients of the realization can be computed by a simple recursion. The proposed realization also yields a closed-form expression for the frequency response. The proposed primary realization is not quite in a standard form, but it is easily transformed into another realization whose state transition matrix is in Jordan canonical form.
Citations: 0
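As background (this is not the paper's exact state-space realization), a discrete gammatone-like filter can be built as a cascade of N identical complex one-pole sections; the cascade's impulse response has a closed form with the characteristic n^(N-1)-type envelope times a complex tone. The center frequency, bandwidth, and order below are illustrative:

```python
import numpy as np
from math import comb

fs, f0, b, N = 16000.0, 1000.0, 125.0, 4           # sample rate, center freq, bandwidth, order
p = np.exp((-2 * np.pi * b + 2j * np.pi * f0) / fs)  # complex pole of each section

def one_pole(x, p):
    """One recursive section: y[n] = p * y[n-1] + x[n]."""
    y = np.zeros(len(x), dtype=complex)
    acc = 0.0 + 0.0j
    for i, v in enumerate(x):
        acc = p * acc + v
        y[i] = acc
    return y

L = 200
x = np.zeros(L)
x[0] = 1.0                       # unit impulse
h = x.astype(complex)
for _ in range(N):               # cascade of N identical sections
    h = one_pole(h, p)

# Closed form for the cascade impulse response: C(n+N-1, N-1) * p**n,
# i.e. a polynomial-in-n envelope times a decaying complex exponential.
n = np.arange(L)
h_ref = np.array([comb(k + N - 1, N - 1) for k in n]) * p ** n
print(np.max(np.abs(h - h_ref)))   # agreement up to floating-point error
```

Taking the real part of `h` gives the familiar tone-burst impulse response; the paper's contribution is a different, exact realization of the continuous-time gammatone with integer coefficients.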
Collaboration between Bordeaux-inp and Utp, from Research to Education, in the Field of Signal Processing
Fernando Merchan, Héctor Poveda, É. Grivel
DOI: 10.1109/ICASSP.2019.8683079 (pp. 7645-7649)
The purpose of this paper is to share our positive experience of the collaboration launched a few years ago between UTP (Panama) and Bordeaux INP (France) in the field of signal processing. The collaboration involves research and education activities, and has led to numerous internships of French students in Panama, researcher mobility, joint research papers, and the first double-degree agreement signed between France and a country of Central America. This paper presents the various aspects of the collaboration.
Citations: 0