2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Information bottleneck based speaker diarization of meetings using non-speech as side information
S. Yella, H. Bourlard
DOI: 10.1109/ICASSP.2014.6853565 · Pages: 96-100 · Published: 2014-05-04
Abstract: Background noise and errors in speech/non-speech detection cause significant degradation to the output of a speaker diarization system. In a typical speaker diarization system, non-speech segments are excluded prior to unsupervised clustering. In the current study, we exploit the information present in the non-speech segments of a recording to improve the output of a speaker diarization system based on the information bottleneck framework. This is achieved by providing information from non-speech segments as side (irrelevant) information to information bottleneck based clustering. Experiments on meeting recordings from the RT 06, 07 and 09 evaluation sets show that the proposed method decreases the diarization error rate by around 18% relative to the baseline information bottleneck based speaker diarization system. Comparison with a state-of-the-art system based on the HMM/GMM framework shows that the proposed method significantly narrows the performance gap between the information bottleneck system and the HMM/GMM system.
Citations: 12
Amplitude and phase estimator for real-time biomedical spectral Doppler applications
S. Ricci, R. Matera, A. Dallai
DOI: 10.1109/ICASSP.2014.6854584 · Pages: 5149-5152 · Published: 2014-05-04
Abstract: In a typical echo-Doppler investigation, the moving blood is periodically insonated by transmitted bursts of ultrasound energy. The echoes, shifted in frequency according to the Doppler effect, are received, coherently demodulated and processed through a spectral estimator. The detected frequency shift can be exploited for blood velocity assessment. The spectral analysis is typically performed with the conventional Fast Fourier Transform (FFT), but the Amplitude and Phase EStimator (APES) has recently been shown to produce a good-quality sonogram from a reduced number of transmissions. Unfortunately, the much higher computational cost of APES hampers its use in real-time applications. In this work, a fixed-point DSP implementation of APES is presented. A spectral estimate based on 32 transmissions is computed in less than 120 μs. Results obtained from echo-Doppler investigations on a volunteer are presented.
Citations: 3
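The conventional FFT baseline that APES is compared against can be sketched in a few lines. Below is a minimal, illustrative sonogram-column estimate on synthetic slow-time samples; the PRF and Doppler shift are assumed values, and the paper's actual contribution, a fixed-point DSP implementation of APES, is considerably more involved.

```python
import numpy as np

# Illustrative sketch of the conventional FFT-based spectral Doppler
# estimate (not the paper's fixed-point APES implementation).
prf = 8000.0        # pulse repetition frequency in Hz (assumption)
n_tx = 32           # transmissions per spectral line, as in the paper
f_doppler = 1200.0  # simulated Doppler shift in Hz (assumption)

# Synthetic coherently demodulated slow-time samples: one complex sample
# per transmitted burst at a fixed depth.
t = np.arange(n_tx) / prf
slow_time = np.exp(2j * np.pi * f_doppler * t)

# Windowed periodogram of the slow-time ensemble gives one sonogram column.
win = np.hanning(n_tx)
spectrum = np.fft.fftshift(np.fft.fft(slow_time * win))
freqs = np.fft.fftshift(np.fft.fftfreq(n_tx, d=1.0 / prf))
peak = freqs[np.argmax(np.abs(spectrum))]
print(peak)  # nearest FFT bin to the 1200 Hz shift
```

With only 32 slow-time samples the FFT bin spacing is prf/n_tx = 250 Hz; the appeal of APES reported in the paper is a better-resolved sonogram from the same small ensemble.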
Look who's talking: Detecting the dominant speaker in a cluttered scenario
Eleonora D'Arca, N. Robertson, J. Hopgood
DOI: 10.1109/ICASSP.2014.6853854 · Pages: 1532-1536 · Published: 2014-05-04
Abstract: In this work we propose a novel method to automatically detect and localise the dominant speaker in an enclosed scenario by means of audio and video cues. The underpinning idea is that gesturing means speaking, so observing motion means observing an audio signal. To the best of our knowledge, state-of-the-art algorithms focus on stationary-motion scenarios and close-up scenes where only one audio source exists, whereas we extend the method to larger fields of view and cluttered scenarios including multiple non-stationary moving speakers. In such contexts, moving objects that are not correlated with the dominant audio may exist, and their motion may incorrectly drive the audio-video (AV) correlation estimation. This suggests that extra localisation data may be fused at the decision level to avoid detecting false positives. In this work, we learn Mel-frequency cepstral coefficients (MFCC) and correlate them with the optical flow. We also exploit the audio and video signals to estimate the position of the actual speaker, narrowing down the visual search space and hence reducing the probability of a wrong voice-to-pixel-region association. We compare our work with a state-of-the-art algorithm and show on real datasets a 36% precision improvement in localising a moving dominant speaker through occlusions and speech interferences.
Citations: 13
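The "gesturing means speaking" intuition behind the audio-video association can be illustrated with a toy correlation check. The sketch below uses synthetic per-frame trajectories in place of real MFCC and optical-flow features; all signals and names here are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

# Toy illustration: the audio trajectory correlates with the dominant
# speaker's motion but not with clutter motion. Real MFCC and optical
# flow are replaced by synthetic sequences (assumptions).
rng = np.random.default_rng(1)
n_frames = 200
speech_energy = np.abs(np.sin(np.linspace(0.0, 8.0 * np.pi, n_frames)))
speaker_motion = speech_energy + 0.1 * rng.standard_normal(n_frames)
clutter_motion = rng.standard_normal(n_frames)  # uncorrelated mover

def av_score(audio: np.ndarray, motion: np.ndarray) -> float:
    """Pearson correlation between audio and motion trajectories."""
    return float(np.corrcoef(audio, motion)[0, 1])

print(round(av_score(speech_energy, speaker_motion), 2),
      round(av_score(speech_energy, clutter_motion), 2))
```

The paper's point is precisely that this raw AV correlation is not enough in clutter, which is why it fuses an additional position estimate at the decision level.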
Histogram of Log-Gabor Magnitude Patterns for face recognition
J. Yi, Fei Su
DOI: 10.1109/ICASSP.2014.6853650 · Pages: 519-523 · Published: 2014-05-04
Abstract: Gabor-based features have achieved excellent performance for face recognition on traditional face databases. However, on the recent LFW (Labeled Faces in the Wild) face database, Gabor-based features attract little attention due to their high computational complexity, high feature dimension and poor performance. In this paper, we propose a Gabor-based feature termed Histogram of Gabor Magnitude Patterns (HGMP), which is very simple but effective. HGMP adopts the Bag-of-Words (BoW) image representation framework: it views the Gabor filters as codewords and the Gabor magnitudes at each point as the responses of the point to these codewords. Each point is then coded by orientation normalization and scale non-maximum suppression of its magnitudes, which are efficient to compute. Moreover, the number of codewords is so small that the feature dimension of HGMP is very low. In addition, we analyze the advantages of log-Gabor filters over Gabor filters as codewords, and propose replacing Gabor filters with log-Gabor filters in HGMP, producing the Histogram of Log-Gabor Magnitude Patterns (HLGMP) feature. Experimental results on LFW show that HLGMP outperforms HGMP and achieves state-of-the-art performance, despite its very low computational complexity and feature dimension.
Citations: 9
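The log-Gabor transfer function favoured here is easy to state: G(f) = exp(-(log(f/f0))^2 / (2 log(s)^2)), where s is the bandwidth ratio, and it has no DC component. Below is a minimal sketch, with assumed parameters and only radial (scale) filters, of computing magnitude responses and a histogram code in the spirit of HGMP/HLGMP; the full descriptor with orientation normalization is not reproduced.

```python
import numpy as np

def log_gabor_radial(shape, f0, sigma_ratio=0.55):
    """Radial log-Gabor transfer function on an FFT frequency grid.

    G(f) = exp(-(log(f / f0))^2 / (2 * log(sigma_ratio)^2)), with the DC
    response forced to zero. Parameter values are illustrative assumptions.
    """
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0  # placeholder to avoid log(0) at DC
    g = np.exp(-np.log(radius / f0) ** 2 / (2.0 * np.log(sigma_ratio) ** 2))
    g[0, 0] = 0.0       # log-Gabor has no DC component
    return g

# Magnitude responses of a toy image at two scales, then a per-pixel
# argmax over scales and a histogram: the flavour of coding HGMP/HLGMP
# builds (orientation filters and the full pipeline omitted).
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
F = np.fft.fft2(img)
mags = [np.abs(np.fft.ifft2(F * log_gabor_radial(img.shape, f0)))
        for f0 in (0.1, 0.25)]
codes = np.argmax(np.stack(mags), axis=0)        # winning scale per pixel
hist = np.bincount(codes.ravel(), minlength=2)   # descriptor histogram
print(hist.sum())  # 4096, one code per pixel
```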
Interference shaping constraints for underlay MIMO interference channels
C. Lameiro, I. Santamaría, W. Utschick
DOI: 10.1109/ICASSP.2014.6855020 · Pages: 7313-7317 · Published: 2014-05-04
Abstract: In this paper, a cognitive radio (CR) scenario comprising a secondary interference channel (IC) and a primary point-to-point link (PPL) is studied, where the former interferes with the latter. In order to satisfy a given rate requirement at the PPL, typical approaches impose an interference temperature (IT) constraint. When the PPL transmits multiple streams, however, the spatial structure of the interference comes into play. In such cases, we show that spatial interference shaping constraints can provide higher sum-rate performance for the IC while ensuring the required rate at the PPL. We then extend the interference leakage minimization algorithm (MinIL) to incorporate such constraints. An additional power control step is included in the optimization procedure to improve the sum-rate when the interference alignment (IA) problem becomes infeasible due to the additional constraint. Numerical examples illustrate the effectiveness of the spatial shaping constraint in comparison to IT when the PPL transmits multiple data streams.
Citations: 4
Hierarchical depth processing with adaptive search range and fusion
Zucheul Lee, Truong Q. Nguyen
DOI: 10.1109/ICASSP.2014.6853663 · Pages: 584-588 · Published: 2014-05-04
Abstract: In this paper, we present an effective hierarchical depth processing and fusion method for large stereo images. We propose an adaptive disparity search range based on the combined local structure of the image and the initial disparity. The adaptive search range can propagate the smoothness property at the coarse level to the fine level while preserving details and suppressing undesirable errors. The spatial-multiscale total variation method is investigated to enforce the spatial and scaling consistency of multiscale depth estimates. The experimental results demonstrate that the proposed hierarchical scheme produces high-quality, high-resolution depth maps by fusing individual multiscale depth maps, while reducing complexity.
Citations: 1
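The adaptive-search-range idea can be illustrated in miniature: a disparity map from the coarse level is upsampled and rescaled, and a narrow search window is centred on it at the fine level. The array values and half-range below are assumptions for illustration, not the paper's settings.

```python
import numpy as np

# Miniature sketch of coarse-to-fine disparity propagation with a
# narrow adaptive search window (values are illustrative assumptions).
coarse_disp = np.array([[2.0, 2.0],
                        [3.0, 4.0]])                      # coarse disparities
fine_center = np.kron(coarse_disp, np.ones((2, 2))) * 2   # upsample, x2 scale
half_range = 2                                            # assumed half-range
search_lo = fine_center - half_range
search_hi = fine_center + half_range
print(int(search_lo.min()), int(search_hi.max()))  # 2 10
```

Searching only a few disparities around the propagated estimate, rather than the full range, is what keeps the fine level cheap while carrying the coarse level's smoothness forward.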
Multi-pitch tracking using Gaussian mixture model with time varying parameters and Grating Compression Transform
M. Abhijith, P. Ghosh, K. Rajgopal
DOI: 10.1109/ICASSP.2014.6853842 · Pages: 1473-1477 · Published: 2014-05-04
Abstract: The Grating Compression Transform (GCT) is a two-dimensional analysis of the speech signal which has been shown to be effective for multi-pitch tracking in speech mixtures. Existing multi-pitch tracking methods using GCT apply a Kalman filter framework to obtain pitch tracks, which requires training the filter parameters on true pitch tracks. We propose an unsupervised method for obtaining multiple pitch tracks. In the proposed method, multiple pitch tracks are modeled using the time-varying means of a Gaussian mixture model (GMM), referred to as TVGMM. The TVGMM parameters are estimated from multiple pitch values at each frame of a given utterance, obtained from different patches of the spectrogram using GCT. We evaluate the proposed method on all-voiced speech mixtures as well as random speech mixtures having well-separated and close pitch tracks. TVGMM achieves multi-pitch tracking with 51% and 53% of multi-pitch estimates having error ≤ 20% for random mixtures and all-voiced mixtures, respectively. TVGMM also yields a lower root mean squared error in pitch track estimation than Kalman filtering.
Citations: 4
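A toy version of the TVGMM idea: model each pitch track as a Gaussian mean that varies with time, and fit it to noisy per-frame pitch candidates. The linear-in-time mean and the hard-assignment fitting loop below are simplifying assumptions for illustration, not the paper's parameterization or estimator.

```python
import numpy as np

# Toy sketch: two pitch tracks as time-varying Gaussian means, fitted to
# noisy per-frame candidates by alternating assignment and refitting.
# Linear means and hard assignment are assumptions, not the paper's method.
rng = np.random.default_rng(3)
T = 100
t = np.arange(T, dtype=float)
true_tracks = np.stack([120.0 + 0.2 * t,          # Hz, rising track
                        220.0 - 0.3 * t])         # Hz, falling track
cands = true_tracks + 3.0 * rng.standard_normal((2, T))  # noisy candidates

vals = cands.ravel()                  # all per-frame pitch candidates
frames = np.tile(t, 2)                # frame index of each candidate
coef = np.array([[100.0, 0.0],        # [intercept, slope] per track,
                 [240.0, 0.0]])       # crude initial guesses
X = np.stack([np.ones(2 * T), frames])

for _ in range(10):
    # Assignment step: attach each candidate to the nearest track mean.
    assign = np.abs(vals - coef @ X).argmin(axis=0)
    # Update step: least-squares refit of each track's time-varying mean.
    for k in (0, 1):
        sel = assign == k
        A = np.stack([np.ones(sel.sum()), frames[sel]], axis=1)
        coef[k], *_ = np.linalg.lstsq(A, vals[sel], rcond=None)

fitted = coef @ np.stack([np.ones(T), t])          # (2, T) track means
rmse = float(np.sqrt(np.mean((fitted - true_tracks) ** 2)))
print(round(rmse, 2))
```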
Objective similarity metrics for scenic bilevel images
Yuanhao Zhai, D. Neuhoff
DOI: 10.1109/ICASSP.2014.6854109 · Pages: 2793-2797 · Published: 2014-05-04
Abstract: This paper proposes new objective similarity metrics for scenic bilevel images, which are images containing natural scenes such as landscapes and portraits. Though percentage error is the most commonly used similarity metric for bilevel images, it is not always consistent with human perception. Based on hypotheses about human perception of bilevel images, this paper proposes new metrics that outperform percentage error in the sense of attaining significantly higher Pearson and Spearman-rank correlation coefficients with respect to subjective ratings. The new metrics include Adjusted Percentage Error, Bilevel Gradient Histogram and Connected Components Comparison. The subjective ratings come from similarity evaluations described in a companion paper. Combinations of these metrics are also proposed, which exploit their complementarity to attain even better performance.
Citations: 4
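The baseline the paper improves on, plain percentage error, is simply the fraction of differing pixels. A minimal sketch follows; the proposed Adjusted Percentage Error, Bilevel Gradient Histogram and Connected Components Comparison metrics are not reproduced here.

```python
import numpy as np

# Percentage error: the baseline similarity metric for bilevel images
# that the paper's perceptual metrics improve on.
def percentage_error(a: np.ndarray, b: np.ndarray) -> float:
    """Percentage of pixels at which two boolean images differ."""
    return float(np.mean(a != b)) * 100.0

ref_img = np.zeros((8, 8), dtype=bool)
cmp_img = ref_img.copy()
cmp_img[0, :4] = True                      # flip 4 of the 64 pixels
print(percentage_error(ref_img, cmp_img))  # 6.25
```

Its weakness, which motivates the paper, is that all pixel flips count equally regardless of where they fall in the scene's structure.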
Multi-group multi-way relaying with reduced number of relay antennas
R. S. Ganesan, Hussein Al-Shatri, Xiang Li, T. Weber, A. Klein
DOI: 10.1109/ICASSP.2014.6854093 · Pages: 2714-2718 · Published: 2014-05-04
Abstract: In this paper, multi-group multi-way relaying is considered. There are L groups with K nodes in each group, and each node wants to share d data streams with all the other nodes in its group. A single MIMO relay assists the communications. The relay does not have enough antennas to spatially separate the data streams; instead, it assists in performing interference alignment at the receivers. To find the interference alignment solution, we generalize the concept of signal and channel alignment, developed for the MIMO Y channel and the two-way relay channel, to group signal alignment and group channel alignment. In comparison to conventional multi-group multi-way relaying schemes [1, 2], which require at least R ≥ LKd - d relay antennas, the proposed scheme exploits the multiple antennas at the nodes and needs only R ≥ LKd - Ld antennas. The number of antennas required at the nodes to achieve this is also derived. It is shown that the proposed interference alignment based scheme achieves more degrees of freedom than the reference schemes without interference alignment.
Citations: 6
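The antenna-count saving stated in the abstract is directly computable. The example values below are assumptions for illustration:

```python
# Antenna-count comparison from the abstract: conventional multi-group
# multi-way relaying needs R >= L*K*d - d relay antennas, the proposed
# scheme R >= L*K*d - L*d. Example values are assumptions.
L, K, d = 3, 3, 1      # groups, nodes per group, streams per node
conventional = L * K * d - d
proposed = L * K * d - L * d
print(conventional, proposed)  # 8 6
```

The saving of (L - 1)d antennas grows with the number of groups, which is the point of the group alignment construction.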
Motion detection with spatiotemporal sequences
T. Zhang, Haixian Wang
DOI: 10.1109/ICASSP.2014.6854422 · Pages: 4344-4348 · Published: 2014-05-04
Abstract: In this paper we propose a new method to detect motion in a greyscale video. In our algorithm, several spatiotemporal sequences of different lengths are used to filter the frames of the video, and the filtered images are then combined to recover the actual motion. The performance of the algorithm is evaluated on several human action datasets in which different actions are performed. The detection results are compared with previous work and with manually extracted targets. The experimental results show that the filter responses closely match the real actions of the humans in the original video.
Citations: 0
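The general scheme the abstract describes, filtering the frame sequence with temporal windows of different lengths and combining the responses, can be sketched with simple frame differencing. The differencing filter below is an illustrative stand-in, not the paper's filter bank.

```python
import numpy as np

# Sketch: temporal filtering of a frame sequence at two window lengths,
# combined and thresholded into a motion mask. Plain differencing stands
# in for the paper's spatiotemporal filters (assumption).
frames = np.zeros((10, 16, 16))
for ti in range(10):
    frames[ti, :, 4 + ti % 4] = 1.0   # a vertical bar stepping sideways

def temporal_response(seq: np.ndarray, length: int) -> np.ndarray:
    """Mean absolute frame-to-frame difference over the last `length` frames."""
    return np.abs(np.diff(seq[-length:], axis=0)).mean(axis=0)

# Combine responses from two window lengths, then threshold.
resp = 0.5 * (temporal_response(frames, 3) + temporal_response(frames, 5))
motion_mask = resp > 0.1
print(motion_mask.any())  # True: the moving bar is detected
```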