2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Information bottleneck based speaker diarization of meetings using non-speech as side information
S. Yella, H. Bourlard
DOI: 10.1109/ICASSP.2014.6853565 · Pages: 96-100 · Published: 2014-05-04
Abstract: Background noise and errors in speech/non-speech detection cause significant degradation to the output of a speaker diarization system. In a typical speaker diarization system, non-speech segments are excluded prior to unsupervised clustering. In the current study, we exploit the information present in the non-speech segments of a recording to improve the output of a speaker diarization system based on the information bottleneck framework. This is achieved by providing information from non-speech segments as side (irrelevant) information to information bottleneck based clustering. Experiments on meeting recordings from the RT 06, 07 and 09 evaluation sets show that the proposed method decreases the diarization error rate by around 18% relative to the baseline information bottleneck based speaker diarization system. Comparison with a state-of-the-art system based on the HMM/GMM framework shows that the proposed method significantly narrows the performance gap between the information bottleneck system and the HMM/GMM system.
Citations: 12
Amplitude and phase estimator for real-time biomedical spectral Doppler applications
S. Ricci, R. Matera, A. Dallai
DOI: 10.1109/ICASSP.2014.6854584 · Pages: 5149-5152 · Published: 2014-05-04
Abstract: In a typical echo-Doppler investigation, the moving blood is periodically insonated by transmitted bursts of ultrasound energy. The echoes, shifted in frequency according to the Doppler effect, are received, coherently demodulated and processed through a spectral estimator. The detected frequency shift can be exploited for blood velocity assessment. The spectral analysis is typically performed with the conventional Fast Fourier Transform (FFT), but the Amplitude and Phase EStimator (APES) has recently been shown to produce a good-quality sonogram from a reduced number of transmissions. Unfortunately, the much higher computational cost of APES hampers its use in real-time applications. In this work, a fixed-point DSP implementation of APES is presented. A spectral estimate based on 32 transmissions is computed in less than 120 μs. Results obtained from echo-Doppler investigations on a volunteer are presented.
Citations: 3
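The conventional FFT baseline that APES is compared against can be sketched in a few lines. Below is a minimal, illustrative sonogram-column estimate on synthetic slow-time samples; the PRF and Doppler shift are assumed values, and the paper's actual contribution, a fixed-point DSP implementation of APES, is considerably more involved.

```python
import numpy as np

# Illustrative sketch of the conventional FFT-based spectral Doppler
# estimate (not the paper's fixed-point APES implementation).
prf = 8000.0        # pulse repetition frequency in Hz (assumption)
n_tx = 32           # transmissions per spectral line, as in the paper
f_doppler = 1200.0  # simulated Doppler shift in Hz (assumption)

# Synthetic coherently demodulated slow-time samples: one complex sample
# per transmitted burst at a fixed depth.
t = np.arange(n_tx) / prf
slow_time = np.exp(2j * np.pi * f_doppler * t)

# Windowed periodogram of the slow-time ensemble gives one sonogram column.
win = np.hanning(n_tx)
spectrum = np.fft.fftshift(np.fft.fft(slow_time * win))
freqs = np.fft.fftshift(np.fft.fftfreq(n_tx, d=1.0 / prf))
peak = freqs[np.argmax(np.abs(spectrum))]
print(peak)  # nearest FFT bin to the 1200 Hz shift
```

With only 32 slow-time samples the FFT bin spacing is prf/n_tx = 250 Hz; the appeal of APES reported in the paper is a better-resolved sonogram from the same small ensemble.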
Look who's talking: Detecting the dominant speaker in a cluttered scenario
Eleonora D'Arca, N. Robertson, J. Hopgood
DOI: 10.1109/ICASSP.2014.6853854 · Pages: 1532-1536 · Published: 2014-05-04
Abstract: In this work we propose a novel method to automatically detect and localise the dominant speaker in an enclosed scenario by means of audio and video cues. The underpinning idea is that gesturing means speaking, so observing motion means observing an audio signal. To the best of our knowledge, state-of-the-art algorithms focus on stationary-motion scenarios and close-up scenes where only one audio source exists, whereas we extend the method to larger fields of view and cluttered scenarios including multiple non-stationary moving speakers. In such contexts, moving objects that are not correlated with the dominant audio may exist, and their motion may incorrectly drive the audio-video (AV) correlation estimation. This suggests that extra localisation data may be fused at the decision level to avoid detecting false positives. In this work, we learn Mel-frequency cepstral coefficients (MFCC) and correlate them with the optical flow. We also exploit the audio and video signals to estimate the position of the actual speaker, narrowing down the visual search space and hence reducing the probability of a wrong voice-to-pixel-region association. We compare our work with a state-of-the-art algorithm and show on real datasets a 36% precision improvement in localising a moving dominant speaker through occlusions and speech interferences.
Citations: 13
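The "gesturing means speaking" intuition behind the audio-video association can be illustrated with a toy correlation check. The sketch below uses synthetic per-frame trajectories in place of real MFCC and optical-flow features; all signals and names here are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

# Toy illustration: the audio trajectory correlates with the dominant
# speaker's motion but not with clutter motion. Real MFCC and optical
# flow are replaced by synthetic sequences (assumptions).
rng = np.random.default_rng(1)
n_frames = 200
speech_energy = np.abs(np.sin(np.linspace(0.0, 8.0 * np.pi, n_frames)))
speaker_motion = speech_energy + 0.1 * rng.standard_normal(n_frames)
clutter_motion = rng.standard_normal(n_frames)  # uncorrelated mover

def av_score(audio: np.ndarray, motion: np.ndarray) -> float:
    """Pearson correlation between audio and motion trajectories."""
    return float(np.corrcoef(audio, motion)[0, 1])

print(round(av_score(speech_energy, speaker_motion), 2),
      round(av_score(speech_energy, clutter_motion), 2))
```

The paper's point is precisely that this raw AV correlation is not enough in clutter, which is why it fuses an additional position estimate at the decision level.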
Histogram of Log-Gabor Magnitude Patterns for face recognition
J. Yi, Fei Su
DOI: 10.1109/ICASSP.2014.6853650 · Pages: 519-523 · Published: 2014-05-04
Abstract: Gabor-based features have achieved excellent performance for face recognition on traditional face databases. However, on the recent LFW (Labeled Faces in the Wild) face database, Gabor-based features attract little attention due to their high computational complexity, high feature dimension and poor performance. In this paper, we propose a Gabor-based feature termed Histogram of Gabor Magnitude Patterns (HGMP), which is very simple but effective. HGMP adopts the Bag-of-Words (BoW) image representation framework: it views the Gabor filters as codewords and the Gabor magnitudes at each point as the responses of the point to these codewords. Each point is then coded by orientation normalization and scale non-maximum suppression of its magnitudes, which are efficient to compute. Moreover, the number of codewords is so small that the feature dimension of HGMP is very low. In addition, we analyze the advantages of log-Gabor filters over Gabor filters as codewords, and propose replacing Gabor filters with log-Gabor filters in HGMP, producing the Histogram of Log-Gabor Magnitude Patterns (HLGMP) feature. Experimental results on LFW show that HLGMP outperforms HGMP and achieves state-of-the-art performance, despite its very low computational complexity and feature dimension.
Citations: 9
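The log-Gabor transfer function favoured here is easy to state: G(f) = exp(-(log(f/f0))^2 / (2 log(s)^2)), where s is the bandwidth ratio, and it has no DC component. Below is a minimal sketch, with assumed parameters and only radial (scale) filters, of computing magnitude responses and a histogram code in the spirit of HGMP/HLGMP; the full descriptor with orientation normalization is not reproduced.

```python
import numpy as np

def log_gabor_radial(shape, f0, sigma_ratio=0.55):
    """Radial log-Gabor transfer function on an FFT frequency grid.

    G(f) = exp(-(log(f / f0))^2 / (2 * log(sigma_ratio)^2)), with the DC
    response forced to zero. Parameter values are illustrative assumptions.
    """
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0  # placeholder to avoid log(0) at DC
    g = np.exp(-np.log(radius / f0) ** 2 / (2.0 * np.log(sigma_ratio) ** 2))
    g[0, 0] = 0.0       # log-Gabor has no DC component
    return g

# Magnitude responses of a toy image at two scales, then a per-pixel
# argmax over scales and a histogram: the flavour of coding HGMP/HLGMP
# builds (orientation filters and the full pipeline omitted).
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
F = np.fft.fft2(img)
mags = [np.abs(np.fft.ifft2(F * log_gabor_radial(img.shape, f0)))
        for f0 in (0.1, 0.25)]
codes = np.argmax(np.stack(mags), axis=0)        # winning scale per pixel
hist = np.bincount(codes.ravel(), minlength=2)   # descriptor histogram
print(hist.sum())  # 4096, one code per pixel
```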
Interference shaping constraints for underlay MIMO interference channels
C. Lameiro, I. Santamaría, W. Utschick
DOI: 10.1109/ICASSP.2014.6855020 · Pages: 7313-7317 · Published: 2014-05-04
Abstract: In this paper, a cognitive radio (CR) scenario comprising a secondary interference channel (IC) and a primary point-to-point link (PPL) is studied, where the former interferes with the latter. In order to satisfy a given rate requirement at the PPL, typical approaches impose an interference temperature (IT) constraint. When the PPL transmits multiple streams, however, the spatial structure of the interference comes into play. In such cases, we show that spatial interference shaping constraints can provide higher sum-rate performance for the IC while ensuring the required rate at the PPL. We then extend the interference leakage minimization algorithm (MinIL) to incorporate such constraints. An additional power control step is included in the optimization procedure to improve the sum-rate when the interference alignment (IA) problem becomes infeasible due to the additional constraint. Numerical examples illustrate the effectiveness of the spatial shaping constraint in comparison to IT when the PPL transmits multiple data streams.
Citations: 4
Hierarchical depth processing with adaptive search range and fusion
Zucheul Lee, Truong Q. Nguyen
DOI: 10.1109/ICASSP.2014.6853663 · Pages: 584-588 · Published: 2014-05-04
Abstract: In this paper, we present an effective hierarchical depth processing and fusion method for large stereo images. We propose an adaptive disparity search range based on the combined local structure of the image and the initial disparity. The adaptive search range can propagate the smoothness property at the coarse level to the fine level while preserving details and suppressing undesirable errors. The spatial-multiscale total variation method is investigated to enforce the spatial and scaling consistency of multiscale depth estimates. The experimental results demonstrate that the proposed hierarchical scheme produces high-quality, high-resolution depth maps by fusing individual multiscale depth maps, while reducing complexity.
Citations: 1
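The adaptive-search-range idea can be illustrated in miniature: a disparity map from the coarse level is upsampled and rescaled, and a narrow search window is centred on it at the fine level. The array values and half-range below are assumptions for illustration, not the paper's settings.

```python
import numpy as np

# Miniature sketch of coarse-to-fine disparity propagation with a
# narrow adaptive search window (values are illustrative assumptions).
coarse_disp = np.array([[2.0, 2.0],
                        [3.0, 4.0]])                      # coarse disparities
fine_center = np.kron(coarse_disp, np.ones((2, 2))) * 2   # upsample, x2 scale
half_range = 2                                            # assumed half-range
search_lo = fine_center - half_range
search_hi = fine_center + half_range
print(int(search_lo.min()), int(search_hi.max()))  # 2 10
```

Searching only a few disparities around the propagated estimate, rather than the full range, is what keeps the fine level cheap while carrying the coarse level's smoothness forward.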
Multi-pitch tracking using Gaussian mixture model with time varying parameters and Grating Compression Transform
M. Abhijith, P. Ghosh, K. Rajgopal
DOI: 10.1109/ICASSP.2014.6853842 · Pages: 1473-1477 · Published: 2014-05-04
Abstract: The Grating Compression Transform (GCT) is a two-dimensional analysis of the speech signal which has been shown to be effective for multi-pitch tracking in speech mixtures. Existing multi-pitch tracking methods using GCT apply a Kalman filter framework to obtain pitch tracks, which requires training the filter parameters on true pitch tracks. We propose an unsupervised method for obtaining multiple pitch tracks. In the proposed method, multiple pitch tracks are modeled using the time-varying means of a Gaussian mixture model (GMM), referred to as TVGMM. The TVGMM parameters are estimated from multiple pitch values at each frame of a given utterance, obtained from different patches of the spectrogram using GCT. We evaluate the proposed method on all-voiced speech mixtures as well as random speech mixtures having well-separated and close pitch tracks. TVGMM achieves multi-pitch tracking with 51% and 53% of multi-pitch estimates having error ≤ 20% for random mixtures and all-voiced mixtures, respectively. TVGMM also yields a lower root mean squared error in pitch track estimation than Kalman filtering.
Citations: 4
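A toy version of the TVGMM idea: model each pitch track as a Gaussian mean that varies with time, and fit it to noisy per-frame pitch candidates. The linear-in-time mean and the hard-assignment fitting loop below are simplifying assumptions for illustration, not the paper's parameterization or estimator.

```python
import numpy as np

# Toy sketch: two pitch tracks as time-varying Gaussian means, fitted to
# noisy per-frame candidates by alternating assignment and refitting.
# Linear means and hard assignment are assumptions, not the paper's method.
rng = np.random.default_rng(3)
T = 100
t = np.arange(T, dtype=float)
true_tracks = np.stack([120.0 + 0.2 * t,          # Hz, rising track
                        220.0 - 0.3 * t])         # Hz, falling track
cands = true_tracks + 3.0 * rng.standard_normal((2, T))  # noisy candidates

vals = cands.ravel()                  # all per-frame pitch candidates
frames = np.tile(t, 2)                # frame index of each candidate
coef = np.array([[100.0, 0.0],        # [intercept, slope] per track,
                 [240.0, 0.0]])       # crude initial guesses
X = np.stack([np.ones(2 * T), frames])

for _ in range(10):
    # Assignment step: attach each candidate to the nearest track mean.
    assign = np.abs(vals - coef @ X).argmin(axis=0)
    # Update step: least-squares refit of each track's time-varying mean.
    for k in (0, 1):
        sel = assign == k
        A = np.stack([np.ones(sel.sum()), frames[sel]], axis=1)
        coef[k], *_ = np.linalg.lstsq(A, vals[sel], rcond=None)

fitted = coef @ np.stack([np.ones(T), t])          # (2, T) track means
rmse = float(np.sqrt(np.mean((fitted - true_tracks) ** 2)))
print(round(rmse, 2))
```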
Objective similarity metrics for scenic bilevel images
Yuanhao Zhai, D. Neuhoff
DOI: 10.1109/ICASSP.2014.6854109 · Pages: 2793-2797 · Published: 2014-05-04
Abstract: This paper proposes new objective similarity metrics for scenic bilevel images, which are images containing natural scenes such as landscapes and portraits. Though percentage error is the most commonly used similarity metric for bilevel images, it is not always consistent with human perception. Based on hypotheses about human perception of bilevel images, this paper proposes new metrics that outperform percentage error in the sense of attaining significantly higher Pearson and Spearman-rank correlation coefficients with respect to subjective ratings. The new metrics include Adjusted Percentage Error, Bilevel Gradient Histogram and Connected Components Comparison. The subjective ratings come from similarity evaluations described in a companion paper. Combinations of these metrics are also proposed, which exploit their complementarity to attain even better performance.
Citations: 4
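The baseline the paper improves on, plain percentage error, is simply the fraction of differing pixels. A minimal sketch follows; the proposed Adjusted Percentage Error, Bilevel Gradient Histogram and Connected Components Comparison metrics are not reproduced here.

```python
import numpy as np

# Percentage error: the baseline similarity metric for bilevel images
# that the paper's perceptual metrics improve on.
def percentage_error(a: np.ndarray, b: np.ndarray) -> float:
    """Percentage of pixels at which two boolean images differ."""
    return float(np.mean(a != b)) * 100.0

ref_img = np.zeros((8, 8), dtype=bool)
cmp_img = ref_img.copy()
cmp_img[0, :4] = True                      # flip 4 of the 64 pixels
print(percentage_error(ref_img, cmp_img))  # 6.25
```

Its weakness, which motivates the paper, is that all pixel flips count equally regardless of where they fall in the scene's structure.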
Multi-group multi-way relaying with reduced number of relay antennas
R. S. Ganesan, Hussein Al-Shatri, Xiang Li, T. Weber, A. Klein
DOI: 10.1109/ICASSP.2014.6854093 · Pages: 2714-2718 · Published: 2014-05-04
Abstract: In this paper, multi-group multi-way relaying is considered. There are L groups with K nodes in each group, and each node wants to share d data streams with all the other nodes in its group. A single MIMO relay assists the communications. The relay does not have enough antennas to spatially separate the data streams; instead, it assists in performing interference alignment at the receivers. To find the interference alignment solution, we generalize the concept of signal and channel alignment, developed for the MIMO Y channel and the two-way relay channel, to group signal alignment and group channel alignment. In comparison to conventional multi-group multi-way relaying schemes [1, 2], which require at least R ≥ LKd - d relay antennas, the proposed scheme exploits the multiple antennas at the nodes and needs only R ≥ LKd - Ld antennas. The number of antennas required at the nodes to achieve this is also derived. It is shown that the proposed interference alignment based scheme achieves more degrees of freedom than the reference schemes without interference alignment.
Citations: 6
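The antenna-count saving stated in the abstract is directly computable. The example values below are assumptions for illustration:

```python
# Antenna-count comparison from the abstract: conventional multi-group
# multi-way relaying needs R >= L*K*d - d relay antennas, the proposed
# scheme R >= L*K*d - L*d. Example values are assumptions.
L, K, d = 3, 3, 1      # groups, nodes per group, streams per node
conventional = L * K * d - d
proposed = L * K * d - L * d
print(conventional, proposed)  # 8 6
```

The saving of (L - 1)d antennas grows with the number of groups, which is the point of the group alignment construction.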
Motion detection with spatiotemporal sequences
T. Zhang, Haixian Wang
DOI: 10.1109/ICASSP.2014.6854422 · Pages: 4344-4348 · Published: 2014-05-04
Abstract: In this paper we propose a new method to detect motion in a greyscale video. In our algorithm, several spatiotemporal sequences of different lengths are used to filter the frames of the video, and the filtered images are then combined to recover the actual motion. The performance of the algorithm is evaluated on several human action datasets in which different actions are performed. The detection results are compared with previous work and with manually extracted targets. The experimental results show that the filter responses closely match the real actions of the humans in the original video.
Citations: 0
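The general scheme the abstract describes, filtering the frame sequence with temporal windows of different lengths and combining the responses, can be sketched with simple frame differencing. The differencing filter below is an illustrative stand-in, not the paper's filter bank.

```python
import numpy as np

# Sketch: temporal filtering of a frame sequence at two window lengths,
# combined and thresholded into a motion mask. Plain differencing stands
# in for the paper's spatiotemporal filters (assumption).
frames = np.zeros((10, 16, 16))
for ti in range(10):
    frames[ti, :, 4 + ti % 4] = 1.0   # a vertical bar stepping sideways

def temporal_response(seq: np.ndarray, length: int) -> np.ndarray:
    """Mean absolute frame-to-frame difference over the last `length` frames."""
    return np.abs(np.diff(seq[-length:], axis=0)).mean(axis=0)

# Combine responses from two window lengths, then threshold.
resp = 0.5 * (temporal_response(frames, 3) + temporal_response(frames, 5))
motion_mask = resp > 0.1
print(motion_mask.any())  # True: the moving bar is detected
```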