2013 IEEE International Conference on Acoustics, Speech and Signal Processing: Latest Publications

A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6639066
Authors: T. Kinnunen, Padmanabhan Rajan
Abstract: A voice activity detector (VAD) plays a vital role in robust speaker verification, where energy VAD is the most commonly used. Energy VAD works well in noise-free conditions but deteriorates in noisy conditions. One way to tackle this is to introduce speech-enhancement preprocessing. We study an alternative, likelihood-ratio-based VAD that trains speech and nonspeech models on an utterance-by-utterance basis from mel-frequency cepstral coefficients (MFCCs). The training labels are obtained from enhanced energy VAD. Because the speech and nonspeech models are retrained for each utterance, minimal assumptions are made about the background noise. According to both VAD error analysis and speaker verification results using a state-of-the-art i-vector system, the proposed method outperforms energy VAD variants by a wide margin. We provide an open-source implementation of the method.
Citations: 109
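The per-utterance retraining loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' open-source implementation: the function name, the use of scikit-learn GMMs, the component count, and the zero decision threshold are all assumptions, and the enhancement step applied before the energy VAD in the paper is omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def lr_vad(mfcc, energy_labels, n_components=4):
    """Frame-level likelihood-ratio VAD for one utterance.

    mfcc:          (n_frames, n_ceps) MFCC matrix
    energy_labels: boolean frame labels from a first-pass energy VAD,
                   used only to bootstrap the two models
    """
    # Train speech and nonspeech GMMs from this utterance alone,
    # so no global noise model is assumed.
    speech = GaussianMixture(n_components).fit(mfcc[energy_labels])
    nonspeech = GaussianMixture(n_components).fit(mfcc[~energy_labels])
    # Per-frame log-likelihood ratio: positive means "speech".
    llr = speech.score_samples(mfcc) - nonspeech.score_samples(mfcc)
    return llr > 0.0
```

Because both models are re-estimated for every utterance, frames mislabeled by the bootstrap energy VAD can be corrected by the likelihood-ratio decision.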
Grapheme and multilingual posterior features for under-resourced speech recognition: A study on Scottish Gaelic
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6639087
Authors: Ramya Rasipuram, P. Bell, M. Magimai.-Doss
Abstract: Standard automatic speech recognition (ASR) systems use phonemes as subword units. Thus, one of the primary resources required to build a good ASR system is a well-developed phoneme pronunciation lexicon. However, under-resourced languages typically lack such lexical resources. In this paper, we investigate recently proposed grapheme-based ASR in the framework of the Kullback-Leibler divergence based hidden Markov model (KL-HMM) for under-resourced languages, particularly Scottish Gaelic, which has no lexical resources. More specifically, we study the use of grapheme and multilingual phoneme class conditional probabilities (posterior features) as feature observations in the KL-HMM. Our ASR studies show that the proposed approach yields a better system than the conventional HMM/GMM approach using cepstral features. Furthermore, grapheme posterior features estimated using both auxiliary data and Gaelic data yield the best system.
Citations: 18
Clustering similar acoustic classes in the Fishervoice framework
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6639167
Authors: Na Li, W. Jiang, H. Meng, Zhifeng Li
Abstract: In the Fishervoice (FSH) framework, the mean supervectors of the speaker models are divided into several subvectors by mixture index. However, this division strategy cannot capture local acoustic-class structure among similar acoustic classes or discriminative information between different acoustic classes. To verify whether local structure information can help improve system performance, we develop five different speaker-supervector segmentation methods. Experiments on NIST SRE08 show that clustering similar acoustic classes together improves system performance. In particular, the proposed equal-size clustering method achieves a 5.1% relative decrease in EER compared to FSH1.
Citations: 0
Identification of genes consistently co-expressed in multiple microarray datasets by a genome-wide Bi-CoPaM approach
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6637835
Authors: Basel Abu-Jamous, Rui Fa, D. Roberts, A. Nandi
Abstract: Many methods have been proposed to identify informative subsets of genes in microarray studies in order to focus the research. For instance, the recently proposed binarization of consensus partition matrices (Bi-CoPaM) method has, among its various features, the ability to generate tight clusters of genes while leaving many genes unassigned from all clusters. We propose exploiting this particular feature by applying Bi-CoPaM to genome-wide microarray data from multiple datasets to generate more clusters than required. These clusters are then tightened so that most of their genes are left unassigned from all clusters, and most of the clusters are left totally empty. The tightened clusters that remain non-empty contain the genes that are consistently co-expressed across multiple datasets when examined by various clustering methods. We demonstrate this for cyclic and acyclic genes, as well as for genes with both high and low expression. Thus, the results of our proposed approach cannot be reproduced by other methods of gene-periodicity identification or by other clustering methods.
Citations: 6
A comparative study of different classifiers for detecting depression from spontaneous speech
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6639227
Authors: Sharifa Alghowinem, Roland Göcke, M. Wagner, J. Epps, Tom Gedeon, M. Breakspear, G. Parker
Abstract: Accurate detection of depression from spontaneous speech could lead to an objective diagnostic aid that assists clinicians in diagnosing depression. Little thought has been given so far to which classifier performs best for this task. In this study, using a 60-subject, real-world, clinically validated dataset, we compare three popular classifiers from the affective-computing literature (Gaussian mixture models (GMM), support vector machines (SVM), and multilayer perceptron neural networks (MLP)) as well as the recently proposed hierarchical fuzzy signature (HFS) classifier. Among these, a hybrid classifier using GMM models and SVM gave the best overall classification results. Comparing feature, score, and decision fusion, score fusion performed better for GMM, HFS, and MLP, while decision fusion worked best for SVM (both for raw data and GMM models). Feature fusion performed worse than the other fusion methods in this study. We found that loudness, root mean square, and intensity were the voice features that performed best in detecting depression in this dataset.
Citations: 82
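Score fusion and decision fusion, two of the combination schemes compared in the abstract, differ only in where the threshold is applied: before or after combining the classifiers. A minimal sketch with hypothetical helper names (real systems would fuse calibrated per-modality scores rather than raw probabilities):

```python
import numpy as np

def score_fusion(scores, threshold=0.5):
    """Average the classifiers' continuous scores, then threshold once.
    scores: (n_classifiers, n_samples) array."""
    return np.mean(scores, axis=0) > threshold

def decision_fusion(scores, threshold=0.5):
    """Threshold each classifier first, then take a majority vote."""
    votes = np.asarray(scores) > threshold
    return votes.sum(axis=0) > votes.shape[0] / 2
```

The two can disagree: one confident classifier can pull the averaged score over the threshold even when it is outvoted at the decision level, which is why the abstract's finding that the best scheme depends on the base classifier is plausible.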
Mitigation of clipping in sensors
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638803
Authors: Shang-Kee Ting, A. H. Sayed
Abstract: One major source of nonlinear distortion in analog-to-digital converters (ADCs) is clipping, which introduces spurious noise across the bandwidth of the sampled data. Prior works recover the signal from the acquired samples by relying on oversampling, or on the assumption of vacant frequency bands together with sparse signal representations. In this work, we propose a different approach that uses two streams of data to mitigate the clipping distortion. Simulation results show an SNR improvement of 9 dB, while the conventional approaches may even degrade the SNR in some situations.
Citations: 7
Adaptive feature split selection for co-training: Application to tire irregular wear classification
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638308
Authors: Wei Du, R. Phlypo, T. Adalı
Abstract: Co-training is a practical and powerful semi-supervised learning method. It yields high classification accuracy with a training set containing only a small amount of labeled data. Successful co-training requires two important conditions on the features: diversity and sufficiency. In this paper, we propose a novel mutual information (MI) based approach, inspired by the idea of dependent component analysis (DCA), to achieve feature splits that are maximally independent between subsets (diversity) or within subsets (sufficiency). We evaluate the relationship between classification performance and the relative importance of the two conditions. Experimental results on actual tire data indicate that, compared to diversity, sufficiency has a more significant impact on classification accuracy. Further results show that co-training with feature splits obtained by the MI-based approach yields higher accuracy than supervised classification, and significantly higher accuracy when using a small set of labeled training data.
Citations: 4
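The diversity condition (low dependence between the two feature subsets) can be illustrated with a toy greedy split. This is only a sketch of the idea: it uses absolute correlation as a cheap stand-in for mutual information, whereas the paper's approach is MI-based and DCA-inspired; the function name and seed choices are hypothetical.

```python
import numpy as np

def diverse_split(X, seed_a=0, seed_b=1):
    """Greedy two-way feature split that keeps mutually dependent
    features together, so dependence *between* the two resulting
    subsets stays low (the diversity condition).

    X: (n_samples, n_features) data matrix.
    Returns two lists of feature indices."""
    n_feat = X.shape[1]
    # |correlation| between features, as a crude dependence measure.
    dep = np.abs(np.corrcoef(X, rowvar=False))
    a, b = [seed_a], [seed_b]
    for j in range(n_feat):
        if j in (seed_a, seed_b):
            continue
        # Attach feature j to whichever subset it depends on more.
        if dep[j, a].sum() / len(a) >= dep[j, b].sum() / len(b):
            a.append(j)
        else:
            b.append(j)
    return a, b
```

Each subset must still be sufficient to train a usable classifier on its own, which is the second condition the paper evaluates.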
Interactive multimodal music transcription
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6637639
Authors: J. Quereda, C. Pérez-Sancho
Abstract: Automatic music transcription has usually been performed as an autonomous task, and its evaluation has been made in terms of precision, recall, accuracy, etc. Nevertheless, in this work, assuming that the state of the art is far from perfect, transcription is treated as an interactive task in which an expert user is assisted by a transcription tool. In this context, performance evaluation of the system becomes an assessment of how many user interactions are needed to complete the work. The strategy is that user interactions can be used by the system to improve its performance adaptively, thus minimizing the workload. A multimodal approach has also been implemented, in which different sources of information, such as onsets, beats, and meter, are used to detect notes in a musical audio excerpt. The system focuses on monotimbral polyphonic transcription.
Citations: 9
Discriminatively trained Bayesian speaker comparison of i-vectors
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6639153
Authors: B. J. Borgstrom, A. McCree
Abstract: This paper presents a framework for fully Bayesian speaker comparison of i-vectors. By generalizing the train/test paradigm, we derive an analytic expression for the speaker comparison log-likelihood ratio (LLR), as well as solutions for model training and Bayesian scoring. This framework is useful for enrollment sets of any size. For the specific case of single-cut enrollment, it is shown to be mathematically equivalent to probabilistic linear discriminant analysis (PLDA). Additionally, we present discriminative training of model hyper-parameters by minimizing the total cross entropy between LLRs and class labels. When applied to speaker recognition, significant performance gains are observed for various NIST SRE 2010 extended evaluation tasks.
Citations: 20
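For the single-cut enrollment case, which the abstract notes is equivalent to PLDA, the verification LLR has a well-known closed form under the two-covariance model x = y + e with between-speaker covariance B and within-speaker covariance W. A sketch under that assumption (the paper's generalized framework and discriminative hyper-parameter training are not reproduced here; B and W would normally be estimated from data):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def plda_llr(x1, x2, B, W):
    """LLR that i-vectors x1 and x2 come from the same speaker,
    under the two-covariance model y ~ N(0, B), e ~ N(0, W)."""
    d = len(x1)
    T = B + W                      # total covariance of one i-vector
    pair = np.concatenate([x1, x2])
    # Same speaker: the shared y correlates the pair with cross-cov B.
    same = np.block([[T, B], [B, T]])
    # Different speakers: the two i-vectors are independent.
    diff = np.block([[T, np.zeros((d, d))], [np.zeros((d, d)), T]])
    return (mvn.logpdf(pair, np.zeros(2 * d), same)
            - mvn.logpdf(pair, np.zeros(2 * d), diff))
```

A positive LLR favors the same-speaker hypothesis; in practice the score would be calibrated before thresholding.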
Tracking sparse signal sequences from nonlinear/non-Gaussian measurements and applications in illumination-motion tracking
Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638941
Authors: Rituparna Sarkar, Samarjit Das, Namrata Vaswani
Abstract: In this work, we develop algorithms for tracking time sequences of sparse spatial signals with slowly changing sparsity patterns, and other unknown states, from a sequence of nonlinear observations corrupted by (possibly) non-Gaussian noise. A key example of the above problem occurs in tracking moving objects across spatially varying illumination changes, where motion is the small dimensional state while the illumination image is the sparse spatial signal satisfying the slow-sparsity-pattern-change property.
Citations: 6