{"title":"A low-power VGA full-frame feature extraction processor","authors":"Don-Guk Jeon, Yejoong Kim, Inhee Lee, Zhengya Zhang, D. Blaauw, D. Sylvester","doi":"10.1109/ICASSP.2013.6638152","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638152","url":null,"abstract":"This paper proposes an energy-efficient VGA full-frame feature extraction processor design. It is based on the SURF algorithm and makes various algorithmic modifications to improve efficiency and reduce hardware overhead while maintaining extraction performance. Low clock frequency and deep parallelism derived from a one-sample-per-cycle matched-throughput architecture provide significantly larger room for voltage scaling and enables full-frame extraction. The proposed design consumes 4.7mW at 400mV and achieves 72% higher energy efficiency than prior work.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115343579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open-set semi-supervised audio-visual speaker recognition using co-training LDA and Sparse Representation Classifiers","authors":"Xuran Zhao, N. Evans, J. Dugelay","doi":"10.1109/ICASSP.2013.6638208","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638208","url":null,"abstract":"Semi-supervised learning is attracting growing interest within the biometrics community. Almost all prior work focuses on closed-set scenarios, in which samples labelled automatically are assumed to belong to an enrolled class. This is often not the case in realistic applications and thus open-set alternatives are needed. This paper proposes a new approach to open-set, semi-supervised learning based on co-training, Linear Discriminant Analysis (LDA) subspaces and Sparse Representation Classifiers (SRCs). Experiments on the standard MOBIO dataset show how the new approach can utilize automatically labelled data to augment a smaller, manually labelled dataset and thus improve the performance of an open-set audio-visual person recognition system.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"444 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116069269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data centric multi-shift sensor scheduling for wireless sensor networks","authors":"Jialin Zhang, Y. Hu","doi":"10.1109/ICASSP.2013.6638530","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638530","url":null,"abstract":"A multi-shift sensor scheduling method is proposed to extend the operating lifespan of a wireless sensor network. Sensor nodes in the WSN are partitioned into N subnetworks and the operating schedule is partitioned into N shifts of equal duration. Exploiting spatial correlations among sensor nodes, data collected using each subnetwork can well approximate the data collected using original sensor network. Each sub-network also form a connected component to ensure proper data collection. This task is formulated as a NP-hard constrained subset selection problem. A polynomial time heuristic algorithm leveraging breath-first search and subspace approximation is proposed. Simulations using a real world data set demonstrate superior performance and extended lifespan of this proposed method.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116560506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Direct product based deep belief networks for automatic speech recognition","authors":"P. Fousek, Steven J. Rennie, Pierre L. Dognin, V. Goel","doi":"10.1109/ICASSP.2013.6638238","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638238","url":null,"abstract":"In this paper, we present new methods for parameterizing the connections of neural networks using sums of direct products. We show that low rank parameterizations of weight matrices are a subset of this set, and explore the theoretical and practical benefits of representing weight matrices using sums of Kronecker products. ASR results on a 50 hr subset of the English Broadcast News corpus indicate that the approach is promising. In particular, we show that a factorial network with more than 150 times less parameters in its bottom layer than its standard unconstrained counterpart suffers minimal WER degradation, and that by using sums of Kronecker products, we can close the gap in WER performance while maintaining very significant parameter savings. In addition, direct product DBNs consistently outperform standard DBNs with the same number of parameters. These results have important implications for research on deep belief networks (DBNs). They imply that we should be able to train neural networks with thousands of neurons and minimal restrictions much more rapidly than is currently possible, and that by using sums of direct products, it will be possible to train neural networks with literally millions of neurons tractably-an exciting prospect.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"379 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116579436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acoustic channel model for adaptive downhole communication over deep drill strings","authors":"M. Gutierrez-Estevez, U. Krüger, K. Krueger, K. Manolakis, V. Jungnickel","doi":"10.1109/ICASSP.2013.6638589","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638589","url":null,"abstract":"For reducing costs in drilling technology, seismic prediction while drilling (SPWD) is envisioned. SPWD needs a fast data link bringing up the seismic data from bottomhole to the ground. In this paper, we propose a flexible and easy-to-use acoustic channel model for long drill strings. The model enables efficient design of adaptive OFDM communication links and prediction of achievable data rates for variable string dimensions. We describe acoustic wave propagation by the S-parameters of the drill string modelled as a series of alternating short and long resonators due to segments of constant acoustic impedance. All segments have been parametrised and the final channel is a concatenation of all its segments. We verify the new model by comparison with measurements on a 55 m long drill string. By using our model, the properties of a manifold of real drill pipes with variable dimensions can be predicted. We investigate the impact of length variations typical for rough drilling applications. For efficient communications over 1.5 km, length variations of the screwed tool joints should be limited to a few centimetres while the pipe length may vary up to one meter.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122764361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Behavior of greedy sparse representation algorithms on nested supports","authors":"B. Mailhé, Bob L. Sturm, Mark D. Plumbley","doi":"10.1109/ICASSP.2013.6638758","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638758","url":null,"abstract":"In this work, we study the links between the recovery properties of sparse signals for Orthogonal Matching Pursuit (OMP) and the whole General MP class over nested supports. We show that the optimality of those algorithms is not locally nested: there is a dictionary and supports I and J with J included in I such that OMP will recover all signals of support I, but not all signals of support J. We also show that the optimality of OMP is globally nested: if OMP can recover all s-sparse signals, then it can recover all s'-sparse signals with s' smaller than s. We also provide a tighter version of Donoho and Elad's spark theorem, which allows us to complete Tropp's proof that sparse representation algorithms can only be optimal for all s-sparse signals if s is strictly lower than half the spark of the dictionary.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122935237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An advanced feature compensation method employing acoustic model with phonetically constrained structure","authors":"Wooil Kim, J. Hansen","doi":"10.1109/ICASSP.2013.6639036","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639036","url":null,"abstract":"This study proposes an effective model-based feature compensation method for robust speech recognition in background noise conditions. In the proposed scheme, an acoustic model with a phonetically constrained structure is employed for the Parallel Combined Gaussian Mixture Model (PCGMM [1]) based feature compensation method. The structure of the acoustic model includes a collection of context independent phone models. A phonetically constrained prior probability is formulated by integrating transition probability of phone models into the reconstruction procedure. Experimental results show that the PCGMM-based feature compensation employing the proposed phonetically constrained structure of acoustic model consistently outperforms the case of employing the conventional Gaussian mixture model. This demonstrates that the proposed configuration of the acoustic model is effective at improving the intelligibility of the speech reconstructed by the feature compensation method for speech recognition under diverse background noise conditions.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122598090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exemplar based language recognition method for short-duration speech segments","authors":"Meng-Ge Wang, Yan Song, B. Jiang, Lirong Dai, I. Mcloughlin","doi":"10.1109/ICASSP.2013.6639091","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639091","url":null,"abstract":"This paper proposes a novel exemplar-based language recognition method for short duration speech segments. It is known that language identity is a kind of weak information that can be deduced from the speech content. For short duration speech segments, the limited content also leads to a large intra-language variability. To address this issue, we propose a new method. This borrows a vector quantization based representation from image classification methods, and constructs the exemplar space using the popular i-vector representation of short duration speech segments. A mapping function is then defined to build the new representation. To evaluate the effectiveness of our proposed method, we conduct extensive experiments on the NIST LRE2007 dataset. The experimental results demonstrate improved performance for short duration speech segments.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122831906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emotion classification via utterance-level dynamics: A pattern-based approach to characterizing affective expressions","authors":"Yelin Kim, E. Provost","doi":"10.1109/ICASSP.2013.6638344","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638344","url":null,"abstract":"Human emotion changes continuously and sequentially. This results in dynamics intrinsic to affective communication. One of the goals of automatic emotion recognition research is to computationally represent and analyze these dynamic patterns. In this work, we focus on the global utterance-level dynamics. We are motivated by the hypothesis that global dynamics have emotion-specific variations that can be used to differentiate between emotion classes. Consequently, classification systems that focus on these patterns will be able to make accurate emotional assessments. We quantitatively represent emotion flow within an utterance by estimating short-time affective characteristics. We compare time-series estimates of these characteristics using Dynamic Time Warping, a time-series similarity measure. We demonstrate that this similarity can effectively recognize the affective label of the utterance. The similarity-based pattern modeling outperforms both a feature-based baseline and static modeling. It also provides insight into typical high-level patterns of emotion. We visualize these dynamic patterns and the similarities between the patterns to gain insight into the nature of emotion expression.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122911711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph based multimodal word clustering for video event detection","authors":"Aravind Namandi Vembu, P. Natarajan, Shuang Wu, R. Prasad, P. Natarajan","doi":"10.1109/ICASSP.2013.6638342","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638342","url":null,"abstract":"Combining diverse low-level features from multiple modalities has consistently improved performance over a range of video processing tasks, including event detection. In our work, we study graph based clustering techniques for integrating information from multiple modalities by identifying word clusters spread across the different modalities. We present different methods to identify word clusters including word similarity graph partitioning, word-video co-clustering and Latent Semantic Indexing and the impact of different metrics to quantify the co-occurrence of words. We present experimental results on a ≈45000 video dataset used in the TRECVID MED 11 evaluations. Our experiments show that multimodal features have consistent performance gains over the use of individual features. Further, word similarity graph construction using a complete graph representation consistently improves over partite graphs and early fusion based multimodal systems. Finally, we see additional performance gains by fusing multimodal features with individual features.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122930374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}