Latest publications: 2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)

Palestinian Arabic regional accent recognition
Abualsoud Hanani, H. Basha, Y. Sharaf, Stephen Eugene Taylor
DOI: 10.1109/SPED.2015.7343088
Abstract: We attempt to automatically recognize the speaker's accent among regional Palestinian Arabic accents from four different regions of Palestine: Jerusalem (JE), Hebron (HE), Nablus (NA) and Ramallah (RA). To achieve this goal, we applied state-of-the-art techniques used in speaker and language identification, namely the Gaussian Mixture Model - Universal Background Model (GMM-UBM), Gaussian Mixture Model - Support Vector Machines (GMM-SVM) and the i-vector framework. All of these systems were trained and tested on speech from 200 speakers. The GMM-SVM and i-vector systems outperformed the baseline GMM-UBM system. The best result (an accuracy of 81.5%) was obtained by an i-vector system with 64 Gaussian components, compared to an accuracy of 73.4% achieved by human listeners on the same test utterances.
Citations: 10
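The GMM-based scoring underlying all three systems in this paper reduces to evaluating each utterance's feature frames under one model per accent and picking the most likely one. A minimal NumPy sketch of that step, assuming diagonal-covariance GMMs and a toy `accent_models` dictionary (the labels and parameter layout here are illustrative, not the paper's actual models):

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of `frames` (T, D) under a
    diagonal-covariance GMM (weights: (K,), means/variances: (K, D))."""
    diff = frames[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_norm = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                       + (diff ** 2 / variances[None]).sum(axis=2))  # (T, K)
    # log sum_k w_k N(x | mu_k, var_k), via log-sum-exp for stability
    weighted = log_norm + np.log(weights)[None, :]
    m = weighted.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(weighted - m).sum(axis=1))))

def classify_accent(frames, accent_models):
    """Return the accent label whose GMM scores the frames highest."""
    scores = {label: gmm_loglik(frames, *params)
              for label, params in accent_models.items()}
    return max(scores, key=scores.get)
```

In the real systems the per-accent GMMs would be MAP-adapted from a UBM rather than trained independently; this sketch only shows the maximum-likelihood decision rule.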
Quantization effects on audio signals for detecting intruders in wild areas using TESPAR S-matrix and artificial neural networks
L. Grama, C. Rusu, G. Oltean, L. Ivanciu
DOI: 10.1109/SPED.2015.7343079
Abstract: This paper analyses the influence of audio signal quantization on the Time Encoding Signal Processing and Recognition (TESPAR) S-matrix, in order to detect and classify intruders in wildlife areas. The intruder classification is performed with multilayer feed-forward neural networks. The databases involved in this work consist of 640 audio waveforms originating from 4 different types of sources. The experimental results prove that, in the proposed audio-based wildlife intruder detection framework, the overall correct classification rates remain very high even if the number of bits used for quantization decreases from 16 to 4.
Citations: 2
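The core manipulation studied here, reducing sample resolution from 16 bits down to 4, can be reproduced with a few lines of NumPy. This sketch uses simple truncation requantization and a signal-to-quantization-noise measure to show how much of the waveform survives (the paper's TESPAR pipeline is not reproduced; only the bit-depth reduction step is illustrated):

```python
import numpy as np

def requantize(samples_16bit, n_bits):
    """Requantize 16-bit integer audio to `n_bits` of resolution by
    dropping the least-significant bits (truncation quantizer)."""
    shift = 16 - n_bits
    return (samples_16bit >> shift) << shift  # keep only the top n_bits

def snr_db(original, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    orig = original.astype(np.float64)
    noise = orig - quantized.astype(np.float64)
    return 10 * np.log10(np.sum(orig ** 2) / max(np.sum(noise ** 2), 1e-12))
```

Even at 4 bits the waveform remains strongly correlated with the original, which is consistent with the paper's finding that classification rates stay high under aggressive quantization.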
Spectrograms, sparsograms and spectral signatures for wildlife intruder detection
C. Rusu, L. Grama
DOI: 10.1109/SPED.2015.7343103
Abstract: In this paper some properties of spectrograms and sparsograms are reviewed. The framework addressed is acoustic-based wildlife intruder detection. Spectral signatures are also recalled within this framework. The averaged binary sparsogram is introduced, and it is shown that it can be considered an effective tool for classifying possible intruder sounds into different classes.
Citations: 6
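One plausible reading of the "averaged binary sparsogram" is a per-frame binary mask over the strongest spectral bins, averaged across frames into a single spectral profile. The sketch below implements that interpretation in NumPy; the frame length, hop and `keep` fraction are illustrative assumptions, not parameters from the paper:

```python
import numpy as np

def averaged_binary_sparsogram(signal, frame_len=256, hop=128, keep=0.1):
    """Per frame, mark the `keep` fraction of largest-magnitude FFT bins
    with 1 and the rest with 0, then average the binary masks over all
    frames into one spectral activity profile."""
    n_bins = frame_len // 2 + 1
    k = max(1, int(keep * n_bins))
    masks = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + frame_len]))
        mask = np.zeros(n_bins)
        mask[np.argsort(mag)[-k:]] = 1.0   # mark the k strongest bins
        masks.append(mask)
    return np.mean(masks, axis=0)          # fraction of frames each bin is active
```

The resulting profile is a compact spectral signature: bins that are consistently among the strongest across frames approach 1, which is what makes it usable as a class descriptor.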
Phonetic segmentation of speech using STEP and t-SNE
Adriana Stan, Cassia Valentini-Botinhao, M. Giurgiu, Simon King
DOI: 10.1109/SPED.2015.7343105
Abstract: This paper introduces a first attempt to perform phoneme-level segmentation of speech based on a perceptual representation, the Spectro-Temporal Excitation Pattern (STEP), and a dimensionality reduction technique, t-Distributed Stochastic Neighbour Embedding (t-SNE). The method searches for the true phonetic boundaries in the vicinity of those produced by an HMM-based segmentation. It looks for perceptually salient spectral changes which occur at these phonetic transitions, and exploits t-SNE's ability to capture both the local and global structure of the data. The method is intended to be usable in any language and is therefore not tailored to any particular dataset or language. Results show that this simple approach improves the segmentation accuracy of unvoiced phonemes by 4% within a 5 ms margin, and by 5% at a 10 ms margin. For voiced phonemes, however, accuracy drops slightly.
Citations: 3
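The STEP and t-SNE components are not reproduced here, but the method's core idea, relocating a coarse HMM boundary to the nearest salient spectral change, can be sketched with plain frame features and a Euclidean change measure (the feature representation and window size are stand-in assumptions):

```python
import numpy as np

def refine_boundary(features, coarse_idx, window=3):
    """Move a coarse segmentation boundary to the frame (within +/- window)
    where the change between consecutive feature frames is largest.
    `features` is a (T, D) array of per-frame spectral features."""
    lo = max(1, coarse_idx - window)
    hi = min(len(features) - 1, coarse_idx + window + 1)
    change = [np.linalg.norm(features[t] - features[t - 1])
              for t in range(lo, hi)]
    return lo + int(np.argmax(change))
```

In the paper, the "change" signal would come from distances in the t-SNE embedding of STEP frames rather than raw spectral distance, but the boundary search itself works the same way.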
Evaluation of the generative and discriminative text-independent speaker verification approaches on handheld devices
Florin Curelaru
DOI: 10.1109/SPED.2015.7343091
Abstract: This paper takes advantage of the availability of the "MIT Mobile Device Speaker Verification Corpus" (MIT-MDSVC) to evaluate the performance of three well-known text-independent speaker verification approaches on handheld devices, considering MIT-MDSVC a representative corpus designed for robust speaker verification tasks with a limited vocabulary and a limited amount of training data collected on handheld devices. Several experiments, with either mismatched testing conditions or with samples collected from multiple test conditions, were conducted to evaluate both text-independent approaches: generative (based on Gaussian Mixture Models) and discriminative (based on Support Vector Machines with the Fisher kernel and the GMM Supervector Linear kernel), without using the transcription of the utterances or knowledge about the acoustic conditions of the recordings (environment and microphone). An equal error rate of less than 3% was achieved using Gaussian Mixture Models, and a slightly higher equal error rate (less than 3.5%) was achieved using Support Vector Machines with the Fisher kernel and with the GMM Supervector Linear kernel, across all acoustic conditions considered.
Citations: 2
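The equal error rate (EER) used as the metric here is the operating point where the false-acceptance and false-rejection rates cross. A minimal NumPy implementation, sweeping all candidate thresholds over the genuine and impostor score sets:

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """EER: the error rate where false acceptance (impostors accepted)
    equals false rejection (genuine speakers rejected), found by sweeping
    every observed score as a decision threshold."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best = (np.inf, 1.0)  # (|FAR - FRR|, candidate EER)
    for t in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < t)     # genuines scored below threshold
        far = np.mean(impostor >= t)   # impostors scored at/above threshold
        best = min(best, (abs(far - frr), (far + frr) / 2))
    return best[1]
```

Perfectly separable score distributions give an EER of 0; the "less than 3%" figure in the paper means the two distributions overlap only slightly.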
Speech database acquisition for assisted living environment applications
Mihai Dogariu, H. Cucu, Andi Buzo, D. Burileanu, O. Fratu
DOI: 10.1109/SPED.2015.7343083
Abstract: Home automation has become a subject of increasing interest for both industry and research, as awareness of such systems grows and their benefits are easily seen. The new trend is to develop smart homes where commands can be given by speech. This way of communicating, besides being the most natural, has the advantage of offering flexibility to users, especially when they have limited motion capabilities. While the state of the art has reached an important level of performance for widely used languages, little effort has been made for the Romanian language. The main reason for this is the lack of an annotated speech database recorded in real-life conditions. This paper focuses on the methodology of acquiring four different speech corpora with various end-user scenarios in mind. The commands corpus is meant to be used in home automation development, the cough corpus is meant to help research in detecting distress situations, the spontaneous speech corpus will aid distant speech recognition applications, and the multi-room, multi-person, multi-language corpus can be used for research in speaker detection and identification. All of these were recorded in the context of a completely automated and functional smart home. The small number of such environments available to the public makes these corpora valuable from an experimental point of view.
Citations: 2
Estimating competing speaker count for blind speech source separation
Valentin Andrei, H. Cucu, Andi Buzo, C. Burileanu
DOI: 10.1109/SPED.2015.7343081
Abstract: We present a method for estimating the number of simultaneous speakers, for direct integration with blind speech source separation algorithms. The method was developed for single-microphone recordings but is fully compatible with microphone-array approaches. Speech source separation algorithms based on independent component analysis, multiband analysis or spectral learning need the number of concurrent speakers as an input parameter. This number is estimated using pattern matching between the spectrogram of the speech mixture and those associated with a set of single-speaker references. The method was shown to scale up to at least 10 concurrent speakers. Additionally, we highlight the separation performance of various speech separation algorithms on mixtures with 3 competing speakers.
Citations: 2
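The abstract's "pattern matching between the spectrogram of the speech mixture and single-speaker references" could take many forms; the toy sketch below is one hypothetical variant, comparing the observed mixture spectrum against synthetic k-speaker sums of reference spectra by cosine similarity. It is not the paper's algorithm, only an illustration of the match-against-references idea:

```python
import numpy as np

def estimate_speaker_count(mixture_spec, reference_specs, max_count=10):
    """Pick the speaker count k whose synthetic k-speaker mixture
    (sum of k single-speaker reference spectra) best matches the
    observed mixture spectrum under cosine similarity."""
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    best_k, best_sim = 1, -1.0
    for k in range(1, min(max_count, len(reference_specs)) + 1):
        synthetic = np.sum(reference_specs[:k], axis=0)
        sim = cosine(mixture_spec, synthetic)
        if sim > best_sim:
            best_k, best_sim = k, sim
    return best_k
```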
Sound event recognition in smart environments
Gheorghe Pop, Alexandru Caranica, H. Cucu, D. Burileanu
DOI: 10.1109/SPED.2015.7343087
Abstract: A rich body of research has recently been reported on sound event recognition (SER), a particular case of audio signal classification (ASC), which in turn is part of the more general research field of auditory scene analysis (ASA). The classification of sound events in a given environment is generally more precise with fewer classes and with better knowledge of the sound events expected to occur in each class. Various techniques described in the literature achieve good performance when sound events are strictly repeating. In an effort to develop an application that ultimately recognizes all sound events in a given context, this work describes an application of SER in smart environments that aims at recognizing cough sounds. Such techniques cannot rely on the strict repeatability of sound events; they must move towards recognition of sound events that are merely similar to one of a set of established models. The main working modes we examined were to model cough as non-speech utterances and to search for a match against a database of established models.
Citations: 2
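"Searching for a match against a database of established models" is commonly done with dynamic time warping (DTW) over feature sequences; the paper does not specify its matcher, so the DTW sketch below is an illustrative stand-in showing how a variable-length sound event can be scored against stored templates:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between feature sequences
    a: (T1, D) and b: (T2, D), with Euclidean local costs."""
    t1, t2 = len(a), len(b)
    d = np.full((t1 + 1, t2 + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[t1, t2]

def match_event(event, templates):
    """Label a sound event with the nearest template under DTW."""
    return min(templates, key=lambda label: dtw_distance(event, templates[label]))
```

Because DTW aligns sequences of different lengths, it tolerates the timing variability that makes cough events non-repeatable, which is the problem the paper highlights.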
Achievements in the field of voice synthesis for Romanian
G. Toderean, O. Buza, J. Domokos
DOI: 10.1109/SPED.2015.7343078
Abstract: This article presents some of the voice synthesis methods designed and implemented at the research center of the Technical University of Cluj-Napoca, including phoneme-based and diphone-based LPC synthesis, multipulse (MPE) synthesis, the NSM synthesis method, the RR_PSOLA variant of TD-PSOLA, a method based on syllable concatenation, and a corpus-based method. It also presents several voice synthesis systems that were realised: the ROMVOX, SprintVox, LIGHTVOX and HTS systems.
Citations: 2
Methods for automatic generation of GRAALAN-based phonetic databases
S. Diaconescu, Monica-Mihaela Rizea, Felicia-Carmen Codirlasu, M. Ionescu, Monica Radulescu, A. Minca, Stefan Fulea
DOI: 10.1109/SPED.2015.7343082
Abstract: This paper presents methods for the automatic generation of phonetic databases (the Morphological and Phonetic Dictionary, the Phonetic Dictionary of Syllables, and the Rhyming Dictionary) for a natural language, starting from a set of linguistic knowledge bases. The knowledge bases are developed by means of the GRAALAN (Grammar Abstract Language) system. The process is exemplified through the representation of a Romanian phonetic database.
Citations: 2
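The GRAALAN knowledge bases themselves are not reproducible here, but the core step of one of the generated resources, the Rhyming Dictionary, is simply grouping words by their final phonemes. A minimal sketch, assuming a toy word-to-phoneme lexicon (the example words and `rhyme_len` choice are illustrative):

```python
from collections import defaultdict

def build_rhyming_dictionary(phonetic_lexicon, rhyme_len=2):
    """Group words by their final `rhyme_len` phonemes, the core step in
    deriving a rhyming dictionary from a phonetic lexicon.
    `phonetic_lexicon` maps word -> list of phonemes."""
    rhymes = defaultdict(list)
    for word, phones in phonetic_lexicon.items():
        key = tuple(phones[-rhyme_len:])   # the word's rhyme tail
        rhymes[key].append(word)
    return dict(rhymes)
```

A real system would additionally account for stress position and syllable structure when defining the rhyme key, which is where the Phonetic Dictionary of Syllables would come in.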