{"title":"A rule-based approach to generating large phonetic databases for Romanian — results of the AFLR project","authors":"S. Diaconescu, Monica-Mihaela Rizea, M. Ionescu, A. Minca, Monica Radulescu","doi":"10.1109/SPED.2017.7990439","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990439","url":null,"abstract":"This paper presents a rule-based approach for generating a large phonetic database for Romanian. The knowledge base is developed by means of the GRAALAN (Grammar Abstract Language) system. By inspecting dictionaries and corpora, we generate a phonetic database of over 100,000 lemmas. Our database has a high degree of accuracy, ensured by the rule-based method applied for generating phonetic transcriptions.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122186675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
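The rule-based transcription approach summarized in this abstract can be illustrated with a minimal sketch of ordered, context-sensitive letter-to-phoneme rules. The two Romanian rules below and the ASCII phone labels ("tS", "dZ" for the affricates) are heavily simplified illustrations, not the paper's actual GRAALAN rule base.

```python
import re

# Hypothetical, simplified rules: each is (pattern, phone); earlier rules
# take priority, so the context-sensitive cases come before the defaults.
RULES = [
    (re.compile(r"c(?=[ei])"), "tS"),   # 'c' before e/i -> affricate
    (re.compile(r"g(?=[ei])"), "dZ"),   # 'g' before e/i -> affricate
    (re.compile(r"c"), "k"),            # default 'c'
    (re.compile(r"g"), "g"),            # default 'g'
]

def transcribe(word):
    """Apply the first matching rule at each position, else copy the letter."""
    phones = []
    i = 0
    while i < len(word):
        for pattern, phone in RULES:
            m = pattern.match(word, i)
            if m:
                phones.append(phone)
                i = m.end()
                break
        else:
            phones.append(word[i])
            i += 1
    return phones

print(transcribe("cine"))  # ['tS', 'i', 'n', 'e']
print(transcribe("casa"))  # ['k', 'a', 's', 'a']
```

A real system of this kind orders many such rules by specificity and derives them from dictionary evidence, which is what makes the generated transcriptions accurate at scale.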
{"title":"Natural language processing model compiling natural language into byte code","authors":"A. Trifan, Marilena Anghelus, R. Constantinescu","doi":"10.1109/SPED.2017.7990434","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990434","url":null,"abstract":"The need for progress implies the need for time. Daily tasks have been automated to save time, but they still require input from a user. The need to interact with different applications may endanger the user's life. The simplest way for these automations to be “life-saving” is to fully support speech recognition. Although this is currently done in an acceptable manner, the main problem resides in the language processing model itself. Without a good language processing model, there is no “learning” and no “progress”. This document is a technical proposal of a different approach to processing human language and compiling it into a computer-understandable form — byte code. The paper treats the requirements for this to happen in the Java programming language, but the principles should be the same for any programming language.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116816770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word associations in media posts related to disasters — A statistical analysis","authors":"M. Pirnau","doi":"10.1109/SPED.2017.7990427","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990427","url":null,"abstract":"The paper analyzes the frequency of posts in the case of earthquakes and of the word associations included in such Social Media (SM) posts. Since important posts are shared by users in SM, the purpose was to identify the variation, over a period of time, in the number of posts with unique content on a particular topic. The present study uses messages generated by the Twitter platform, posted before and after the occurrence of earthquakes in areas with significant seismic activity, such as Vrancea (24th September 2016), Ussita (30th October 2016), New Zealand (13th November 2016) and Papua (23rd January 2017). For the analysis of the contents of the tweets, the Apriori algorithm was used to extract word associations from these posts, i.e., keywords that draw attention to the analyzed earthquake situation.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121017619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
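The Apriori-style extraction of word associations described in this abstract can be sketched as follows; the toy tweets, the support threshold, and the restriction to word pairs are illustrative assumptions, not the study's actual data or parameters.

```python
from itertools import combinations

def apriori_pairs(transactions, min_support):
    """Find word pairs that co-occur in at least min_support transactions."""
    # Count single-word support first (the Apriori pruning step).
    word_counts = {}
    for t in transactions:
        for w in set(t):
            word_counts[w] = word_counts.get(w, 0) + 1
    frequent_words = {w for w, c in word_counts.items() if c >= min_support}

    # Only pairs of individually frequent words can themselves be frequent.
    pair_counts = {}
    for t in transactions:
        words = sorted(set(t) & frequent_words)
        for pair in combinations(words, 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}

# Hypothetical tokenized tweets about a seismic event.
tweets = [
    ["earthquake", "vrancea", "magnitude"],
    ["earthquake", "vrancea", "felt"],
    ["earthquake", "magnitude", "vrancea"],
    ["weather", "rain"],
]
print(apriori_pairs(tweets, min_support=3))
# {('earthquake', 'vrancea'): 3}
```

The surviving pairs are exactly the "word associations" of the abstract: co-occurring keywords that flag posts about the analyzed earthquake.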
{"title":"Automatic speaker analysis 2.0: Hearing the bigger picture","authors":"Björn Schuller","doi":"10.1109/SPED.2017.7990449","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990449","url":null,"abstract":"Automatic Speaker Analysis has largely focused on single aspects of a speaker such as her ID, gender, emotion, personality, or health state. This broadly ignores the interdependency of all the different states and traits impacting on the one single voice production mechanism available to a human speaker. In other words, sometimes we may sound depressed, but we simply have the flu and hardly find the energy to put more vocal effort into our articulation and sound production. Recently, this gap gave rise to an increasingly holistic speaker analysis — assessing the ‘larger picture’ in one pass, such as by multi-target learning. However, for a robust assessment, this requires large amounts of speech and language resources labelled in rich ways to train such interdependency, and architectures able to cope with multi-target learning of massive amounts of speech data. In this light, this contribution discusses efficient mechanisms such as large-scale social media pre-scanning with dynamic cooperative crowd-sourcing for rapid data collection, cross-task labelling of these data in a wider range of attributes to reach ‘big & rich’ speech data, and efficient multi-target end-to-end and end-to-evolution deep learning paradigms to learn an accordingly rich representation of diverse target tasks in efficient ways. The ultimate goal is to enable machines to hear the ‘entire’ person and her condition and whereabouts behind the voice and words, rather than aiming at a single aspect blind to the overall individual and their state, thus leading to the next level of Automatic Speaker Analysis.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122696069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MaRePhoR — An open access machine-readable phonetic dictionary for Romanian","authors":"Stefan-Adrian Toma, Adriana Stan, Mihai-Lica Pura, Traian Barsan","doi":"10.1109/SPED.2017.7990435","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990435","url":null,"abstract":"This paper introduces a novel open access resource, the machine-readable phonetic dictionary for Romanian — MaRePhoR. It contains over 70,000 word entries and their manually performed phonetic transcriptions. The paper describes the dictionary format and statistics, as well as an initial use of the phonetic transcription entries by building a grapheme-to-phoneme converter based on decision trees. Various training strategies were tested, enabling the correct selection of a final setup for our predictor. The best results showed that, using the dictionary as training data, an accuracy of over 99% can be achieved.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"420 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116687462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
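A decision-tree grapheme-to-phoneme converter of the kind described in this abstract can be sketched as below; the toy training pairs, the one-letter context window, and the ASCII phone labels ("tS" for the affricate) are illustrative assumptions, not the MaRePhoR setup.

```python
from sklearn.tree import DecisionTreeClassifier

def letter_contexts(word):
    """One sample per letter: (left neighbour, letter, right neighbour)."""
    padded = "#" + word + "#"
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

# Toy orthography -> per-letter phone labels ('tS' = the affricate in 'cer').
train = [("casa", ["k", "a", "s", "a"]),
         ("cer",  ["tS", "e", "r"]),
         ("cine", ["tS", "i", "n", "e"]),
         ("cub",  ["k", "u", "b"])]

# Ordinal encoding of context characters, so the tree can split on them.
idx = {c: i for i, c in enumerate(sorted({ch for w, _ in train
                                          for ch in "#" + w + "#"}))}
X = [[idx[a], idx[b], idx[c]]
     for w, _ in train for a, b, c in letter_contexts(w)]
y = [p for _, phones in train for p in phones]
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

def predict_phones(word):
    feats = [[idx[a], idx[b], idx[c]] for a, b, c in letter_contexts(word)]
    return tree.predict(feats).tolist()
```

With enough dictionary entries as training data, the tree learns context-dependent letter pronunciations, which is how accuracies above 99% become reachable on a language with largely regular orthography such as Romanian.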
{"title":"SpeeD's DNN approach to Romanian speech recognition","authors":"Alexandru-Lucian Georgescu, H. Cucu, C. Burileanu","doi":"10.1109/SPED.2017.7990443","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990443","url":null,"abstract":"This paper presents the main improvements brought recently to the large-vocabulary, continuous speech recognition (LVCSR) system for Romanian language developed by the Speech and Dialogue (SpeeD) research laboratory. While the most important improvement consists in the use of DNN-based acoustic models, instead of the classic HMM-GMM approach, several other aspects are discussed in the paper: a significant increase of the speech training corpus, the use of additional algorithms for feature processing, speaker adaptive training, and discriminative training and, finally, the use of lattice rescoring with significantly expanded language models (n-gram models up to order 5, based on vocabularies of up to 200k words). The ASR experiments were performed with several types of acoustic and language models in different configurations on the standard read and conversational speech corpora created by SpeeD in 2014. The results show that the extension of the training speech corpus leads to a relative word error rate (WER) improvement between 15% and 17%, while the use of DNN-based acoustic models instead of HMM-GMM-based acoustic models leads to a relative WER improvement between 18% and 23%, depending on the nature of the evaluation speech corpus (read or conversational, clean or noisy). The best configuration of the LVCSR system was integrated as a live transcription web application available online on SpeeD laboratory's website at https://speed.pub.ro/live-transcriber-2017.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133081669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
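The relative WER improvements quoted in this abstract follow the standard convention of measuring error reduction against the baseline's error, not in absolute points. A small worked example with hypothetical WER figures:

```python
def relative_wer_improvement(wer_baseline, wer_new):
    """Relative WER improvement: the fraction of baseline errors removed."""
    return (wer_baseline - wer_new) / wer_baseline

# Hypothetical figures: an HMM-GMM baseline at 20% WER improved to 16% WER
# by a DNN acoustic model is a 4-point absolute but 20% *relative* gain,
# which is how ranges such as 18-23% above are computed.
gain = relative_wer_improvement(0.20, 0.16)
print(f"{gain:.0%}")  # 20%
```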
{"title":"Audio signal classification using Linear Predictive Coding and Random Forests","authors":"L. Grama, C. Rusu","doi":"10.1109/SPED.2017.7990431","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990431","url":null,"abstract":"The goal of this work is to present an audio signal classification system based on Linear Predictive Coding and Random Forests. We consider the problem of multiclass classification with imbalanced datasets. The signals under classification belong to the class of sounds from wildlife intruder detection applications: birds, gunshots, chainsaws, human voice and tractors. The proposed system achieves an overall correct classification rate of 99.25%. There is no probability of false alarms in the case of birds or human voices. For the other three classes the probability is low, around 0.3%. The false omission rate is also low: around 0.2% for birds and tractors, a little bit higher for chainsaws (0.4%), lower for gunshots (0.14%) and zero for human voices.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116424996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
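The LPC front-end of such a classification system can be sketched with the autocorrelation method and the Levinson-Durbin recursion; in the full pipeline the resulting coefficients (computed per frame in practice, per whole signal here) would feed a Random Forest classifier. The demo signal and model order are illustrative choices, not the paper's settings.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients [1, a1, ..., ap] via autocorrelation + Levinson-Durbin."""
    n = len(x)
    # Autocorrelation for lags 0..order.
    r = [float(np.dot(x[:n - k], x[k:])) for k in range(order + 1)]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                 # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)           # prediction error shrinks each step
    return a

# Demo: noise-free AR(1) signal x[t] = 0.9 * x[t-1]; order-1 LPC
# recovers the generating model, a ≈ [1, -0.9].
x = 0.9 ** np.arange(200)
coeffs = lpc(x, order=1)
```

Stacking such coefficient vectors per audio frame gives the fixed-length feature matrix a Random Forest (e.g. scikit-learn's RandomForestClassifier) can be trained on for the bird/gunshot/chainsaw/voice/tractor classes.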
{"title":"Investigation on the performances of APA in forensic noise reduction","authors":"R. Dobre, C. Paleologu, S. Ciochină, C. Negrescu, D. Stanomir","doi":"10.1109/SPED.2017.7990442","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990442","url":null,"abstract":"Multimedia files, either video or audio, could greatly influence the final verdict of a trial when accepted as evidence. The abundance of free editing software available nowadays makes forgery a very easy operation. Audio messages, even if authentic, can in some cases be heavily masked by other signals and declared unusable. This paper presents an investigation of the performance of the affine projection algorithm (APA) in recovering a speech signal drowned in a loud musical signal, and it represents a contribution to the multimedia forensics domain. The APA was tested in multiple situations, showing its top performance limits and how the working parameters influence those limits.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122136949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
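The affine projection algorithm investigated here can be sketched as an adaptive system-identification loop; the filter length, projection order, step size, and test signals below are illustrative choices, not the paper's experimental settings.

```python
import numpy as np

def apa_identify(x, d, filt_len, proj_order=2, mu=0.5, delta=1e-6):
    """Affine projection algorithm: adapt a length-filt_len FIR filter w
    so that w applied to input x tracks the desired signal d."""
    w = np.zeros(filt_len)
    for n in range(filt_len + proj_order - 1, len(x)):
        # Columns are the proj_order most recent input vectors u(n-p).
        X = np.column_stack([x[n - p - filt_len + 1:n - p + 1][::-1]
                             for p in range(proj_order)])
        e = d[n - proj_order + 1:n + 1][::-1] - X.T @ w   # a-priori errors
        # Regularized projection update (delta avoids ill-conditioning).
        w = w + mu * X @ np.linalg.solve(
            X.T @ X + delta * np.eye(proj_order), e)
    return w

# Demo: recover a known 3-tap "unknown system" from white-noise input.
rng = np.random.default_rng(0)
x = rng.standard_normal(3000)
h_true = np.array([0.5, -0.3, 0.2])
d = np.convolve(x, h_true)[:len(x)]   # noise-free desired signal
w_hat = apa_identify(x, d, filt_len=3)
```

In a forensic noise-reduction configuration the adaptive filter would presumably be driven by a reference of the masking music, with the enhanced speech taken from the error signal; the toy above only checks that the APA update converges.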
{"title":"Voice-related symptom and knowledge-bases using internet mining","authors":"H. Teodorescu, D. Gogalniceanu","doi":"10.1109/SPED.2017.7990426","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990426","url":null,"abstract":"We report the first development of a set of symptoms for a medical condition where the set of symptoms is based exclusively on information collected on the Internet. Second, we lay down a general method for doing so. Third, we introduce the first systematic set of symptoms for temporo-mandibular disorder (TMD) exclusively related to speech, and suggest a set of known quantitative parameters for the analysis of these symptoms.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129559370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a representative audio base of syllables for Romanian language","authors":"S. Diaconescu, Monica-Mihaela Rizea, M. Ionescu, A. Minca, Liviu Dorobantu, Stefan Fulea, Monica Radulescu, H. Cucu, D. Burileanu","doi":"10.1109/SPED.2017.7990444","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990444","url":null,"abstract":"The aim of this work is to provide some insights regarding the effort of building a representative and wide coverage audio base of syllables for Romanian. The audio base comprises audio recordings of syllables extracted from the following types of syllable embedding: isolated-syllable, isolated-word and continuous speech. The list of syllables has been computed over the syllabified form of single-word inflected forms. The inflected forms were generated using a general rule-based system for normal and phonetic inflection having at its core the GRAALAN (GRAmmar Abstract LANguage) metalanguage (designed for linguistic knowledge description). In addition, the word-position of a syllable was accounted for when planning the audio recordings.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117123321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}