2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD): latest papers

A rule-based approach to generating large phonetic databases for Romanian results of the AFLR project
S. Diaconescu, Monica-Mihaela Rizea, M. Ionescu, A. Minca, Monica Radulescu
DOI: https://doi.org/10.1109/SPED.2017.7990439
Published: 2017-07-01
Abstract: This paper presents a rule-based approach for generating a large phonetic database for Romanian. The knowledge base is developed by means of the GRAALAN (Grammar Abstract Language) system. By inspecting dictionaries and corpora, we generate a phonetic database of over 100,000 lemmas. The rule-based method used to generate the phonetic transcriptions ensures a high degree of accuracy.
Citations: 0
Natural language processing model compiling natural language into byte code
A. Trifan, Marilena Anghelus, R. Constantinescu
DOI: https://doi.org/10.1109/SPED.2017.7990434
Published: 2017-07-01
Abstract: The need for progress implies the need for time. Daily tasks have been automated to save time, but they still need user input. The need to interact with different applications may endanger the user's life. The simplest way for this automation to be “life-saving” is to fully support speech recognition. Although this is currently done in an acceptable manner, the main problem resides in the language processing model itself. Without a good language processing model, there is no “learning” and no “progress”. This document is a technical proposal for a different approach to processing human languages and compiling them into a computer-understandable form: byte code. The paper treats the requirements for doing this in the Java programming language, but the principles should be the same for any programming language.
Citations: 2
Word associations in media posts related to disasters — A statistical analysis
M. Pirnau
DOI: https://doi.org/10.1109/SPED.2017.7990427
Published: 2017-07-01
Abstract: The paper aims to analyze the frequency of posts in the case of earthquakes and the word associations included in such Social Media (SM) posts. Since important posts are shared by users in SM, the purpose was to identify how the number of posts with unique content on a particular topic varied over a period of time. The present study uses messages from the Twitter platform that were posted before and after earthquakes in areas with important seismic activity, such as Vrancea (24th September 2016), Ussita (30th October 2016), New Zealand (13th November 2016) and Papua (23rd January 2017). For the analysis of the contents of the tweets, the Apriori algorithm was used to extract word associations from these posts, that is, keywords that draw attention to the analyzed earthquake situation.
Citations: 2
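The abstract above names the Apriori algorithm for extracting word associations from tweets. The following is a minimal, generic sketch of that algorithm, not the author's implementation: the function name `apriori`, the `min_support` threshold, and the sample tweets are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every word set that occurs in at least min_support transactions."""
    # Level 1: count how many transactions contain each single word.
    counts: Dict[FrozenSet[str], int] = {}
    for words in transactions:
        for word in words:
            key = frozenset([word])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-sets into size-k candidates and keep only those
        # whose every (k-1)-subset is frequent (the Apriori pruning step).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for words in transactions if c <= words) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical tweets, already tokenized into word sets (not data from the paper).
tweets = [
    {"earthquake", "vrancea", "magnitude"},
    {"earthquake", "vrancea"},
    {"earthquake", "magnitude"},
    {"vrancea", "romania"},
]
associations = apriori(tweets, min_support=2)
```

Here `associations` maps each frequent word set to its support count; pairs such as "earthquake" with "vrancea" survive, while pairs seen in only one tweet are discarded before any larger set containing them is counted.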
Automatic speaker analysis 2.0: Hearing the bigger picture
Björn Schuller
DOI: https://doi.org/10.1109/SPED.2017.7990449
Published: 2017-07-01
Abstract: Automatic Speaker Analysis has largely focused on single aspects of a speaker such as her ID, gender, emotion, personality, or health state. This broadly ignores the interdependency of all the different states and traits impacting on the one single voice production mechanism available to a human speaker. In other words, sometimes we may sound depressed, but we simply have a flu, and hardly find the energy to put more vocal effort into our articulation and sound production. Recently, this gap gave rise to an increasingly holistic speaker analysis, assessing the ‘larger picture’ in one pass such as by multi-target learning. However, for a robust assessment, this requires large amounts of speech and language resources labelled in rich ways to train such interdependency, and architectures able to cope with multi-target learning of massive amounts of speech data. In this light, this contribution will discuss efficient mechanisms such as large social media pre-scanning with dynamic cooperative crowd-sourcing for rapid data collection, cross-task-labelling of these data in a wider range of attributes to reach ‘big & rich’ speech data, and efficient multi-target end-to-end and end-to-evolution deep learning paradigms to learn an accordingly rich representation of diverse target tasks in efficient ways. The ultimate goal is to enable machines to hear the ‘entire’ person and her condition and whereabouts behind the voice and words, rather than aiming at a single aspect blind to the overall individual and its state, thus leading to the next level of Automatic Speaker Analysis.
Citations: 0
MaRePhoR — An open access machine-readable phonetic dictionary for Romanian
Stefan-Adrian Toma, Adriana Stan, Mihai-Lica Pura, Traian Barsan
DOI: https://doi.org/10.1109/SPED.2017.7990435
Published: 2017-07-01
Abstract: This paper introduces a novel open access resource, the machine-readable phonetic dictionary for Romanian, MaRePhoR. It contains over 70,000 word entries and their manually produced phonetic transcriptions. The paper describes the dictionary format and statistics, as well as an initial use of the phonetic transcription entries: building a grapheme-to-phoneme converter based on decision trees. Various training strategies were tested, enabling the correct selection of a final setup for our predictor. The best results showed that, using the dictionary as training data, an accuracy of over 99% can be achieved.
Citations: 5
SpeeD's DNN approach to Romanian speech recognition
Alexandru-Lucian Georgescu, H. Cucu, C. Burileanu
DOI: https://doi.org/10.1109/SPED.2017.7990443
Published: 2017-07-01
Abstract: This paper presents the main improvements brought recently to the large-vocabulary continuous speech recognition (LVCSR) system for the Romanian language developed by the Speech and Dialogue (SpeeD) research laboratory. While the most important improvement consists in the use of DNN-based acoustic models instead of the classic HMM-GMM approach, several other aspects are discussed in the paper: a significant increase of the speech training corpus, the use of additional algorithms for feature processing, speaker adaptive training, and discriminative training and, finally, the use of lattice rescoring with significantly expanded language models (n-gram models up to order 5, based on vocabularies of up to 200k words). The ASR experiments were performed with several types of acoustic and language models in different configurations on the standard read and conversational speech corpora created by SpeeD in 2014. The results show that the extension of the training speech corpus leads to a relative word error rate (WER) improvement between 15% and 17%, while the use of DNN-based acoustic models instead of HMM-GMM-based acoustic models leads to a relative WER improvement between 18% and 23%, depending on the nature of the evaluation speech corpus (read or conversational, clean or noisy). The best configuration of the LVCSR system was integrated as a live transcription web application available online on SpeeD laboratory's website at https://speed.pub.ro/live-transcriber-2017.
Citations: 11
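The abstract above reports relative WER improvements. As a small worked sketch of what such a figure means, assuming the usual definition (baseline WER minus new WER, divided by baseline WER): the function name and the WER values below are hypothetical and are not results from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER improvement as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only: WER drops from 20.0% to 16.0%.
improvement = relative_wer_improvement(20.0, 16.0)
print(f"Relative WER improvement: {improvement:.1%}")
```

Under this definition a drop from 20.0% to 16.0% WER is a 4-point absolute improvement but a 20% relative one, which is why relative figures read larger than the raw WER difference.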
Audio signal classification using Linear Predictive Coding and Random Forests
L. Grama, C. Rusu
DOI: https://doi.org/10.1109/SPED.2017.7990431
Published: 2017-07-01
Abstract: The goal of this work is to present an audio signal classification system based on Linear Predictive Coding and Random Forests. We consider the problem of multiclass classification with imbalanced datasets. The signals under classification belong to the class of sounds from wildlife intruder detection applications: birds, gunshots, chainsaws, human voice and tractors. The proposed system achieves an overall correct classification rate of 99.25%. There is no probability of false alarms in the case of birds or human voices. For the other three classes the probability is low, around 0.3%. The false omission rate is also low: around 0.2% for birds and tractors, slightly higher for chainsaws (0.4%), lower for gunshots (0.14%) and zero for human voices.
Citations: 17
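The abstract above quotes per-class false alarm probabilities and false omission rates. A minimal sketch of how such rates can be derived from a multiclass confusion matrix, assuming the standard one-vs-rest definitions (false alarm = FP / (FP + TN), false omission = FN / (FN + TN)); the paper may define them differently, and the function name and counts below are hypothetical, not data from the paper.

```python
from typing import Dict, List


def one_vs_rest_rates(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Compute per-class false alarm probability and false omission rate.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    rates: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp  # predicted k, true class differs
        tn = total - tp - fn - fp
        rates[label] = {
            "false_alarm": fp / (fp + tn) if fp + tn else 0.0,
            "false_omission": fn / (fn + tn) if fn + tn else 0.0,
        }
    return rates


# Hypothetical 3-class counts for illustration (not results from the paper).
labels = ["birds", "gunshots", "chainsaws"]
confusion = [
    [50, 0, 0],
    [1, 48, 1],
    [0, 2, 98],
]
rates = one_vs_rest_rates(confusion, labels)
correct = sum(confusion[i][i] for i in range(len(labels)))
overall_accuracy = correct / sum(sum(row) for row in confusion)
```

With imbalanced classes, as in the abstract, these per-class rates are more informative than the overall correct classification rate alone, because a large class can dominate the overall figure.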
Investigation on the performances of APA in forensic noise reduction
R. Dobre, C. Paleologu, S. Ciochină, C. Negrescu, D. Stanomir
DOI: https://doi.org/10.1109/SPED.2017.7990442
Published: 2017-07-01
Abstract: Multimedia files, either video or audio, could greatly influence the final verdict of a trial when accepted as evidence. The abundance of free editing software available nowadays makes forgery very easy. Audio messages, even if authentic, can in some cases be heavily masked by other signals and declared unusable. This paper investigates the performance of the affine projection algorithm (APA) in recovering a speech signal drowned in a loud musical signal, and it represents a contribution to the multimedia forensic domain. The APA was tested in multiple situations, showing the top performance limits and how the working parameters influence those limits.
Citations: 4
Voice-related symptom and knowledge-bases using internet mining
H. Teodorescu, D. Gogalniceanu
DOI: https://doi.org/10.1109/SPED.2017.7990426
Published: 2017-07-01
Abstract: We report the first development of a set of symptoms for a medical condition where the set of symptoms is based exclusively on information collected on the Internet. Second, we lay down a general method for doing so. Third, we introduce the first systematic set of symptoms for temporo-mandibular disorder (TMD) exclusively related to speech and suggest a set of known quantitative parameters for the analysis of these symptoms.
Citations: 2
Building a representative audio base of syllables for Romanian language
S. Diaconescu, Monica-Mihaela Rizea, M. Ionescu, A. Minca, Liviu Dorobantu, Stefan Fulea, Monica Radulescu, H. Cucu, D. Burileanu
DOI: https://doi.org/10.1109/SPED.2017.7990444
Published: 2017-07-01
Abstract: The aim of this work is to provide some insights regarding the effort of building a representative and wide-coverage audio base of syllables for Romanian. The audio base comprises audio recordings of syllables extracted from the following types of syllable embedding: isolated syllable, isolated word and continuous speech. The list of syllables has been computed over the syllabified form of single-word inflected forms. The inflected forms were generated using a general rule-based system for normal and phonetic inflection having at its core the GRAALAN (GRAmmar Abstract LANguage) metalanguage (designed for linguistic knowledge description). In addition, the word-position of a syllable was accounted for when planning the audio recordings.
Citations: 0