2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD): latest papers

Multi-resolution spectral input for convolutional neural network-based speech recognition
L. Tóth
{"title":"Multi-resolution spectral input for convolutional neural network-based speech recognition","authors":"L. Tóth","doi":"10.1109/SPED.2017.7990430","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990430","url":null,"abstract":"The convolutional deep neural network component applied frequently in current speech recognizers is trained on a context of consecutive spectral feature vectors. Here, we investigate whether we can extend the time span of this input and reduce the number of spectral features at the same time by using a multi-resolution spectrum as input. In the proposed multi-resolution scheme, the network processes the nearby neighbors of the actual frame using the standard resolution, while it applies a gradually coarser resolution for more distant frames. Using this solution, we managed to extend the input of our network to a time context of 45 frames without increasing the number of input features, and we also achieved a relative error rate reduction of 3–4% compared to the conventional high-resolution representation. We report a phone error rate of 17.0% on the TIMIT core test set, which is competitive with the best scores published on this data set.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115043372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
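The multi-resolution idea in the abstract above (standard resolution near the current frame, gradually coarser resolution for more distant frames) can be sketched as follows. This is a minimal illustration, not the author's implementation: the abstract does not give the pooling layout, so the `SIDE_WIDTHS` grouping, the use of plain averaging over time, and the function names are all assumptions chosen only so that the context spans 45 frames.

```python
from typing import List, Sequence

# Hypothetical layout (not from the paper): widths of the time windows on each
# side of the centre frame, nearest first. 22 frames per side + centre = 45.
SIDE_WIDTHS = (1, 1, 2, 2, 4, 4, 8)


def mean_frame(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a group of spectral feature vectors element-wise."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def multires_context(
    spec: Sequence[Sequence[float]],
    t: int,
    side_widths: Sequence[int] = SIDE_WIDTHS,
) -> List[List[float]]:
    """Return the multi-resolution context around frame t.

    Nearby frames are kept one by one; more distant frames are averaged in
    progressively wider groups. Indices past the edges are clamped.
    """
    last = len(spec) - 1

    def frame(i: int) -> Sequence[float]:
        return spec[min(max(i, 0), last)]

    def one_side(direction: int) -> List[List[float]]:
        pooled, offset = [], 1
        for width in side_widths:
            group = [frame(t + direction * (offset + k)) for k in range(width)]
            pooled.append(mean_frame(group))
            offset += width
        return pooled

    left = one_side(-1)[::-1]   # farthest group first
    right = one_side(+1)        # nearest group first
    return left + [list(frame(t))] + right


# Usage: a toy "spectrogram" with one feature per frame.
toy_spec = [[float(i)] for i in range(100)]
context = multires_context(toy_spec, t=50)
print(len(context), "vectors cover", 2 * sum(SIDE_WIDTHS) + 1, "frames")
```

With this assumed layout, 15 feature vectors stand in for 45 frames, which is the kind of saving the abstract describes: a longer time span without a larger input.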
Towards a continuous speech corpus for banking domain automatic speech recognition
G. Suciu, Stefan-Adrian Toma, Romulus Cheveresan
{"title":"Towards a continuous speech corpus for banking domain automatic speech recognition","authors":"G. Suciu, Stefan-Adrian Toma, Romulus Cheveresan","doi":"10.1109/SPED.2017.7990436","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990436","url":null,"abstract":"This paper presents the work done towards developing a speech corpus for Romanian, for automatic speech recognition in the banking domain. This work is done in the context of the Speech2Process project, which aims at creating a system that makes interaction between customers and agents in the contact center much easier. The application that uses the banking corpus will provide automatic responses to client requests, received through voice communication protocols, in customer support services.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129048695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Fast method for ENF database build and search
Gheorghe Pop, Dragos Draghicescu, D. Burileanu, H. Cucu, C. Burileanu
{"title":"Fast method for ENF database build and search","authors":"Gheorghe Pop, Dragos Draghicescu, D. Burileanu, H. Cucu, C. Burileanu","doi":"10.1109/SPED.2017.7990447","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990447","url":null,"abstract":"The field of digital audio forensics has been driving a sustained research effort in the last decade. Current digital audio authentication frameworks include Electric Network Frequency (ENF) criterion as a must. The ENF-based techniques benefit greatly from the availability of reference databases, which are built using extraction mechanisms that continuously analyze the power line signal. To find the recording time of an ENF-carrying audio, the frequency sequence extracted from the file is matched against a reference database. A database collection method based on spectral analysis needs to trade time resolution for frequency resolution. This tradeoff usually leads to databases with less variance in the frequency series. In this paper we present a method to efficiently build an ENF database, with good time resolution, reduced storage requirements, and a fast two-step search procedure.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"1992 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128606928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
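The abstract above describes matching an extracted frequency sequence against a reference database with a two-step search. The paper's actual procedure is not given in the abstract, so the sketch below only shows a generic coarse-to-fine search under assumptions of my own: mean squared error as the match cost, a hypothetical `stride` parameter for the coarse pass, and a reference series smooth enough that the coarse pass lands near the true offset.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length frequency sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def two_step_search(
    reference: Sequence[float],
    query: Sequence[float],
    stride: int = 8,
) -> int:
    """Return the offset in `reference` that best matches `query`.

    Step 1 (coarse): test only every `stride`-th offset.
    Step 2 (fine): test every offset within one stride of the coarse winner.
    This is an illustrative scheme, not the method from the paper.
    """
    m = len(query)
    last = len(reference) - m
    if m == 0 or last < 0:
        raise ValueError("query must be non-empty and no longer than reference")

    def cost(start: int) -> float:
        return mse(reference[start:start + m], query)

    coarse = min(range(0, last + 1, stride), key=cost)
    lo = max(0, coarse - stride + 1)
    hi = min(last, coarse + stride - 1)
    return min(range(lo, hi + 1), key=cost)


# Usage: a synthetic, slowly drifting series standing in for ENF readings.
reference_series = [50.0 + (i / 1000.0) ** 2 for i in range(1000)]
query_series = reference_series[337:397]
print(two_step_search(reference_series, query_series))
```

The coarse pass evaluates roughly `1/stride` of the offsets, and the fine pass adds at most `2 * stride - 1` more, which is where the speed-up over an exhaustive scan comes from. On noisy or weakly varying data the coarse pass can miss the true match, so the stride is a trade-off rather than a free gain.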
Semantics driven intelligent front-end
T. Gergely, Edit Halmay, Miklós Szöts, G. Suciu, Romulus Cheveresan
{"title":"Semantics driven intelligent front-end","authors":"T. Gergely, Edit Halmay, Miklós Szöts, G. Suciu, Romulus Cheveresan","doi":"10.1109/SPED.2017.7990429","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990429","url":null,"abstract":"This paper presents the work done in the context of the Speech2Process project for Speech Dialogue System applied in call-centers, specifically in the banking domain. In our proposed solution, the client communicates with the system by natural language sentences, which will be automatically recognized and semantically analysed. The paper describes innovative features of the selected approach, which is based on knowledge representation, and concentrates on semantic processing, that is spoken language understanding (SLU) based on a cognitive semantics, namely Fillmore's frame semantics.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127004174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Speech recognition results for voice-controlled assistive applications
Alexandru Caranica, H. Cucu, C. Burileanu, François Portet, Michel Vacher
{"title":"Speech recognition results for voice-controlled assistive applications","authors":"Alexandru Caranica, H. Cucu, C. Burileanu, François Portet, Michel Vacher","doi":"10.1109/SPED.2017.7990438","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990438","url":null,"abstract":"Until recently, controlling a “smart home” consisted of setting up a series of applications and automation tools: scheduling when the air conditioning system could cool the room, turning on the lighting system at sunset, or just using one's phone to control several TV appliances or the garage door. Recent advances in speech recognition technology have made voice-controlled smart homes attainable, and many companies and communities are providing interfaces or home boxes to make this voice control available. However, they lack customization ability, and interoperability with appliances or applications is not guaranteed. Moreover, most of these systems are not focused on supporting specific voice recognition scenarios, such as assistive applications for elderly or disabled people, or consider a triggered close-talking voice interaction. Although state-of-the-art speech processing has achieved great performance for the most widely used languages, little to no effort has been made for under-resourced languages, such as Romanian.
This paper focuses on a set of experiments in building a series of acoustic and grammar models for the Romanian language, to be used in distant speech recognition scenarios, for voice-driven speech applications in intelligent homes or buildings, using speech databases in the Romanian language previously acquired by our research group in real-life conditions.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121660484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Several classifiers for intruder detection applications
Elena Roxana Buhus, L. Grama, C. Rusu
{"title":"Several classifiers for intruder detection applications","authors":"Elena Roxana Buhus, L. Grama, C. Rusu","doi":"10.1109/SPED.2017.7990432","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990432","url":null,"abstract":"The goal of this work is to present some possible intruder detection systems and the influence of impulse-like signals upon the overall classification accuracy. Two different scenarios are used: in the first scenario five sound classes are considered (the last class belongs to impulsive sounds: gunshots), while in the second scenario we dropped the impulsive sound class. Several classifiers are used in both scenarios and different numbers of features are considered. An improvement in the classification accuracy is obtained within the second scenario. The highest accuracy for the first scenario is for the J48 classifier using 51 features, while for the second scenario the highest accuracy is attained for the Simple Logistic classifier with 101 features.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128304861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Influences of age in emotion recognition of spontaneous speech: A case of an under-resourced language
N. Jamil, F. Apandi, Raseeda Hamzah
{"title":"Influences of age in emotion recognition of spontaneous speech: A case of an under-resourced language","authors":"N. Jamil, F. Apandi, Raseeda Hamzah","doi":"10.1109/SPED.2017.7990448","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990448","url":null,"abstract":"Recognizing emotions using natural or spontaneous speech is extremely difficult compared to doing the same for acted or elicited speech. Speech emotion recognition for real conversation such as spontaneous speech requires linguistic information of the speech to be included in the speech emotion recognition component to achieve a high recognition rate. However, with the lack of digital speech resources of an under-resourced language, this requirement poses a problem. In this paper, speech emotion recognition of spontaneous speech in the Malay language using prosodic features and a Random Forest classifier is presented. We also investigate the influence of age, categorized as children, young adults and middle-aged, on emotion recognition. Ninety spontaneous speech sentences from 30 native speakers of the Malay language are collected and classified into three emotions, which are happy, angry and sad. Results show that the spontaneous speech of the middle-aged group achieved the highest accuracy rate, followed by the children age group and finally the young adults.
While sad emotions are recognized satisfactorily across all age groups, confusions exist between happy and angry emotions.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134274029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
The SWARA speech corpus: A large parallel Romanian read speech dataset
Adriana Stan, Florina Dinescu, C. Tiple, S. Meza, B. Orza, M. Chirilă, M. Giurgiu
{"title":"The SWARA speech corpus: A large parallel Romanian read speech dataset","authors":"Adriana Stan, Florina Dinescu, C. Tiple, S. Meza, B. Orza, M. Chirilă, M. Giurgiu","doi":"10.1109/SPED.2017.7990428","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990428","url":null,"abstract":"This paper introduces one of the largest Romanian speech datasets freely available for both academic and commercial use. The dataset comprises speech data recorded over the last year from 12 speakers, along with 5 other speakers previously recorded in a separate environment. The data was manually segmented at utterance-level and semi-automatically labelled at phone-level. The resulting corpus amounts to approximately 21 hours of high-quality read speech data, split into over 19,000 utterances. The speakers read between 921 and 1493 utterances each. 880 utterances are common to all speakers and add up to over 16 hours of parallel data. We present the steps of performing the recordings and data segmentation, as well as a first use of this corpus in the context of synthetic voice development.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115539783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Speech recognition in education: Voice geometry painter application
Lucian-Petru Tuca, Adrian Iftene
{"title":"Speech recognition in education: Voice geometry painter application","authors":"Lucian-Petru Tuca, Adrian Iftene","doi":"10.1109/SPED.2017.7990446","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990446","url":null,"abstract":"Nowadays, we find ourselves in an era when education is being reformed while technology is getting better, greater and more accessible than ever [1]. The Internet of Things is already altering health care, security, utilities, transportation, and household management. The devices themselves might be small, but they bring about major changes in how we live, work, and educate our society; we must plan for and question those changes [2]. Anticipating all these phenomena, the project presented in this paper aims to provide a working prototype of an education-oriented app that will make teaching easier and learning more pleasant. This application will use speech recognition as one of its pillars and will address basic geometry students.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114698730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Old geographical corpora: A methodology for interpretative transcription
Mihaela Plamada-Onofrei, Daniela Gîfu, Cecilia Bolea
{"title":"Old geographical corpora: A methodology for interpretative transcription","authors":"Mihaela Plamada-Onofrei, Daniela Gîfu, Cecilia Bolea","doi":"10.1109/SPED.2017.7990445","DOIUrl":"https://doi.org/10.1109/SPED.2017.7990445","url":null,"abstract":"This paper describes a study of the evolution of the Romanian language in the 18th and 19th centuries, in the geographical domain, in order to develop automatic recognition and interpretative transcription of Romanian historical heritage writings from Cyrillic into Latin, in printed forms. It is well known that the operation of interpretative transcription of texts written in Cyrillic is extremely laborious, but it will solve a problem of great interest to humanities researchers who are concerned with the study of the Romanian language in its diachronic evolution. We think that the present study will impact humanities research, including paleography, history, archaeology and the field of linguistics interested in the study of the language in diachrony, but it will also help researchers in the field of computational linguistics who develop models for old language, in order to develop a diachronic POS tagger, so necessary to recover old lemmata.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129826239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1