2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP): Latest Papers


Audio Caption in a Car Setting with a Sentence-Level Loss

2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2019-05-01 DOI: 10.1109/ISCSLP49672.2021.9362117

Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu

Citations: 11

Towards Realizing Sign Language to Emotional Speech Conversion by Deep Learning

2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP) Pub Date: 2018-09-21 DOI: 10.1109/ISCSLP49672.2021.9362060

Nan Song, Hongwu Yang, Pengpeng Zhi

Abstract: This paper proposes a deep-learning framework for sign language to emotional speech conversion, intended to ease communication between people with language impairments and those without. We first train a gesture recognition model and a facial expression recognition model using a deep convolutional generative adversarial network (DCGAN). We then train an emotional speech acoustic model with a hybrid long short-term memory (LSTM) network. The initials and finals of Mandarin are used as the emotional speech synthesis units to train a speaker-independent average voice model (AVM). Speaker adaptation is then applied to the AVM, using the emotional corpus of one target speaker, to train a speaker-dependent hybrid LSTM model. Finally, we combine the gesture recognition model and the facial expression recognition model with the emotional speech synthesis model to realize sign language to emotional speech conversion. Experiments show a gesture recognition rate of 93.96% and a facial expression recognition rate of 96.01% on the CK+ database. The converted emotional speech is of high quality and accurately conveys the facial expression.

Citations: 2
