2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

Voichap: A standalone real-time voice change application on iOS platform
Xiaoling Wu, Shuhua Gao, Dong Huang, Cheng Xiang
{"title":"Voichap: A standalone real-time voice change application on iOS platform","authors":"Xiaoling Wu, Shuhua Gao, Dong Huang, Cheng Xiang","doi":"10.1109/APSIPA.2017.8282129","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282129","url":null,"abstract":"High-quality voice mimicry is appealing to everyone. However, only few vocal geniuses are endowed with the talent for vivid mimicry. Professional mimics have to be trained and practice over many years for various vocal skills, such as vocal control, precision in pitch, sense of rhythm and personal style, etc. To help achieve our dream for fascinating voice mimicry, such as speaking in a celebrity's voice, we have developed a real-time voice conversion technology for the general users. You can specify any target (like your friend or a celebrity) for your voice conversion as long as the target's training utterances are available. To facilitate easy use, we have implemented it efficiently as a mobile application on the iOS platform, called Voichap, which can generate a desired natural target voice. Notably, the complete training and conversion process is performed locally in a reasonable time, with no need for on-line server service, to improve the user experience. Just three steps are enough to use this application: choose a target, record your voice and then have fun listening to your converted voice.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"203 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132318018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
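The abstract does not disclose the underlying conversion algorithm, so the following is only a minimal sketch of the kind of frame-wise, overlap-add processing loop a standalone real-time converter needs on-device; the trained source-to-target mapping is left as a hypothetical placeholder (`DummyConverter` is not from the paper), and the frame and hop sizes are assumptions.

```python
import numpy as np

FRAME = 1024   # samples per analysis frame (assumption)
HOP = 256      # hop size for overlap-add (assumption)

class DummyConverter:
    """Hypothetical stand-in for a conversion model trained locally
    from the chosen target's utterances; here it is an identity map."""
    def convert(self, frame: np.ndarray) -> np.ndarray:
        return frame

def convert_stream(signal: np.ndarray, model) -> np.ndarray:
    """Frame-by-frame conversion with windowed overlap-add resynthesis."""
    window = np.hanning(FRAME)
    out = np.zeros(len(signal) + FRAME)
    for start in range(0, len(signal) - FRAME, HOP):
        frame = signal[start:start + FRAME] * window
        out[start:start + FRAME] += model.convert(frame) * window
    return out[:len(signal)]

if __name__ == "__main__":
    x = np.random.randn(16000)              # one second of dummy 16 kHz audio
    y = convert_stream(x, DummyConverter())
    print(y.shape)
```

In a real deployment the placeholder would be replaced by the model trained on-device from the selected target speaker's recordings.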
Data augmentation and feature extraction using variational autoencoder for acoustic modeling
H. Nishizaki
{"title":"Data augmentation and feature extraction using variational autoencoder for acoustic modeling","authors":"H. Nishizaki","doi":"10.1109/APSIPA.2017.8282225","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282225","url":null,"abstract":"A data augmentation and feature extraction method using a variational autoencoder (VAE) for acoustic modeling is described. A VAE is a generative model based on variational Bayesian learning using a deep learning framework. A VAE can extract latent values its input variables to generate new information. VAEs are widely used to generate pictures and sentences. In this paper, a VAE is applied to speech corpus data augmentation and feature vector extraction from speech for acoustic modeling. First, the size of a speech corpus is doubled by encoding latent variables extracted from original utterances using a VAE framework. The latent variables extracted from speech waveforms have latent \"meanings\" of the waveforms. Therefore, latent variables can be used as acoustic features for automatic speech recognition (ASR). This paper experimentally shows the effectiveness of data augmentation using a VAE framework and that latent variable-based features can be utilized in ASR.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132382884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
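To make the encode-sample-decode idea concrete, here is a minimal VAE sketch in PyTorch (the framework, layer sizes, and feature dimension are assumptions, not the paper's configuration): the encoder maps an acoustic feature vector to a latent mean and variance, the reparameterized latent vector can serve directly as an ASR feature, and decoding it yields an additional synthetic example for corpus doubling.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE: feature vector -> latent (mu, logvar) -> reconstruction."""
    def __init__(self, feat_dim=40, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(feat_dim, 64)
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar, z

vae = VAE()
feats = torch.randn(8, 40)                 # dummy batch of acoustic feature vectors
recon, mu, logvar, z = vae(feats)
augmented = torch.cat([feats, recon], 0)   # "doubled" corpus: original + generated frames
latent_features = z                        # latent vectors usable as ASR features
```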
Emotion recognition by combining prosody and sentiment analysis for expressing reactive emotion by humanoid robot
Yuanchao Li, C. Ishi, Nigel G. Ward, K. Inoue, Shizuka Nakamura, K. Takanashi, Tatsuya Kawahara
{"title":"Emotion recognition by combining prosody and sentiment analysis for expressing reactive emotion by humanoid robot","authors":"Yuanchao Li, C. Ishi, Nigel G. Ward, K. Inoue, Shizuka Nakamura, K. Takanashi, Tatsuya Kawahara","doi":"10.1109/APSIPA.2017.8282243","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282243","url":null,"abstract":"In order to achieve rapport in human-robot interaction, it is important to express a reactive emotion that matches with the user's mental state. This paper addresses an emotion recognition method which combines prosody and sentiment analysis for the system to properly express reactive emotion. In the user emotion recognition module, valence estimation from prosodic features is combined with sentiment analysis of text information. Combining the two information sources significantly improved the valence estimation accuracy. In the reactive emotion expression module, the system's emotion category and level are predicted using the parameters estimated in the recognition module, based on distributions inferred from human-human dialog data. Subjective evaluation results show that the proposed method is effective for expressing human-like reactive emotion.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131062947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
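As a rough illustration of fusing the two information sources, the sketch below combines a prosody-based valence score with a text sentiment score through a weighted sum; the toy prosody regressor, the fusion weight, and the decision thresholds are assumptions, not the paper's trained models.

```python
import numpy as np

def prosody_valence(f0_mean, energy_mean):
    """Hypothetical prosody-based valence estimate in [-1, 1];
    the paper's actual estimator is learned from labelled dialog data."""
    return float(np.tanh(0.01 * (f0_mean - 150) + 0.5 * (energy_mean - 0.5)))

def fuse_valence(prosody_score, sentiment_score, w=0.6):
    """Late fusion of prosodic and textual evidence (weight w is an assumption)."""
    return w * prosody_score + (1 - w) * sentiment_score

valence = fuse_valence(prosody_valence(f0_mean=180, energy_mean=0.7), sentiment_score=0.4)
emotion = "positive" if valence > 0.2 else "negative" if valence < -0.2 else "neutral"
print(round(valence, 3), emotion)
```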
Automatic vehicle classification using center strengthened convolutional neural network
Kuan-Chung Wang, Yoga Dwi Pranata, Jia-Ching Wang
{"title":"Automatic vehicle classification using center strengthened convolutional neural network","authors":"Kuan-Chung Wang, Yoga Dwi Pranata, Jia-Ching Wang","doi":"10.1109/APSIPA.2017.8282187","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282187","url":null,"abstract":"Vehicle classification is one of the major part for the smart road management system and traffic management system. The use of appropriate algorithms has a significant impact in the process of classification. In this paper, we propose a deep neural network, named center strengthened convolutional neural network (CS- CNN), for handling central part image feature enhancement with non-fixed size input. The main hallmark of this proposed architecture is center enhancement that extract additional feature from central of image by ROI pooling. Another, our CS-CNN, based on VGG network architecture, joint with ROI pooling layer to get elaborate feature maps. Our proposed method will be compared with other typical deep learning architecture like VGG-s and VGG-Verydeep-16. In the experiments, we show the outstanding performance which getting more than 97% accuracy on vehicle classification with only few training data from Caltech256 datasets.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132911473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
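A minimal sketch of the center-strengthening idea, assuming a VGG-16 backbone from torchvision and a fixed central ROI: features pooled from the central region of the feature map are concatenated with a global descriptor before classification. The ROI extent, pooled resolution, and classifier head are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
import torchvision
from torchvision.ops import roi_pool

backbone = torchvision.models.vgg16().features   # VGG-16 conv layers (random weights here)

images = torch.randn(2, 3, 224, 224)              # dummy batch
fmap = backbone(images)                           # (2, 512, 7, 7) feature maps

# Central ROI (middle half of each image) in input-pixel coordinates;
# the first column of each box is the batch index.
boxes = torch.tensor([[0, 56.0, 56.0, 168.0, 168.0],
                      [1, 56.0, 56.0, 168.0, 168.0]])
center = roi_pool(fmap, boxes, output_size=(3, 3), spatial_scale=7 / 224)

global_feat = torch.flatten(F.adaptive_avg_pool2d(fmap, 1), 1)   # whole-image descriptor
center_feat = torch.flatten(center, 1)                           # center-strengthened part
joint = torch.cat([global_feat, center_feat], dim=1)
logits = torch.nn.Linear(joint.shape[1], 4)(joint)               # e.g. 4 vehicle classes
print(logits.shape)
```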
The acoustic characteristics of Tone 3 in Standard Chinese produced by prelingually deaf adults
Yu Chen, Jie Hou, Yutong Xing, Yanting Chen, Hua Lin, J. Dang
{"title":"The acoustic characteristics of tone 3 in standard chinese produced by prelingually deaf adults","authors":"Yu Chen, Jie Hou, Yutong Xing, Yanting Chen, Hua Lin, J. Dang","doi":"10.1109/APSIPA.2017.8282105","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282105","url":null,"abstract":"This paper studies the acoustic characteristics of Tone 3 produced by prelingually deaf adults and finds that the deaf females and males apply different strategies to realize this dipping-rising tone: for deaf females, they tend to use creaky voice in producing this tone; for deaf males, they adopt a longer duration and a slower turning to distinguish T3 from other tones in Standard Chinese. Moreover, results of this study support the viewpoint that the prelingually deaf adults could benefit from their longer experience of cochlear implant to improve their capability of tone's production.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133054959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
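For readers who want to reproduce the basic measurements, the sketch below extracts an F0 contour with librosa's pYIN tracker and derives two of the cues discussed: syllable duration and the relative position of the F0 turning point (the minimum of the dipping-rising contour). The frequency range and the file name are assumptions, not values from the paper.

```python
import numpy as np
import librosa

def tone3_measures(wav_path, fmin=60.0, fmax=400.0):
    """Duration and relative F0 turning-point position for one Tone-3 syllable;
    assumes the recording contains at least one voiced frame."""
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    duration = len(y) / sr
    times = librosa.times_like(f0, sr=sr)
    f0_voiced = np.where(voiced, f0, np.nan)
    turning_time = times[np.nanargmin(f0_voiced)]   # time of the lowest F0
    return duration, turning_time / duration        # relative turning position in [0, 1]

# Hypothetical file name for illustration:
# dur, rel_turn = tone3_measures("tone3_syllable.wav")
```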
Compressed high dimensional features for speaker spoofing detection
Yuanjun Zhao, R. Togneri, V. Sreeram
{"title":"Compressed high dimensional features for speaker spoofing detection","authors":"Yuanjun Zhao, R. Togneri, V. Sreeram","doi":"10.1109/APSIPA.2017.8282108","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282108","url":null,"abstract":"The vulnerability in Automatic Speaker Verification (ASV) systems to spoofing attacks such as speech synthesis (SS) and voice conversion (VC) has been recently proved. High- dimensional magnitude and phase based features possess outstanding spoofing detection performance but are not compatible with the Gaussian Mixture Model (GMM) classifiers which are commonly deployed in speaker recognition systems. In this paper, a Compressed Sensing (CS) framework is initially combined with high-dimensional (HD) features and a derived CS-HD based feature is proposed. A standalone spoofing detector assembled with the GMM classifier is evaluated on the ASVspoof 2015 database. Two ASV systems integrated with the spoofing detector are also tested. For the separate detector, an equal error rate (EER) of 0.01% and 5.35% are reached on the evaluation set for known attack and unknown attack, respectively. While for the ASV systems, the best EERs of 0.02% and 5.26% are achieved. The proposed CS-HD feature can obtain similar results with lower dimension than other systems. This suggests that the verification system can be made more computationally efficient.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133081358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
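The sketch below illustrates the general recipe of compressing a high-dimensional feature with a random measurement matrix and scoring it with per-class GMMs in scikit-learn. The dimensions, the Gaussian sensing matrix, the number of mixture components, and the synthetic data are all assumptions; this is not the paper's CS-HD feature itself.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
hd_dim, cs_dim = 4096, 256                               # assumed dimensions

# Random measurement matrix (the compressed-sensing step).
phi = rng.standard_normal((cs_dim, hd_dim)) / np.sqrt(cs_dim)

genuine_hd = rng.standard_normal((500, hd_dim))          # dummy high-dimensional features
spoofed_hd = rng.standard_normal((500, hd_dim)) + 0.5

genuine_cs = genuine_hd @ phi.T                          # compressed (CS) features
spoofed_cs = spoofed_hd @ phi.T

# GMM-based detector: one model per class, decision by log-likelihood ratio.
gmm_gen = GaussianMixture(n_components=8, covariance_type="diag").fit(genuine_cs)
gmm_spf = GaussianMixture(n_components=8, covariance_type="diag").fit(spoofed_cs)

test = rng.standard_normal((10, hd_dim)) @ phi.T
llr = gmm_gen.score_samples(test) - gmm_spf.score_samples(test)
print((llr > 0).astype(int))                             # 1 = classified as genuine speech
```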
Lung sound classification based on Hilbert-Huang transform features and multilayer perceptron network
Yunxia Liu, Yang Yang, Yuehui Chen
{"title":"Lung sound classification based on Hilbert-Huang transform features and multilayer perceptron network","authors":"Yunxia Liu, Yang Yang, Yuehui Chen","doi":"10.1109/APSIPA.2017.8282137","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282137","url":null,"abstract":"Accurate classification of lung sounds plays an important role in noninvasive diagnosis of pulmonary diseases. A novel lung sound classification algorithm based on Hilbert-Huang transform (HHT) features and multilayer perceptron network is proposed in this paper. Three types of HHT domain features, namely the instantaneous envelope amplitude of intrinsic mode functions (IMF), envelop of instantaneous amplitude of the first four layers IMFs, and max value of the marginal spectrum are proposed for jointly characterization of the time-frequency properties of lung sounds. These proposed features are feed into a multi-layer perceptron neural network for training and testing of lung sound signal classification. Abundant experimental work is carried out to verify the effectiveness of the proposed algorithm.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132728083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
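A minimal sketch of this feature pipeline, assuming the IMFs are supplied by some empirical mode decomposition implementation (the `compute_imfs` placeholder below is hypothetical and returns dummy signals): the Hilbert transform yields each IMF's instantaneous envelope amplitude, simple statistics summarize it, and scikit-learn's MLPClassifier stands in for the multilayer perceptron. The summary statistics and network size are assumptions.

```python
import numpy as np
from scipy.signal import hilbert
from sklearn.neural_network import MLPClassifier

def compute_imfs(signal, n_imfs=4):
    """Hypothetical placeholder for empirical mode decomposition; a real EMD
    implementation would return the first few intrinsic mode functions."""
    return [signal * (i + 1) / n_imfs for i in range(n_imfs)]   # dummy IMFs

def hht_features(signal, n_imfs=4):
    """Summarize each IMF's instantaneous envelope amplitude with basic statistics
    (a stand-in for the paper's three HHT feature types)."""
    feats = []
    for imf in compute_imfs(signal, n_imfs):
        env = np.abs(hilbert(imf))                  # instantaneous envelope amplitude
        feats += [env.mean(), env.std(), env.max()]
    return np.array(feats)

X = np.stack([hht_features(np.random.randn(8000)) for _ in range(40)])
y = np.random.randint(0, 2, size=40)                # dummy normal/abnormal labels
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
print(clf.predict(X[:5]))
```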
Fuzzy qualitative approach for micro-expression recognition
C. H. Lim, Kam Meng Goh
{"title":"Fuzzy qualitative approach for micro-expression recognition","authors":"C. H. Lim, Kam Meng Goh","doi":"10.1109/APSIPA.2017.8282300","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282300","url":null,"abstract":"Micro-expression recognition has received increasing attention in the field of computer vision nowadays. Many state-of-the-art approaches have been reported but it can be seen that most of the results are capped at a certain level of accuracy. This is due to the ambiguity that abounded during the extraction of extremely short period of facial movements. These ambiguities deteriorate the performance of the overall recognition rate if using crisp classifier. This paper proposed to study the micro-expression as a non-mutual exclusive classification problem and examine the effectiveness of multi-label classification in micro-expression recognition by using the Fuzzy Qualitative Rank Classifier (FQRC). In addition, the extension of FQRC with feature selection and part-based model is proposed which shows promising results after tested on CASME II dataset.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131321012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
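The sketch below only illustrates the non-mutually-exclusive decision idea: raw class scores are mapped to fuzzy memberships and every class above a threshold is kept, instead of a single crisp argmax. It is not the FQRC itself; the class list follows common CASME II conventions and the threshold is an assumption.

```python
import numpy as np

def fuzzy_memberships(scores):
    """Map raw per-class scores to [0, 1] memberships via min-max normalization;
    an illustrative stand-in, not the FQRC's qualitative rank model."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

def multi_label_decision(scores, classes, threshold=0.6):
    """Keep every expression whose membership clears the threshold,
    instead of forcing a single crisp label."""
    m = fuzzy_memberships(scores)
    return [c for c, v in zip(classes, m) if v >= threshold]

classes = ["happiness", "surprise", "disgust", "repression", "others"]
print(multi_label_decision([0.9, 0.85, 0.2, 0.1, 0.3], classes))
# -> ['happiness', 'surprise']: an ambiguous sample can carry more than one label
```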
A fast and energy efficient FPGA-based system for real-time object tracking
Xiaobai Chen, Jinlong Xu, Zhiyi Yu
{"title":"A fast and energy efficient FPGA-based system for real-time object tracking","authors":"Xiaobai Chen, Jinlong Xu, Zhiyi Yu","doi":"10.1109/APSIPA.2017.8282162","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282162","url":null,"abstract":"Visual object tracking has achieved great advances in the past decades and has been widely applied in vision-based applications. Due to the popularization of the power-sensitive mobile platform, robust and low power real-time tracking solution is strongly required. An energy efficient real-time object tracking system on both static and moving camera is proposed in this paper. The system reduces the computational cost and explores data reuse by optimizing the tracking algorithm, the data flow, and the parallelism strategies. The architecture is implemented on a Xilinx ZC706 FPGA, and the experimental data shows that the system obtains 41 frame/s throughput for the 640×480 video and achieves higher energy efficiency comparing to other similar works.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114820305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
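To make the real-time constraint concrete, here is a quick back-of-envelope calculation from the reported figures (41 frames/s at 640×480); everything beyond these two numbers is outside the abstract.

```python
# Pixel throughput implied by the reported 41 frames/s at 640x480.
width, height, fps = 640, 480, 41
pixels_per_second = width * height * fps
print(f"{pixels_per_second / 1e6:.1f} Mpixel/s")   # ~12.6 Mpixel/s

# Per-frame time budget the pipeline must meet for real-time operation.
print(f"{1000 / fps:.1f} ms per frame")            # ~24.4 ms
```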
End-to-end speech recognition for languages with ideographic characters
Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, T. Mishima, Shoei Sato, A. Kobayashi
{"title":"End-to-end speech recognition for languages with ideographic characters","authors":"Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, T. Mishima, Shoei Sato, A. Kobayashi","doi":"10.1109/APSIPA.2017.8282226","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282226","url":null,"abstract":"This paper describes a novel training method for acoustic models using connectionist temporal classification (CTC) for Japanese end-to-end automatic speech recognition (ASR). End-to-end ASR can estimate characters directly without using a pronunciation dictionary; however, this approach was conducted mostly in the English research area. When dealing with languages such as Japanese, we confront difficulties with robust acoustic modeling. One of the issues is caused by a large number of characters, including Japanese kanji, which leads to an increase in the number of model parameters. Additionally, multiple pronunciations of kanji increase the variance of acoustic features for corresponding characters. Therefore, we propose end-to-end ASR based on bi-directional long short-term memory (BLSTM) networks to solve these problems. Our proposal involves two approaches: reducing the number of dimensions of BLSTM and adding character strings to output layer labels. Dimensional compression decreases the number of parameters, while output label expansion reduces the variance of acoustic features. Consequently, we could obtain a robust model with a small number of parameters. Our experimental results with Japanese broadcast programs show the combined method of these two approaches improved the word error rate significantly compared with the conventional character-based end-to-end approach.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117254434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
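A minimal PyTorch sketch of a BLSTM acoustic model trained with CTC over an expanded character vocabulary. The layer widths and the toy unit inventory (single characters plus a few multi-character strings) are assumptions chosen only to show the mechanics of the two proposed ideas; they do not reproduce the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Character vocabulary expanded with frequent multi-character strings ("label expansion");
# the actual unit inventory and the reduced BLSTM width are assumptions.
vocab = ["<blank>", "ニ", "ュ", "ー", "ス", "東京", "ニュース"]

class BLSTMCTC(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_labels=len(vocab)):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_labels)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        h, _ = self.blstm(x)
        return self.proj(h).log_softmax(-1)    # per-frame label posteriors

model = BLSTMCTC()
feats = torch.randn(2, 200, 40)                # two dummy utterances
log_probs = model(feats).transpose(0, 1)       # CTCLoss expects (time, batch, labels)
targets = torch.tensor([5, 6, 1, 2, 3, 4])     # concatenated label indices for the batch
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.tensor([200, 200]),
                           target_lengths=torch.tensor([2, 4]))
loss.backward()
```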