2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

Voichap: A standalone real-time voice change application on iOS platform
Xiaoling Wu, Shuhua Gao, Dong Huang, Cheng Xiang
{"title":"Voichap: A standalone real-time voice change application on iOS platform","authors":"Xiaoling Wu, Shuhua Gao, Dong Huang, Cheng Xiang","doi":"10.1109/APSIPA.2017.8282129","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282129","url":null,"abstract":"High-quality voice mimicry is appealing to everyone. However, only few vocal geniuses are endowed with the talent for vivid mimicry. Professional mimics have to be trained and practice over many years for various vocal skills, such as vocal control, precision in pitch, sense of rhythm and personal style, etc. To help achieve our dream for fascinating voice mimicry, such as speaking in a celebrity's voice, we have developed a real-time voice conversion technology for the general users. You can specify any target (like your friend or a celebrity) for your voice conversion as long as the target's training utterances are available. To facilitate easy use, we have implemented it efficiently as a mobile application on the iOS platform, called Voichap, which can generate a desired natural target voice. Notably, the complete training and conversion process is performed locally in a reasonable time, with no need for on-line server service, to improve the user experience. Just three steps are enough to use this application: choose a target, record your voice and then have fun listening to your converted voice.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"203 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132318018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
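The abstract does not disclose the underlying conversion algorithm, so the following is only a minimal sketch of the kind of frame-wise, overlap-add processing loop a standalone real-time converter needs on-device; the trained source-to-target mapping is left as a hypothetical placeholder (`DummyConverter` is not from the paper), and the frame and hop sizes are assumptions.

```python
import numpy as np

FRAME = 1024   # samples per analysis frame (assumption)
HOP = 256      # hop size for overlap-add (assumption)

class DummyConverter:
    """Hypothetical stand-in for a conversion model trained locally
    from the chosen target's utterances; here it is an identity map."""
    def convert(self, frame: np.ndarray) -> np.ndarray:
        return frame

def convert_stream(signal: np.ndarray, model) -> np.ndarray:
    """Frame-by-frame conversion with windowed overlap-add resynthesis."""
    window = np.hanning(FRAME)
    out = np.zeros(len(signal) + FRAME)
    for start in range(0, len(signal) - FRAME, HOP):
        frame = signal[start:start + FRAME] * window
        out[start:start + FRAME] += model.convert(frame) * window
    return out[:len(signal)]

if __name__ == "__main__":
    x = np.random.randn(16000)              # one second of dummy 16 kHz audio
    y = convert_stream(x, DummyConverter())
    print(y.shape)
```

In a real deployment the placeholder would be replaced by the model trained on-device from the selected target speaker's recordings.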
Data augmentation and feature extraction using variational autoencoder for acoustic modeling
H. Nishizaki
{"title":"Data augmentation and feature extraction using variational autoencoder for acoustic modeling","authors":"H. Nishizaki","doi":"10.1109/APSIPA.2017.8282225","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282225","url":null,"abstract":"A data augmentation and feature extraction method using a variational autoencoder (VAE) for acoustic modeling is described. A VAE is a generative model based on variational Bayesian learning using a deep learning framework. A VAE can extract latent values its input variables to generate new information. VAEs are widely used to generate pictures and sentences. In this paper, a VAE is applied to speech corpus data augmentation and feature vector extraction from speech for acoustic modeling. First, the size of a speech corpus is doubled by encoding latent variables extracted from original utterances using a VAE framework. The latent variables extracted from speech waveforms have latent \"meanings\" of the waveforms. Therefore, latent variables can be used as acoustic features for automatic speech recognition (ASR). This paper experimentally shows the effectiveness of data augmentation using a VAE framework and that latent variable-based features can be utilized in ASR.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132382884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
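To make the encode-sample-decode idea concrete, here is a minimal VAE sketch in PyTorch (the framework, layer sizes, and feature dimension are assumptions, not the paper's configuration): the encoder maps an acoustic feature vector to a latent mean and variance, the reparameterized latent vector can serve directly as an ASR feature, and decoding it yields an additional synthetic example for corpus doubling.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE: feature vector -> latent (mu, logvar) -> reconstruction."""
    def __init__(self, feat_dim=40, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(feat_dim, 64)
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar, z

vae = VAE()
feats = torch.randn(8, 40)                 # dummy batch of acoustic feature vectors
recon, mu, logvar, z = vae(feats)
augmented = torch.cat([feats, recon], 0)   # "doubled" corpus: original + generated frames
latent_features = z                        # latent vectors usable as ASR features
```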
Emotion recognition by combining prosody and sentiment analysis for expressing reactive emotion by humanoid robot
Yuanchao Li, C. Ishi, Nigel G. Ward, K. Inoue, Shizuka Nakamura, K. Takanashi, Tatsuya Kawahara
{"title":"Emotion recognition by combining prosody and sentiment analysis for expressing reactive emotion by humanoid robot","authors":"Yuanchao Li, C. Ishi, Nigel G. Ward, K. Inoue, Shizuka Nakamura, K. Takanashi, Tatsuya Kawahara","doi":"10.1109/APSIPA.2017.8282243","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282243","url":null,"abstract":"In order to achieve rapport in human-robot interaction, it is important to express a reactive emotion that matches with the user's mental state. This paper addresses an emotion recognition method which combines prosody and sentiment analysis for the system to properly express reactive emotion. In the user emotion recognition module, valence estimation from prosodic features is combined with sentiment analysis of text information. Combining the two information sources significantly improved the valence estimation accuracy. In the reactive emotion expression module, the system's emotion category and level are predicted using the parameters estimated in the recognition module, based on distributions inferred from human-human dialog data. Subjective evaluation results show that the proposed method is effective for expressing human-like reactive emotion.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131062947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
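As a rough illustration of fusing the two information sources, the sketch below combines a prosody-based valence score with a text sentiment score through a weighted sum; the toy prosody regressor, the fusion weight, and the decision thresholds are assumptions, not the paper's trained models.

```python
import numpy as np

def prosody_valence(f0_mean, energy_mean):
    """Hypothetical prosody-based valence estimate in [-1, 1];
    the paper's actual estimator is learned from labelled dialog data."""
    return float(np.tanh(0.01 * (f0_mean - 150) + 0.5 * (energy_mean - 0.5)))

def fuse_valence(prosody_score, sentiment_score, w=0.6):
    """Late fusion of prosodic and textual evidence (weight w is an assumption)."""
    return w * prosody_score + (1 - w) * sentiment_score

valence = fuse_valence(prosody_valence(f0_mean=180, energy_mean=0.7), sentiment_score=0.4)
emotion = "positive" if valence > 0.2 else "negative" if valence < -0.2 else "neutral"
print(round(valence, 3), emotion)
```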
Automatic vehicle classification using center strengthened convolutional neural network
Kuan-Chung Wang, Yoga Dwi Pranata, Jia-Ching Wang
{"title":"Automatic vehicle classification using center strengthened convolutional neural network","authors":"Kuan-Chung Wang, Yoga Dwi Pranata, Jia-Ching Wang","doi":"10.1109/APSIPA.2017.8282187","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282187","url":null,"abstract":"Vehicle classification is one of the major part for the smart road management system and traffic management system. The use of appropriate algorithms has a significant impact in the process of classification. In this paper, we propose a deep neural network, named center strengthened convolutional neural network (CS- CNN), for handling central part image feature enhancement with non-fixed size input. The main hallmark of this proposed architecture is center enhancement that extract additional feature from central of image by ROI pooling. Another, our CS-CNN, based on VGG network architecture, joint with ROI pooling layer to get elaborate feature maps. Our proposed method will be compared with other typical deep learning architecture like VGG-s and VGG-Verydeep-16. In the experiments, we show the outstanding performance which getting more than 97% accuracy on vehicle classification with only few training data from Caltech256 datasets.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132911473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
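A minimal sketch of the center-strengthening idea, assuming a VGG-16 backbone from torchvision and a fixed central ROI: features pooled from the central region of the feature map are concatenated with a global descriptor before classification. The ROI extent, pooled resolution, and classifier head are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
import torchvision
from torchvision.ops import roi_pool

backbone = torchvision.models.vgg16().features   # VGG-16 conv layers (random weights here)

images = torch.randn(2, 3, 224, 224)              # dummy batch
fmap = backbone(images)                           # (2, 512, 7, 7) feature maps

# Central ROI (middle half of each image) in input-pixel coordinates;
# the first column of each box is the batch index.
boxes = torch.tensor([[0, 56.0, 56.0, 168.0, 168.0],
                      [1, 56.0, 56.0, 168.0, 168.0]])
center = roi_pool(fmap, boxes, output_size=(3, 3), spatial_scale=7 / 224)

global_feat = torch.flatten(F.adaptive_avg_pool2d(fmap, 1), 1)   # whole-image descriptor
center_feat = torch.flatten(center, 1)                           # center-strengthened part
joint = torch.cat([global_feat, center_feat], dim=1)
logits = torch.nn.Linear(joint.shape[1], 4)(joint)               # e.g. 4 vehicle classes
print(logits.shape)
```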
The acoustic characteristics of Tone 3 in Standard Chinese produced by prelingually deaf adults
Yu Chen, Jie Hou, Yutong Xing, Yanting Chen, Hua Lin, J. Dang
{"title":"The acoustic characteristics of tone 3 in standard chinese produced by prelingually deaf adults","authors":"Yu Chen, Jie Hou, Yutong Xing, Yanting Chen, Hua Lin, J. Dang","doi":"10.1109/APSIPA.2017.8282105","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282105","url":null,"abstract":"This paper studies the acoustic characteristics of Tone 3 produced by prelingually deaf adults and finds that the deaf females and males apply different strategies to realize this dipping-rising tone: for deaf females, they tend to use creaky voice in producing this tone; for deaf males, they adopt a longer duration and a slower turning to distinguish T3 from other tones in Standard Chinese. Moreover, results of this study support the viewpoint that the prelingually deaf adults could benefit from their longer experience of cochlear implant to improve their capability of tone's production.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133054959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
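For readers who want to reproduce the basic measurements, the sketch below extracts an F0 contour with librosa's pYIN tracker and derives two of the cues discussed: syllable duration and the relative position of the F0 turning point (the minimum of the dipping-rising contour). The frequency range and the file name are assumptions, not values from the paper.

```python
import numpy as np
import librosa

def tone3_measures(wav_path, fmin=60.0, fmax=400.0):
    """Duration and relative F0 turning-point position for one Tone-3 syllable;
    assumes the recording contains at least one voiced frame."""
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    duration = len(y) / sr
    times = librosa.times_like(f0, sr=sr)
    f0_voiced = np.where(voiced, f0, np.nan)
    turning_time = times[np.nanargmin(f0_voiced)]   # time of the lowest F0
    return duration, turning_time / duration        # relative turning position in [0, 1]

# Hypothetical file name for illustration:
# dur, rel_turn = tone3_measures("tone3_syllable.wav")
```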
Compressed high dimensional features for speaker spoofing detection
Yuanjun Zhao, R. Togneri, V. Sreeram
{"title":"Compressed high dimensional features for speaker spoofing detection","authors":"Yuanjun Zhao, R. Togneri, V. Sreeram","doi":"10.1109/APSIPA.2017.8282108","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282108","url":null,"abstract":"The vulnerability in Automatic Speaker Verification (ASV) systems to spoofing attacks such as speech synthesis (SS) and voice conversion (VC) has been recently proved. High- dimensional magnitude and phase based features possess outstanding spoofing detection performance but are not compatible with the Gaussian Mixture Model (GMM) classifiers which are commonly deployed in speaker recognition systems. In this paper, a Compressed Sensing (CS) framework is initially combined with high-dimensional (HD) features and a derived CS-HD based feature is proposed. A standalone spoofing detector assembled with the GMM classifier is evaluated on the ASVspoof 2015 database. Two ASV systems integrated with the spoofing detector are also tested. For the separate detector, an equal error rate (EER) of 0.01% and 5.35% are reached on the evaluation set for known attack and unknown attack, respectively. While for the ASV systems, the best EERs of 0.02% and 5.26% are achieved. The proposed CS-HD feature can obtain similar results with lower dimension than other systems. This suggests that the verification system can be made more computationally efficient.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133081358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
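The sketch below illustrates the general recipe of compressing a high-dimensional feature with a random measurement matrix and scoring it with per-class GMMs in scikit-learn. The dimensions, the Gaussian sensing matrix, the number of mixture components, and the synthetic data are all assumptions; this is not the paper's CS-HD feature itself.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
hd_dim, cs_dim = 4096, 256                               # assumed dimensions

# Random measurement matrix (the compressed-sensing step).
phi = rng.standard_normal((cs_dim, hd_dim)) / np.sqrt(cs_dim)

genuine_hd = rng.standard_normal((500, hd_dim))          # dummy high-dimensional features
spoofed_hd = rng.standard_normal((500, hd_dim)) + 0.5

genuine_cs = genuine_hd @ phi.T                          # compressed (CS) features
spoofed_cs = spoofed_hd @ phi.T

# GMM-based detector: one model per class, decision by log-likelihood ratio.
gmm_gen = GaussianMixture(n_components=8, covariance_type="diag").fit(genuine_cs)
gmm_spf = GaussianMixture(n_components=8, covariance_type="diag").fit(spoofed_cs)

test = rng.standard_normal((10, hd_dim)) @ phi.T
llr = gmm_gen.score_samples(test) - gmm_spf.score_samples(test)
print((llr > 0).astype(int))                             # 1 = classified as genuine speech
```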
Lung sound classification based on Hilbert-Huang transform features and multilayer perceptron network
Yunxia Liu, Yang Yang, Yuehui Chen
{"title":"Lung sound classification based on Hilbert-Huang transform features and multilayer perceptron network","authors":"Yunxia Liu, Yang Yang, Yuehui Chen","doi":"10.1109/APSIPA.2017.8282137","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282137","url":null,"abstract":"Accurate classification of lung sounds plays an important role in noninvasive diagnosis of pulmonary diseases. A novel lung sound classification algorithm based on Hilbert-Huang transform (HHT) features and multilayer perceptron network is proposed in this paper. Three types of HHT domain features, namely the instantaneous envelope amplitude of intrinsic mode functions (IMF), envelop of instantaneous amplitude of the first four layers IMFs, and max value of the marginal spectrum are proposed for jointly characterization of the time-frequency properties of lung sounds. These proposed features are feed into a multi-layer perceptron neural network for training and testing of lung sound signal classification. Abundant experimental work is carried out to verify the effectiveness of the proposed algorithm.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132728083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
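A minimal sketch of this feature pipeline, assuming the IMFs are supplied by some empirical mode decomposition implementation (the `compute_imfs` placeholder below is hypothetical and returns dummy signals): the Hilbert transform yields each IMF's instantaneous envelope amplitude, simple statistics summarize it, and scikit-learn's MLPClassifier stands in for the multilayer perceptron. The summary statistics and network size are assumptions.

```python
import numpy as np
from scipy.signal import hilbert
from sklearn.neural_network import MLPClassifier

def compute_imfs(signal, n_imfs=4):
    """Hypothetical placeholder for empirical mode decomposition; a real EMD
    implementation would return the first few intrinsic mode functions."""
    return [signal * (i + 1) / n_imfs for i in range(n_imfs)]   # dummy IMFs

def hht_features(signal, n_imfs=4):
    """Summarize each IMF's instantaneous envelope amplitude with basic statistics
    (a stand-in for the paper's three HHT feature types)."""
    feats = []
    for imf in compute_imfs(signal, n_imfs):
        env = np.abs(hilbert(imf))                  # instantaneous envelope amplitude
        feats += [env.mean(), env.std(), env.max()]
    return np.array(feats)

X = np.stack([hht_features(np.random.randn(8000)) for _ in range(40)])
y = np.random.randint(0, 2, size=40)                # dummy normal/abnormal labels
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
print(clf.predict(X[:5]))
```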
Fuzzy qualitative approach for micro-expression recognition
C. H. Lim, Kam Meng Goh
{"title":"Fuzzy qualitative approach for micro-expression recognition","authors":"C. H. Lim, Kam Meng Goh","doi":"10.1109/APSIPA.2017.8282300","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282300","url":null,"abstract":"Micro-expression recognition has received increasing attention in the field of computer vision nowadays. Many state-of-the-art approaches have been reported but it can be seen that most of the results are capped at a certain level of accuracy. This is due to the ambiguity that abounded during the extraction of extremely short period of facial movements. These ambiguities deteriorate the performance of the overall recognition rate if using crisp classifier. This paper proposed to study the micro-expression as a non-mutual exclusive classification problem and examine the effectiveness of multi-label classification in micro-expression recognition by using the Fuzzy Qualitative Rank Classifier (FQRC). In addition, the extension of FQRC with feature selection and part-based model is proposed which shows promising results after tested on CASME II dataset.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131321012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
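The sketch below only illustrates the non-mutually-exclusive decision idea: raw class scores are mapped to fuzzy memberships and every class above a threshold is kept, instead of a single crisp argmax. It is not the FQRC itself; the class list follows common CASME II conventions and the threshold is an assumption.

```python
import numpy as np

def fuzzy_memberships(scores):
    """Map raw per-class scores to [0, 1] memberships via min-max normalization;
    an illustrative stand-in, not the FQRC's qualitative rank model."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

def multi_label_decision(scores, classes, threshold=0.6):
    """Keep every expression whose membership clears the threshold,
    instead of forcing a single crisp label."""
    m = fuzzy_memberships(scores)
    return [c for c, v in zip(classes, m) if v >= threshold]

classes = ["happiness", "surprise", "disgust", "repression", "others"]
print(multi_label_decision([0.9, 0.85, 0.2, 0.1, 0.3], classes))
# -> ['happiness', 'surprise']: an ambiguous sample can carry more than one label
```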
A fast and energy efficient FPGA-based system for real-time object tracking
Xiaobai Chen, Jinlong Xu, Zhiyi Yu
{"title":"A fast and energy efficient FPGA-based system for real-time object tracking","authors":"Xiaobai Chen, Jinlong Xu, Zhiyi Yu","doi":"10.1109/APSIPA.2017.8282162","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282162","url":null,"abstract":"Visual object tracking has achieved great advances in the past decades and has been widely applied in vision-based applications. Due to the popularization of the power-sensitive mobile platform, robust and low power real-time tracking solution is strongly required. An energy efficient real-time object tracking system on both static and moving camera is proposed in this paper. The system reduces the computational cost and explores data reuse by optimizing the tracking algorithm, the data flow, and the parallelism strategies. The architecture is implemented on a Xilinx ZC706 FPGA, and the experimental data shows that the system obtains 41 frame/s throughput for the 640×480 video and achieves higher energy efficiency comparing to other similar works.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114820305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
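To make the real-time constraint concrete, here is a quick back-of-envelope calculation from the reported figures (41 frames/s at 640×480); everything beyond these two numbers is outside the abstract.

```python
# Pixel throughput implied by the reported 41 frames/s at 640x480.
width, height, fps = 640, 480, 41
pixels_per_second = width * height * fps
print(f"{pixels_per_second / 1e6:.1f} Mpixel/s")   # ~12.6 Mpixel/s

# Per-frame time budget the pipeline must meet for real-time operation.
print(f"{1000 / fps:.1f} ms per frame")            # ~24.4 ms
```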
End-to-end speech recognition for languages with ideographic characters
Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, T. Mishima, Shoei Sato, A. Kobayashi
{"title":"End-to-end speech recognition for languages with ideographic characters","authors":"Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, T. Mishima, Shoei Sato, A. Kobayashi","doi":"10.1109/APSIPA.2017.8282226","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282226","url":null,"abstract":"This paper describes a novel training method for acoustic models using connectionist temporal classification (CTC) for Japanese end-to-end automatic speech recognition (ASR). End-to-end ASR can estimate characters directly without using a pronunciation dictionary; however, this approach was conducted mostly in the English research area. When dealing with languages such as Japanese, we confront difficulties with robust acoustic modeling. One of the issues is caused by a large number of characters, including Japanese kanji, which leads to an increase in the number of model parameters. Additionally, multiple pronunciations of kanji increase the variance of acoustic features for corresponding characters. Therefore, we propose end-to-end ASR based on bi-directional long short-term memory (BLSTM) networks to solve these problems. Our proposal involves two approaches: reducing the number of dimensions of BLSTM and adding character strings to output layer labels. Dimensional compression decreases the number of parameters, while output label expansion reduces the variance of acoustic features. Consequently, we could obtain a robust model with a small number of parameters. Our experimental results with Japanese broadcast programs show the combined method of these two approaches improved the word error rate significantly compared with the conventional character-based end-to-end approach.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117254434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
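A minimal PyTorch sketch of a BLSTM acoustic model trained with CTC over an expanded character vocabulary. The layer widths and the toy unit inventory (single characters plus a few multi-character strings) are assumptions chosen only to show the mechanics of the two proposed ideas; they do not reproduce the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Character vocabulary expanded with frequent multi-character strings ("label expansion");
# the actual unit inventory and the reduced BLSTM width are assumptions.
vocab = ["<blank>", "ニ", "ュ", "ー", "ス", "東京", "ニュース"]

class BLSTMCTC(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_labels=len(vocab)):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_labels)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        h, _ = self.blstm(x)
        return self.proj(h).log_softmax(-1)    # per-frame label posteriors

model = BLSTMCTC()
feats = torch.randn(2, 200, 40)                # two dummy utterances
log_probs = model(feats).transpose(0, 1)       # CTCLoss expects (time, batch, labels)
targets = torch.tensor([5, 6, 1, 2, 3, 4])     # concatenated label indices for the batch
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.tensor([200, 200]),
                           target_lengths=torch.tensor([2, 4]))
loss.backward()
```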