2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

Image super-resolution based on error compensation with convolutional neural network
Wei-Ting Lu, Chien-Wei Lin, Chih-Hung Kuo, Ying-Chan Tung
DOI: 10.1109/APSIPA.2017.8282203
Abstract: Convolutional neural networks (CNNs) have been widely studied for super-resolution (SR) and other image restoration tasks. In this paper, we propose an additional error-compensation convolutional neural network (EC-CNN) that is trained on the concept of iterative back projection (IBP). The residuals between interpolated images and ground-truth images are used to train the network, so the CNN model can compensate the residual projection in IBP more accurately. This CNN-based IBP can be further combined with the super-resolution CNN (SRCNN). Experimental results show that our method, applied as a post-processing step, significantly enhances the quality of upscaled images: on average it outperforms SRCNN by 0.14 dB and SRCNN-EX by 0.08 dB in PSNR at scaling factor 3.
Cited by: 4
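The iterative back projection idea underlying the EC-CNN can be illustrated without any learned components: a high-resolution estimate is repeatedly refined by down-projecting it, comparing against the observed low-resolution image, and back-projecting the residual. A minimal numpy sketch, where box-filter downsampling and nearest-neighbour upsampling are illustrative choices rather than the paper's operators:

```python
import numpy as np

def downsample(img, s):
    # Box-filter downsampling by factor s (illustrative projection operator).
    h, w = img.shape
    return img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s):
    # Nearest-neighbour upsampling by factor s (illustrative back-projection).
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

def iterative_back_projection(lr, s, n_iter=20):
    """Refine an HR estimate until its down-projection matches the LR input."""
    hr = upsample(lr, s)                   # initial high-resolution guess
    for _ in range(n_iter):
        residual = lr - downsample(hr, s)  # error in the LR domain
        hr = hr + upsample(residual, s)    # back-project the residual
    return hr

rng = np.random.default_rng(0)
truth = rng.random((32, 32))
lr = downsample(truth, 2)
hr = iterative_back_projection(lr, 2)
# After convergence the reconstruction is consistent with the observation:
print(np.abs(downsample(hr, 2) - lr).max())
```

The EC-CNN can be thought of as replacing the fixed back-projection step with a trained residual predictor, which is where the PSNR gains over plain interpolation come from.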
Importance of non-uniform prosody modification for speech recognition in emotion conditions
Vishnu Vidyadhara Raju Vegesna, Hari Krishna Vydana, S. Gangashetty, A. Vuppala
DOI: 10.1109/APSIPA.2017.8282109
Abstract: A mismatch between training and operating environments degrades the performance of automatic speech recognition (ASR) systems. One major source of this mismatch is the presence of expressive (emotive) speech in operational environments. Emotions in speech mainly manifest as changes in the prosody parameters of pitch, duration, and energy. This work aims at improving ASR performance in the presence of emotive speech without modifying the existing ASR system. Prosody modification of pitch, duration, and energy is achieved by tuning the modification factors according to the relative differences between the neutral and emotional data sets. A neutral version of each emotive utterance is generated using uniform and non-uniform prosody modification methods before recognition. The IITKGP-SESC corpus is used for building the ASR system, which is evaluated on the emotions anger, happiness, and compassion. ASR performance improves when the prosody-modified emotive utterance is recognized in place of the original emotive utterance, with an average accuracy improvement of around 5% from the non-uniform prosody modification methods.
Cited by: 8
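The modification-factor idea can be sketched numerically. In this toy example (the function names and the mean-ratio rule are illustrative assumptions, not the paper's exact estimator), factors are derived from the relative difference between emotive and neutral statistics and applied uniformly over an utterance; the paper's non-uniform variant would vary the factors across speech segments:

```python
import numpy as np

def modification_factors(emotive_f0, neutral_f0, emotive_energy, neutral_energy):
    # Relative differences between emotive and neutral statistics give the
    # factors used to map emotive prosody back towards neutral (illustrative).
    pitch_factor = np.mean(neutral_f0) / np.mean(emotive_f0)
    energy_factor = np.mean(neutral_energy) / np.mean(emotive_energy)
    return pitch_factor, energy_factor

def apply_uniform(f0, energy, pitch_factor, energy_factor):
    # Uniform modification: one factor for the whole utterance.  A non-uniform
    # scheme would use different factors per segment (e.g. vowels vs. others).
    return f0 * pitch_factor, energy * energy_factor

# Toy statistics: anger raises mean F0 and energy relative to neutral speech.
angry_f0 = np.array([220.0, 240.0, 260.0])
neutral_f0 = np.array([180.0, 200.0, 190.0])
angry_en = np.array([1.5, 1.8, 1.6])
neutral_en = np.array([1.0, 1.1, 0.9])

pf, ef = modification_factors(angry_f0, neutral_f0, angry_en, neutral_en)
mod_f0, mod_en = apply_uniform(angry_f0, angry_en, pf, ef)
print(round(float(np.mean(mod_f0)), 1))  # mean F0 pulled to the neutral mean
```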
A deep learning architecture for classifying medical images of anatomy object
S. Khan, S. Yong
DOI: 10.1109/APSIPA.2017.8282299
Abstract: Deep learning architectures, particularly convolutional neural networks (CNNs), have shown an intrinsic ability to automatically extract high-level representations from big data. CNNs have produced impressive results in natural image classification, but a major hurdle to their deployment in the medical domain is the relative lack of training data compared to general imaging benchmarks such as ImageNet. In this paper we present a comparative evaluation of three milestone architectures, i.e., LeNet, AlexNet, and GoogLeNet, and propose our own CNN architecture for classifying medical anatomy images. Experiments show that the proposed architecture outperforms the three milestone architectures in classifying medical images of anatomy objects.
Cited by: 41
MSE-optimized CP-based CFO estimation in OFDM systems over multipath channels
Tzu-Chiao Lin, See-May Phoong
DOI: 10.1109/APSIPA.2017.8282146
Abstract: Carrier frequency offset (CFO) is an important issue in the study of orthogonal frequency division multiplexing (OFDM) systems. It is well known that CFO destroys the orthogonality of the subcarriers and significantly degrades the bit error rate (BER) performance of OFDM systems. In this paper, a cyclic-prefix (CP) based algorithm is proposed for blind CFO estimation in OFDM transmission over multipath channels. The proposed method minimizes the theoretical mean square error (MSE), and a closed-form formula is derived. Simulation results show that the proposed method performs very well.
Cited by: 2
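The classical blind CP-based estimator that this line of work builds on can be shown in a few lines: because the cyclic prefix duplicates the symbol tail, correlating the two copies accumulates a phase of exactly -2*pi*eps. The sketch below uses an ideal noiseless channel for clarity; the paper's contribution is an MSE-optimized weighting of this correlation for multipath channels, which is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
N, Ncp = 64, 16          # FFT size and cyclic-prefix length (example values)
eps = 0.12               # true CFO, normalised to the subcarrier spacing

# One OFDM symbol: random QPSK subcarriers -> time domain -> prepend CP.
X = (rng.choice([1, -1], N) + 1j * rng.choice([1, -1], N)) / np.sqrt(2)
x = np.fft.ifft(X)
tx = np.concatenate([x[-Ncp:], x])

# Apply the carrier frequency offset (ideal channel, no noise, for clarity).
n = np.arange(N + Ncp)
rx = tx * np.exp(2j * np.pi * eps * n / N)

# The CP (first Ncp samples) equals the symbol tail (samples N..N+Ncp-1),
# so their correlation carries a phase of -2*pi*eps.
corr = np.sum(rx[:Ncp] * np.conj(rx[N:N + Ncp]))
eps_hat = -np.angle(corr) / (2 * np.pi)
print(round(float(eps_hat), 6))
```

Note this estimator is only unambiguous for |eps| < 0.5, i.e. within half a subcarrier spacing.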
Electrolaryngeal speech modification towards singing aid system for laryngectomees
Kazuho Morikawa, T. Toda
DOI: 10.1109/APSIPA.2017.8282097
Abstract: Towards the development of a singing aid system for laryngectomees, we propose a method for converting electrolaryngeal (EL) speech produced with an electrolarynx into more natural-sounding singing voices. Singing with an electrolarynx is inflexible because the pitch of EL speech is determined by the source excitation signal mechanically produced by the device, so the melodies of songs to be sung must be embedded in the electrolarynx in advance. In addition, the sound quality of singing voices produced with the electrolarynx is severely degraded by its mechanical excitation sounds, which are emitted as external noise. To address these problems, the proposed conversion method uses (1) pitch control by playing a musical instrument and (2) noise suppression. In the pitch control, the pitch patterns of music sounds played while singing with the electrolarynx are modified to exhibit characteristics typically observed in singing voices, and the modified pitch patterns are used as targets in the conversion from EL speech to singing voices. In the noise suppression, spectral subtraction is used to suppress the leaked excitation sounds. Experimental results demonstrate that (1) the naturalness of singing voices is significantly improved by the noise suppression and (2) the pitch pattern modification is not necessarily effective in the conversion from EL speech into singing voices.
Cited by: 2
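Spectral subtraction, the noise-suppression technique named in the abstract, subtracts an estimated noise magnitude spectrum from each noisy frame while keeping the noisy phase. A minimal single-frame sketch, assuming a known noise reference and bin-aligned toy signals (real systems estimate the noise spectrum from non-speech frames and process overlapping windows):

```python
import numpy as np

def spectral_subtraction(noisy, noise_estimate, floor=0.01):
    """Subtract a noise magnitude spectrum from one noisy frame (sketch)."""
    Y = np.fft.rfft(noisy)
    Nmag = np.abs(np.fft.rfft(noise_estimate))
    # Subtract magnitudes, keep a small spectral floor to avoid negatives,
    # and resynthesise with the noisy phase.
    mag = np.maximum(np.abs(Y) - Nmag, floor * np.abs(Y))
    return np.fft.irfft(mag * np.exp(1j * np.angle(Y)), n=len(noisy))

# Toy frame: a "voice" sinusoid plus a deterministic low-frequency hum,
# both aligned to FFT bins so the example is exact.
t = np.arange(256)
voice = np.sin(2 * np.pi * 14 * t / 256)
hum = 0.5 * np.sin(2 * np.pi * 2 * t / 256)
cleaned = spectral_subtraction(voice + hum, hum)

err_before = np.mean(hum ** 2)                 # residual error with no suppression
err_after = np.mean((cleaned - voice) ** 2)    # residual error after subtraction
print(err_after < err_before)
```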
Sliced voxel representations with LSTM and CNN for 3D shape recognition
R. Miyagi, Masaki Aono
DOI: 10.1109/APSIPA.2017.8282044
Abstract: We propose a sliced voxel representation, which we call Sliced Square Voxels (SSV), based on LSTM (long short-term memory) and CNN (convolutional neural network) networks, for three-dimensional shape recognition. Given an arbitrary 3D model, we first convert it into a binary voxel grid of size 32x32x32. Then, after fixing a view position, we slice the binary voxel data vertically along the depth direction. A CNN exploits the 2D projected shape information of each slice, and its outputs are fed into an LSTM, which is our main idea, as the LSTM is expected to capture the spatial topology across slices. Our experiments show the proposed method to be superior to a 3D-CNN baseline we prepared, and comparisons on large-scale 3D model datasets (ModelNet10 and ModelNet40) show that it also outperforms related previous methods.
Cited by: 4
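The data preparation step described in the abstract, binarising to a 32x32x32 grid and slicing along depth, is straightforward to sketch. Each 2D slice would then pass through the CNN, and the resulting per-slice feature vectors would be consumed in order by the LSTM (the network itself is omitted here):

```python
import numpy as np

def slice_voxels(voxel):
    """Slice a binary voxel grid along the depth axis into a sequence of
    2D images; each slice is a CNN input, and the CNN outputs form the
    ordered sequence fed to the LSTM (networks not shown)."""
    # voxel has shape (depth, height, width); slice along axis 0.
    return [voxel[d] for d in range(voxel.shape[0])]

rng = np.random.default_rng(2)
# Stand-in for a binarised 3D model after view normalisation.
voxel = (rng.random((32, 32, 32)) > 0.5).astype(np.uint8)
slices = slice_voxels(voxel)
print(len(slices), slices[0].shape)  # 32 slices of 32x32 each
```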
Speech emotion recognition using convolutional long short-term memory neural network and support vector machines
Nattapong Kurpukdee, Tomoki Koriyama, Takao Kobayashi, S. Kasuriya, C. Wutiwiwatchai, P. Lamsrichan
DOI: 10.1109/APSIPA.2017.8282315
Abstract: In this paper, we propose a speech emotion recognition technique that uses a convolutional long short-term memory recurrent neural network (ConvLSTM-RNN) as a phoneme-based feature extractor from the raw input speech signal. The ConvLSTM-RNN outputs phoneme-based emotion probabilities for every frame of an input utterance. These probabilities are then converted into statistical features of the utterance and used as inputs to support vector machines (SVMs) or a linear discriminant analysis (LDA) system that classifies utterance-level emotions. To assess the effectiveness of the proposed technique, we conducted classification experiments on four emotions (anger, happiness, sadness, and neutral) on the IEMOCAP database. The results show that the proposed technique, with either the SVM or the LDA classifier, outperforms the conventional ConvLSTM-based approach.
Cited by: 21
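The bridging step, turning variable-length frame-level probabilities into a fixed-length vector for the SVM or LDA classifier, can be sketched as follows. The particular statistics chosen here (mean, standard deviation, max, min per emotion) are an illustrative assumption; the paper does not necessarily use this exact set:

```python
import numpy as np

def utterance_features(frame_probs):
    """Convert frame-level emotion probabilities (n_frames x n_emotions)
    into a fixed-length statistical feature vector for an SVM/LDA classifier.
    The choice of statistics is illustrative."""
    stats = [frame_probs.mean(axis=0),
             frame_probs.std(axis=0),
             frame_probs.max(axis=0),
             frame_probs.min(axis=0)]
    return np.concatenate(stats)

rng = np.random.default_rng(3)
# Stand-in for ConvLSTM-RNN outputs: 120 frames, 4 emotion probabilities
# per frame (each row sums to one).
probs = rng.dirichlet(np.ones(4), size=120)
feats = utterance_features(probs)
print(feats.shape)  # one fixed-length vector regardless of utterance length
```

Because the output dimension is independent of the number of frames, utterances of any duration map to the same feature space, which is what makes a standard SVM applicable.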
Nonuniform sampling theorems for random signals in the offset linear canonical transform domain
Y. Bao, Yan-Na Zhang, Yu-E. Song, Bingzhao Li, P. Dang
DOI: 10.1109/APSIPA.2017.8282008
Abstract: With the rapid development of the offset linear canonical transform (OLCT) in optics and signal processing, it is necessary to consider nonuniform sampling associated with the OLCT. The analysis and applications of nonuniform sampling for deterministic signals in the OLCT domain have been well studied and published, but no results on the reconstruction of random signals from nonuniform samples in the OLCT domain have been proposed until now. In this paper, the nonuniform sampling and reconstruction of random signals in the OLCT domain are investigated. First, a brief introduction to the OLCT and some special nonuniform sampling models is given. Then, reconstruction theorems for random signals from nonuniform samples in the OLCT domain are derived for the different sampling models. Finally, simulation results verify the accuracy of the theoretical results.
Cited by: 3
A new pool control method for Boolean compressed sensing based adaptive group testing
Yujia Lu, K. Hayashi
DOI: 10.1109/APSIPA.2017.8282168
Abstract: In adaptive group testing, the pool (the set of items to be tested) used in the next test is determined from past test results, and performance depends heavily on the pool control method. This paper proposes a new pool control method for Boolean compressed sensing based adaptive group testing. The proposed method first selects the pool size for the next test by minimizing the expected approximate number of tests still required after that test, based on the estimated number of remaining positive items. If the selected pool size is one, the item with the highest probability of being positive is chosen as the pool; otherwise, a pool of the selected size is constructed by randomly selecting items. In addition, a new cardinality estimation method for positive items, which can run in parallel with the proposed pool control method, is also proposed. Computer simulation results reveal that adaptive group testing with the proposed method outperforms conventional methods both with and without knowledge of the cardinality of positive items.
Cited by: 1
Improving N-gram language modeling for code-switching speech recognition
Zhiping Zeng, Haihua Xu, Tze Yuang Chong, Chng Eng Siong, Haizhou Li
DOI: 10.1109/APSIPA.2017.8282279
Abstract: Code-switching language modeling is challenging because the statistics of each individual language, as well as cross-lingual statistics, are insufficient. To compensate for this statistical insufficiency, in this paper we propose a word-class n-gram language modeling approach in which only infrequent words are clustered while the most frequent words are treated as singleton classes. We first demonstrate the effectiveness of the proposed method, in terms of perplexity, on our English-Mandarin code-switching SEAME data. Compared with conventional word n-gram language models, as well as word-class n-gram language models in which the entire vocabulary is clustered, the proposed approach yields lower perplexity on our SEAME dev sets. We observed further perplexity reductions by interpolating the word n-gram language models with the proposed word-class n-gram language models. We also built word-class n-gram language models from third-party text data with the proposed method, and obtained similar perplexity improvements on the SEAME dev sets when interpolating them with the word n-gram language models. Finally, to examine the contribution of the proposed language modeling approach to code-switching speech recognition, we conducted lattice-based n-best rescoring.
Cited by: 13
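The partial-clustering idea, sharing statistics only among infrequent words while frequent words remain their own classes, can be sketched with a toy corpus. The single `<RARE>` class and the frequency threshold below are illustrative simplifications; the paper clusters infrequent words into multiple classes:

```python
from collections import Counter

def class_map(tokens, min_count=2):
    """Map each word to itself if frequent, else to a shared class token.
    One rare class and a count threshold are illustrative simplifications."""
    counts = Counter(tokens)
    return {w: (w if c >= min_count else "<RARE>") for w, c in counts.items()}

# Toy code-switching-flavoured corpus (English with Malay food terms).
corpus = "i like nasi lemak i like mee goreng i eat laksa".split()
cmap = class_map(corpus)
mapped = [cmap[w] for w in corpus]

# Class-level bigram counts: rare words now pool their statistics, so
# n-grams involving them are no longer singletons.
bigrams = Counter(zip(mapped, mapped[1:]))
print(cmap["laksa"], bigrams[("i", "like")])
```

In a full model these class bigram counts would back the class n-gram probabilities, which are then interpolated with the word n-gram model as the abstract describes.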