Title: Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks
Authors: Ming-Hsiang Su, Chung-Hsien Wu, Kun-Yi Huang, Qian-Bei Hong, H. Wang
DOI: https://doi.org/10.1109/APSIPA.2017.8282287
Published in: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), December 2017
Abstract: This study presents an approach to personality trait (PT) perception from speech signals using wavelet-based multiresolution analysis and convolutional neural networks (CNNs). First, the wavelet transform is employed to decompose the speech signal into signals at different levels of resolution. Then, acoustic features are extracted from the signal at each resolution. Given the acoustic features, a CNN generates profiles of the Big Five Inventory-10 (BFI-10), which provide a quantitative measure of the degree of presence or absence of a set of 10 basic BFI items. The BFI-10 profiles are further fed into five artificial neural networks (ANNs), one for each of the five personality dimensions (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism), for PT perception. To evaluate the proposed method, experiments were conducted on the SSPNet Speaker Personality Corpus (SPC), comprising 640 clips randomly extracted from French news bulletins and used in the INTERSPEECH 2012 speaker trait sub-challenge. An average PT perception accuracy of 71.97% was obtained, outperforming both the ANN-based method and the baseline of the INTERSPEECH 2012 speaker trait sub-challenge.

Title: Speech emotion recognition using multichannel parallel convolutional recurrent neural networks based on gammatone auditory filterbank
Authors: Zhichao Peng, Zhi Zhu, M. Unoki, J. Dang, M. Akagi
DOI: https://doi.org/10.1109/APSIPA.2017.8282316
Published in: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), December 2017
Abstract: Speech emotion recognition (SER) using deep learning methods based on computational models of the human auditory system is a new way to identify emotional states. In this paper, we propose multichannel parallel convolutional recurrent neural networks (MPCRNNs) to extract salient features from the raw waveform based on a Gammatone auditory filterbank, and show that this method is effective for speech emotion recognition. We first divide the speech signal into segments and then obtain multichannel data using the Gammatone auditory filterbank, which serves as a first stage before applying the MPCRNN to extract the features most relevant to emotion recognition. We subsequently obtain an emotion-state probability distribution for each speech segment. Finally, utterance-level features are constructed from the segment-level probability distributions and fed into a support vector machine (SVM) to identify the emotions. Experimental results show that speech emotion features can be effectively learned with the proposed deep learning approach based on the Gammatone auditory filterbank.
{"title":"Automatic detection of circulating tumor cells based on microscopic images","authors":"Yunxia Liu, Yang Yang, Yuehui Chen","doi":"10.1109/APSIPA.2017.8282138","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282138","url":null,"abstract":"Detection of circulating tumor cells (CTCs) plays an important role in early diagnosis of cancer. Traditional detection relies on empirical knowledge of doctors, which is time consuming and suffers from problems such as subjectivity and low repeatability. To improve the objectiveness and efficiency of CTCs detection, an automatic detection method based on digital image processing techniques of scanned microscopic images are proposed in this paper. First, the overall architecture and the image capturing system are introduced. To fully exploit the optical structures of the blood, microscopic images are scanned at ten different focal lengths. Then, an adaptive threshold is proposed for binarization of the images, where morphologic processing operations are applied to detect suspicious CTCs regions. Finally, detection results from all ten layers are fused to generate the final detection output. Location, range and related graphical information are stored in a database to assist further examination, while interactive navigation display is also supported by the system. The effectiveness of the proposed system is verified by simulation experiments.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131401583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying road surface conditions using vibration signals","authors":"Lounell B. Gueta, Akiko Sato","doi":"10.1109/APSIPA.2017.8281999","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8281999","url":null,"abstract":"The paper aims to classify road surface types and conditions by characterizing the temporal and spectral features of vibration signals gathered from land roads. In the past, road surfaces have been studied for detecting road anomalies like bumps and potholes. This study extends the analysis to detect road anomalies such as patches and road gaps. In terms of temporal features such as magnitude peaks and variance, these anomalies have common features to road anomalies. Therefore, a classification method based on support vector classifier is proposed by taking into account both the temporal and spectral features of the road vibrations as well as factor such as vehicle speed. It is tested on a real data gathered by conducting a smart phone-based data collection between Thailand and Cambodia and is shown to be effective in differentiating road segments with and without anomalies. The method is applicable to undertaking appropriate road maintenance works.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131558193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QoE-estimation models for video streaming services","authors":"Kazuhisa Yamagishi","doi":"10.1109/APSIPA.2017.8282058","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282058","url":null,"abstract":"As encoders and decoders (codecs), networks, and displays have become more technologically advanced, network and video-streaming-service providers have been able to provide video-streaming services over a network (e.g., fiber-to-the home and long-term evolution); therefore, the use of these services has been increasing drastically in the past decade. To maintain the high quality of experience (QoE) of these services, network and service providers need to invest in equipment (e.g., network devices, codecs, and servers). To increase return on investment, the QoE of these services needs to be appropriately designed with as little investment as possible, and its normality needs to be monitored while services are provided. In general, the QoE of these services degrades due to compression and network conditions (e.g., packet loss and delay). Therefore, it is necessary to develop a QoE-estimation model by taking into account the impact of compression and network on quality. This paper introduces subjective-quality-assessment methods and QoE-estimation models that assess user QoE in video-streaming services and standardization activities.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131580428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data embedding in scalable coded video","authors":"LieLin Pang, Koksheik Wong, Sze‐Teng Liong","doi":"10.1109/APSIPA.2017.8282210","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282210","url":null,"abstract":"In this paper, a self-cancelling method is proposed to embed data into multiple layers of a spatial scalable coded video. Correlation of prediction mode from multiple layers are analyzed and exploited to offset the distortion introduced at the base layer(BL) when embedding data at the enhancement layer (EL). Specifically, in the base layer, the prediction modes are divided into two groups, where one group encodes '0' while another encodes '1'. Data embedding in the enhancement layer is designed to compensate the errors introduced in the base layer. Experiment results show that the scalable coded video can effectively carry additional payload in multiple layers while maintaining the video quality and bit rate. In the best case scenario, when 104,141 bits are embedded into the BasketballDrive (BL: 1280 × 720 and EL: 1920 × 1080) video sequence, the bit rate is slightly increased while insignificant degradation in perceptual quality is observed.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128099671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy distribution analysis and nonlinear dynamical analysis of phonation in patients with Parkinson's disease","authors":"H. Zhang, N. Yan, Lan Wang, M. Ng","doi":"10.1109/APSIPA.2017.8282102","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282102","url":null,"abstract":"Patients with Parkinson's disease (PD) have been reported to exhibit vocal impairment during the course of PD. Recently, development of automatic PD severity assessment based on acoustical characteristics from voice recordings has been attempted. However, objective extraction of appropriate features that can characterize PD symptoms faces many problems, due to the prevalence of aperiodicity in PD voices, rendering traditional perturbation analysis unreliable. The present study attempted to examine the validity of more advanced acoustic analysis techniques based on energy distribution measures and nonlinear dynamical measures. All of the features were extracted from sustained phonations of the vowel /a/ produced by 16 PD patients and 20 age-matched non- pathologic subjects. Results revealed that the energy distribution measures, such as glottal-to-noise excitation (GNE), and empirical mode decomposition excitation ratio (EMD-ER), as well as nonlinear dynamical measures including correlation dimension (D2), permutation entropy (PE), and detrended fluctuation analysis (DFA) were effective in discerning between PD and normal voices. This finding suggests that both energy distribution and nonlinear dynamical analyses could be appropriate measures in determining the status of PD voice.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128418340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: Learning a robust DOA estimation model with acoustic vector sensor cues
Authors: Yuexian Zou, Rongzhi Gu, Disong Wang, A. Jiang, C. Ritz
DOI: https://doi.org/10.1109/APSIPA.2017.8282304
Published in: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), December 2017
Abstract: Accurate and robust direction-of-arrival (DOA) estimation with small microphone arrays is in increasing demand in service robotics and smart home applications. Classic non-learning DOA estimation methods perform unsatisfactorily under low-SNR or highly reverberant conditions, while prior research shows that learning methods based on neural networks (NNs) require careful regulation of array element count or layout, which is impractical for many applications. To obtain robust DOA estimation with small arrays, we exploit the learning ability of deep neural networks (DNNs) and form training pairs from Acoustic Vector Sensor (AVS) DOA cues and their corresponding DOA labels, which can be simulated under different SNR and reverberation conditions. A DNN-based DOA model is then trained, and its performance is investigated over different activation functions, network structures, and dropout rates; cross-validation selects the best-performing configuration as the final DOA model. Experimental results validate the effectiveness of the DNN-based DOA model, which outperforms the non-learning method, especially under poor acoustic conditions.
{"title":"Binaural beamforming with spatial cues preservation for hearing aids in real-life complex acoustic environments","authors":"Hala As’ad, M. Bouchard, A. H. Kamkar-Parsi","doi":"10.1109/APSIPA.2017.8282250","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282250","url":null,"abstract":"This work is introducing novel binaural beamforming algorithms for hearing aids, with a good trade-off between noise reduction and the preservation of the binaural cues for different types of sources (directional interfering talker sources, diffuse-like background noise). In the proposed methods, no knowledge of the interfering talkers' direction or the second order statistics of the noise-only components is required. Different classification decisions are considered in the time- frequency domain based on the power, the power difference, and the complex coherence of different available signals. Simulations are performed using signals recorded from multichannel binaural hearing aids, to validate the performance of the proposed algorithms under different acoustic scenarios and using different microphone configurations. For the simulations performed in this paper, a good knowledge of the target direction and propagation model is assumed. For hearing aids, this assumption is typically more realistic than the assumption of knowing the direction and propagation model of the interferer talkers. The comparison of the performance results is done with other algorithms that don't require information on the directions or statistics of the interfering talker sources and the background noise. The results indicate that the proposed algorithms can either provide nearly the same noise reduction as classical beamformers but with improved noise binaural cues preservation, or they can produce a good trade-off between noise reduction and noise binaural cues preservation.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134195973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distinction between healthy individuals and patients with confident abnormal respiration","authors":"Masara Yamashita, Tasuku Miura, S. Matsunaga","doi":"10.1109/APSIPA.2017.8282199","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282199","url":null,"abstract":"To adequately distinguish between healthy individuals and patients with respiratory disorders, we propose a new classification method combining two conventional methods. The first method entails determining the presence of a \"confident abnormal respiration\" period (used to describe individuals for whom the likelihood of an abnormal respiratory candidate was much higher than for that of a normal candidate, and for which patients could be determined with high accuracy). The second method entails comparing the two total likelihoods (through a series of inspiration and expiration periods) of normal and abnormal candidates of each respiratory period in a test sample. In our new method, if one or more confident abnormal respiration phases are detected in a test respiration sample, the first method is used; otherwise, the second method is used for the classification. Our proposed method achieved significantly higher performance (88.6%) at the 5% level (p=0.027) than does each conventional classification method alone (80.6% and 84.9%). This validates our newly proposed classification method.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134032254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}