{"title":"Multi-Task Based Mispronunciation Detection of Children Speech Using Multi-Lingual Information","authors":"Linxuan Wei, Wenwei Dong, Binghuai Lin, Jinsong Zhang","doi":"10.1109/APSIPAASC47483.2019.9023351","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023351","url":null,"abstract":"In developing a Computer-Aided Pronunciation Training (CAPT) system for Chinese ESL (English as a Second Language) children, we suffered from insufficient task-specific data. To address this issue, we propose to utilize first language (L1) and second language (L2) knowledge from both adult and children data through multitask-based transfer learning according to Speech Learning Model (SLM). Experimental set-up includes the TDNN acoustic modelling using the following training data: 70 hours of English speech by American Children (AC), 100 hours by American Adults (AA), 5 hours of Chinese speech by Chinese Children (CC), and 89 hours by Chinese Adults (CA). Testing data includes 2 hours of ESL speech by Chinese children. Experimental results showed that the inclusion of AA data brought about 13% relative Detection Error Rate (DER) reduction compared to AC only. Further inclusion of CC and CA data through L1 transfer learning brought about a total of 21% relative improvement in DER. 
These results suggested the proposed method is effective in mitigating insufficient data problem.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127070179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pain versus Affect? An Investigation in the Relationship between Observed Emotional States and Self-Reported Pain","authors":"F. Tsai, Yi-Ming Weng, C. Ng, Chi-Chun Lee","doi":"10.1109/APSIPAASC47483.2019.9023134","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023134","url":null,"abstract":"Painis an internal sensation intricately intertwined with individual affect states resulting in a varied expressive behaviors multimodally. Past research have indicated that emotion is an important factor in shaping one's painful experiences and behavioral expressions. In this work, we present a study into understanding the relationship between individual emotional states and self-reported pain-levels. The analyses show that there is a significant correlation between observed valence state of an individual and his/her own self-reported pain-levels. Furthermore, we propose an emotion-enriched multitask network (EEMN) to improve self-reported pain-level recognition by leveraging the rated emotional states using multimodal expressions computed from face and speech. Our framework achieves accuracy of 70.1% and 52.1% in binary and ternary classes classification. The method improves a relative of 6.6% and 13% over previous work on the same dataset. 
Further, our analyses not only show that an individual's valence state is negatively correlated to the pain-level reported, but also reveal that asking observers to rate valence attribute could be related more to the self-reported pain than to rate directly on the pain intensity itself.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127124488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Derivative of instantaneous frequency for voice activity detection using phase-based approach","authors":"Nguyen Binh Thien, Yukoh Wakabayashi, Takahiro Fukumori, T. Nishiura","doi":"10.1109/APSIPAASC47483.2019.9023241","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023241","url":null,"abstract":"In this paper, we consider the use of the phase spectrum in speech signal analysis. In particular, a phase-based voice activity detection (VAD) by using the derivative of instantaneous frequency is proposed. Preliminary experiments reveal that the distribution of this feature can indicate the presence or absence of speech. The performance of the proposed method is evaluated in comparison with the conventional amplitude-based method. In addition, we consider a combination of the amplitude-based and phase-based methods in a simple manner to demonstrate the complementarity of both spectra. The experimental results confirm that the phase information can be used to detect voice activity with at least 62% accuracy. The proposed method shows better performance compared to the conventional amplitude-based method in the case when a speech signal was corrupted by white noise at low signal-to-noise ratio (SNR). 
A combination of two methods achieves even higher performance than each of them separately, in limited conditions.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125690052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey on Applications of Deep Reinforcement Learning in Resource Management for 5G Heterogeneous Networks","authors":"Ying Loong Lee, Donghong Qin","doi":"10.1109/APSIPAASC47483.2019.9023331","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023331","url":null,"abstract":"Heterogeneous networks (HetNets) have been regarded as the key technology for fifth generation (5G) communications to support the explosive growth of mobile traffics. By deploying small-cells within the macrocells, the HetNets can boost the network capacity and support more users especially in the hotspot and indoor areas. Nonetheless, resource management for such networks becomes more complex compared to conventional cellular networks due to the interference arise between small-cells and macrocells, which thus making quality of service provisioning more challenging. Recent advances in deep reinforcement learning (DRL) have inspired its applications in resource management for 5G HetNets. In this paper, a survey on the applications of DRL in resource management for 5G HetNets is conducted. In particular, we review the DRL-based resource management schemes for 5G HetNets in various domains including energy harvesting, network slicing, cognitive HetNets, coordinated multipoint transmission, and big data. An insightful comparative summary and analysis on the surveyed studies is provided to shed some light on the shortcomings and research gaps in the current advances in DRL-based resource management for 5G HetNets. 
Last but not least, several open issues and future directions are presented.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127710348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dementia Detection by Analyzing Spontaneous Mandarin Speech","authors":"Zhaoci Liu, Zhiqiang Guo, Zhenhua Ling, Shijin Wang, Lingjing Jin, Yunxia Li","doi":"10.1109/APSIPAASC47483.2019.9023041","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023041","url":null,"abstract":"Ahstract-The Chinese population has been aging rapidly resulting in the largest population of people with dementia. Unfortunately, current screening and diagnosis of dementia rely on the evidences from cognitive tests, which are usually expensive and time consuming. Therefore, this paper studies the methods of detecting dementia by analyzing the spontaneous speech produced by Mandarin speakers in a picture description task. First, a Mandarin speech dataset contains speech from both healthy controls and patients with mild cognitive impairment (MCI) or dementia is built. Then, three categories of features, including duration features, acoustic features and linguistic features, are extracted from speech recordings and are compared by building logistic regression classifiers for dementia detection. The best performance of identifying dementia from healthy controls is obtained by fusing all features and the accuracy is 81.9% in a 10-fold cross-validation. 
The importance of different features is further analyzed by experiments, which indicate that the difference of perplexities derived from language models is the most effective one.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127767166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Prefatory Study on Data Channelling Mechanism towards Industry 4.0","authors":"Kheng Hui Ng, Y. Tew, M. Yip","doi":"10.1109/APSIPAASC47483.2019.9023089","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023089","url":null,"abstract":"Data are increasing in volume, variety and velocity in this Internet of things and big data era. It applies from industry (or manufacturing) process monitoring control to video surveillance analysis to track human and machines activities. Therefore, fast and accurate approaches in data channelling are needed to effectively deal with these big data. This paper presents practical methods to manage and transfer the data from industry manufacturing site to a centralized data processing hub. In this hub, data are transformed into understandable information, which can assist human in understanding and monitoring manufacturing situation autonomously. These data are collected and channelled to desired location for analysis through Open Platform Communication Unified Architecture (OPC UA). Industrial protocols and standards are used to interpret the data channelling methods and tested on several industrial machines. Result shows that size of data and number of OPC UA Client that connects to OPC Server affects the data channelling speed,","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129150492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A DOA Estimation Method in the presence of unknown mutual coupling based on Nested Arrays","authors":"Julan Xie, Fanghao Cheng, Zishu He, Huiyong Li","doi":"10.1109/APSIPAASC47483.2019.9023028","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023028","url":null,"abstract":"A novel DOA method is proposed to deal with the DOA estimation in the presence of the unknown mutual coupling for nested arrays. By using a new expression of the steering matrix in the presence of mutual coupling, a novel expression of the receiving data vector in the virtual array field is available. Then, based on a modified direction matrix constructed with block matrix, which relates to space discretized sampling grid, the sparse Bayesian compressive sensing method applies to estimate a vector, which contains the signal powers information and the mutual coupling information. The problem of off-grid DOAs is also considered for sparse Bayesian compressive sensing. Based on the estimated vector, a peak searching is performed to estimate the initial DOA. Finally, the estimation of DOA is modified to initial estimate plus off-grid error value. The advantage of fully utilizing the degree of freedom of nested arrays is preserved in this proposed algorithm. Moreover, no complicated calculation is needed to obtain the mutual coupling coefficients or rearrange the position of array element. 
Theoretical analysis and simulation results show the effectiveness of the proposed algorithm.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132901965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital implementation of Hilbert Transform in the LCT domain associated with FIR filter","authors":"B. Deng, Qingshun Huang, Lin Zhang","doi":"10.1109/APSIPAASC47483.2019.9023095","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023095","url":null,"abstract":"In this paper, digital implementation of Hilbert Transform in the LCT domain is proposed based on FIR filter. First, some definitions are described such as linear canonical transform, Hilbert transform, and analytical signal. Then, the implementation principle is analyzed about Hilbert transform in the LCT domain. Finally, the digital implementation method is proposed based on FIR filter.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128205323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noise Prior Knowledge Learning for Speech Enhancement via Gated Convolutional Generative Adversarial Network","authors":"Cunhang Fan, B. Liu, J. Tao, Jiangyan Yi, Zhengqi Wen, Ye Bai","doi":"10.1109/APSIPAASC47483.2019.9023216","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023216","url":null,"abstract":"Speech enhancement generative adversarial network (SEGAN) is an end-to-end deep learning architecture, which only uses the clean speech as the training targets. However, when the signal-to-noise ratio (SNR) is very low, predicting clean speech signals could be very difficult as the speech is dominated by the noise. In order to address this problem, in this paper, we propose a gated convolutional neural network (CNN) SEGAN (GSEGAN) with noise prior knowledge learning to address this problem. The proposed model not only estimates the clean speech, but also learns the noise prior knowledge to assist the speech enhancement. In addition, gated CNN has an excellent potential for capturing long-term temporal dependencies than regular CNN. Motivated by this, we use a gated CNN architecture to acquire more detailed information at waveform level instead of regular CNN. We evaluate the proposed method GSEGAN on Voice Bank corpus. 
Experimental results show that the proposed method GSEGAN outperforms the SEGAN baseline, with a relative improvement of 0.7%, 28.2% and 43.9% for perceptual evaluation of speech quality (PESQ), overall Signal-to-Noise Ratio (SNRovl) and Segmental Signal-to-Noise Ratio (SNRseg), respectively.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128364744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Activation Driven Synchronized Joint Diagonalization for Underdetermined Sound Source Separation","authors":"T. Izumi, Shingo Uenohara, K. Furuya, Yuuki Tachioka","doi":"10.1109/APSIPAASC47483.2019.9023297","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023297","url":null,"abstract":"Blind sound source separation (BSS) is effective to improve the performance of various applications such as speech recognition. The condition of BSS can be divided into underdetermined conditions (number of microphones < number of sound sources) and overdetermined conditions (number of microphones ≥ number of sound sources). Here, we focus on Synchronized Joint Diagonalization (SJD) [6], which is a newly proposed BSS method and utilizes non-stationarity of a sound source signal. The advantage of SJD is faster separation and smaller number of parameters to be estimated. However, the application of SJD is limited to overdetermined conditions, and the performance of SJD is degraded in underdetermined conditions. In this paper, to solve these performance degradations, we propose an activation driven SJD, which uses a pre-estimated activation matrix. It is practical because activation estimation is easier than source separation. The effectiveness of the proposed method was validated by conducting BSS experiments. 
We confirmed that the performance of SJD can be improved in underdetermined conditions.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134476633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}