{"title":"Geometric Discriminant Analysis for I-vector Based Speaker Verification","authors":"Can Xu, Xianhong Chen, Liang He, Jia Liu","doi":"10.1109/APSIPAASC47483.2019.9023338","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023338","url":null,"abstract":"Many i-vector based speaker verification use linear discriminant analysis (LDA) as a post-processing stage. LDA maximizes the arithmetic mean of the Kullback-Leibler (KL) divergences between different pairs of speakers. However, for speaker verification, speakers with small divergence are easily misjudged. LDA is not optimal because it does not emphasize on enlarging small divergences. In addition, LDA makes an assumption that the i-vectors of different speakers are well modeled by Gaussian distributions with identical class covariance. Actually, the distributions of different speakers can have different covariances. Motivated by these observations, we explore speaker verification with geometric discriminant analysis (GDA), which uses geometric mean instead of arithmetic mean when maximizing the KL divergences. It puts more emphasis on enlarging small divergences. Furthermore, we study the heteroscedastic extension of GDA (HGDA), taking different covariances into consideration. Experiments on i-vector machine learning challenge indicate that, when the number of training speakers becomes smaller, the relative performance improvement of GDA and HGDA compared with LDA becomes larger. GDA and HGDA are better choices especially when training data is limited.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134589632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-quality waveform generator from fundamental frequency, spectral envelope, and band aperiodicity","authors":"M. Morise, Takuro Shono","doi":"10.1109/APSIPAASC47483.2019.9023206","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023206","url":null,"abstract":"This paper introduces a waveform generation algorithm from three speech parameters (fundamental frequency fo, spectral envelope, and band aperiodicity). The conventional speech analysis/synthesis system based on a vocoder mainly has a waveform generator based on pitch synchronous overlap and add (PSOLA). Since it uses the fast Fourier transform (FFT) to generate the vocal cord vibration, the processing speed is proportional to the fo. The algorithm also uses the spectral representation of the aperiodicity, whereas the band aperiodicity is mainly used in speech synthesis applications such as statistical parametric speech synthesis. We propose a waveform generation algorithm that reduces the computational cost and memory usage without degrading the synthesized speech. The algorithm utilizes excitation signal generation by directly using the band aperiodicity. The computational cost in a certain period is fixed because the excitation signal is filtered and processed by the overlap-add (OLA) algorithm. We used the re-synthesized speech to perform two evaluations for the processing speed and sound quality. The results showed that the sound quality of speech synthesized was almost the same by our proposed algorithm as by the conventional algorithm. The proposed algorithm can also reduce computational cost and memory usage.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133431818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Retinex low-illumination image enhancement algorithm","authors":"Shao-Chuan Wang, D. Gao, Yangping Wang, Song Wang","doi":"10.1109/APSIPAASC47483.2019.9023017","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023017","url":null,"abstract":"Low-illumination images are generally low-quality images. The retinex algorithm can cause halo artifacts and loss of details in processing. Therefore, an improved Retinex algorithm is proposed. Firstly, the HSI color space which is more in line with the human visual characteristics is selected instead of the RGB image, that is, the luminance component I is processed. Then, the illuminance image is estimated by using a guided filter that fuses the edge detection operator, and the edge detection operator can be better positioned. At the edge, an illuminance image with rich edge information can be obtained; after obtaining the illuminance image, the reflected image can be obtained by the Retinex principle, the obtained reflected image is subjected to low-rank decomposition, and the low-rank property of the image is used to suppress the enlarged halo and the enhancement process. Noise; finally, the visual effect is further improved by local contrast enhancement. Experiments show that the algorithm can effectively improve the brightness and contrast of the image, preserve the details of the image, and also suppress the noise interference in the enhancement process. The subjective visual effect and objective evaluation results of the image have also been greatly improved.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133853842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Multi-source Information Diffusion: A Graphical Evolutionary Game Approach","authors":"Hong Hu, Yuejiang Li, H. Zhao, Yan Chen","doi":"10.1109/APSIPAASC47483.2019.9023248","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023248","url":null,"abstract":"Modeling of information diffusion over social networks is of crucial importance to better understand how the avalanche of information overflow affects our social life and economy, thus preventing the detrimental consequences caused by rumors and motivating some beneficial information spreading. However, most model-based works on information diffusion either consider the spreading of one single message or assume different diffusion processes are independent of each other. In real-world scenarios, multi-source correlated information often spreads together, which jointly influences users' decisions. In this paper, we model the multi-source information diffusion process from a graphical evolutionary game perspective. Specifically, we model users' local interactions and strategic decision making, and analyze the evolutionary dynamics of the diffusion processes of correlated information, aiming to investigate the underlying principles dominating the complex multi-source information diffusion. Simulation results on synthetic and Facebook networks are consistent with our theoretical analysis. We also test our proposed model on Weibo user forwarding data and observe a good prediction performance on real-world information spreading process, which demonstrates the effectiveness of the proposed approach.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133931863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Adaptation Neural Network for Acoustic Scene Classification in Mismatched Conditions","authors":"Rui Wang, Mou Wang, Xiao-Lei Zhang, S. Rahardja","doi":"10.1109/APSIPAASC47483.2019.9023057","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023057","url":null,"abstract":"Acoustic scene classification is a task of predicting the acoustic environment of an audio recording. Because the training and test conditions in most real world acoustic scene classification problems do not match, it is strongly necessary to develop domain adaptation methods to solve the cross-domain problem. In this paper, we propose a domain adaptation neural network (DANN) based acoustic scene classification (ASC) method. Specifically, we first extract an acoustic feature, i.e. log-Mel spectrogram, which has been proven to be effective in previous studies. Then, we train a DANN to project the training and test domains into one common space where the acoustic scenes are categorized jointly. To boost the overall performance of the proposed method, we further train an ensemble of convolutional neural network (CNN) models with different parameter settings respectively. Finally, we fuse the DANN and CNN models by averaging the outputs of the models. We have evaluated the proposed method on the subtask B of task 1 of the DCASE 2019 ASC challenge, which is a closed-set classification problem whose audio recordings were recorded by mismatched devices. Experimental results demonstrate the effectiveness of the proposed method on the acoustic scene classification problem in mismatched conditions.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123001956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking","authors":"Feng Li, Kaizhi Qian, M. Hasegawa-Johnson, M. Akagi","doi":"10.1109/APSIPAASC47483.2019.9023055","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023055","url":null,"abstract":"Monaural singing voice separation has received much attention in recent years. In this paper, we propose a novel neural network architecture for monaural singing voice separation, Fusion-Net, which is combining U-Net with the residual convolutional neural network to develop a much deeper neural network architecture with summation-based skip connections. In addition, we apply time-frequency masking to improve the separation results. Finally, we integrate the phase spectra with magnitude spectra as the post-processing to optimize the separated singing voice from the mixture music. Experimental results demonstrate that the proposed method can achieve better separation performance than the previous U-Net architecture on the ccMixter database.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127860015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Sparse Channel Estimation in Downlink NOMA System","authors":"Haohui Jia, Na Chen, T. Higashino, M. Okada","doi":"10.1109/APSIPAASC47483.2019.9023326","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023326","url":null,"abstract":"Non-orthogonal multiple access (NOMA) is regarded as one of the most important techniques for future 5G systems. In the downlink general NOMA schemes, the received NOMA signal will be analyzed via two parallel channel state information (CSI) after sparse multiple path channel fading. In this paper, by exploiting the inherent sparsity of the channel, we proposed a low-complexity joint channel estimation in a single-input and multiple-output antennas system, based on the compressed sensing to detect each layer channel state information. As a comparison, the performance of compressed sensing is better than the conventional method Least-Square (LS) and Minimum Mean Square Error (MMSE).","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128622510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An RGB Gait Anonymization Model for Low-Quality Silhouettes","authors":"Ngoc-Dung T. Tieu, H. Nguyen, Fuming Fang, J. Yamagishi, I. Echizen","doi":"10.1109/APSIPAASC47483.2019.9023188","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023188","url":null,"abstract":"Gait anonymization while maintaining naturalness is used for protecting a person's identity against gait recognition systems when a video of the person walking is uploaded to social media. There has been some research on gait anonymization, but only for high-quality silhouette gaits. We present an RGB gait anonymization model for low-quality silhouette gaits that can generate natural, seamless anonymized gaits for which the original silhouettes cannot be extracted correctly. Our model includes two main networks. The first one, a deep convolutional generative adversarial network, is used to anonymize the original gait by adding to it a random noise vector. By training on high-quality silhouette data, this network can generate a high-quality anonymized silhouette sequence from a low-quality silhouette one. Restricting its input to binary silhouette sequences instead of color gaits forces it to focus on anonymizing the gait rather than changing body color. The second main network, which follows the first one, colorizes the anonymized silhouette sequence generated by the first network by using the color of the original gait. Evaluation in terms of success rate and naturalness demonstrated that our model can anonymize gaits while maintaining naturalness.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128854346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convolutional Attention Model for Retinal Edema Segmentation","authors":"Phuong Le Thi, Tuan D. Pham, Jia-Ching Wang","doi":"10.1109/APSIPAASC47483.2019.9023282","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023282","url":null,"abstract":"Deep learning and computer vision that become popular in recent years are advantage techniques in medical diagnosis. A large database of Optical Coherence Tomography (OCT) images can be used to train a deep learning model which can support and suggest effectively illnesses and status of a patient. Therefore, semantic image segmentation is used to detect and categorize anomaly regions in OCT images. However, numerous existing approaches ignored spatial structure as well as contextual information in a given image. To overcome existing problems, this work proposes a novel method which takes advantage of the deep convolutional neural network, attention block, pyramid pooling module and auxiliary connections between layers. Attention block helps to detect the spatial structure of a given image. Beside, pyramid pooling module has a responsibility to identify the shape and margin of the anomaly region. In additional, auxiliary connections support to enrich useful information pass through one layer as well as reduce overfitting problem. Our work produces higher accuracy than state-of-the-art methods with 78.19% comparing to Deeplab_ v3 76.19% and Bisenet 76.85% in term of dice coefficient. Additionally, a number of parameters in our work is smaller than the previous approaches.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"390 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115991217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSDC-Net: Multi-Scale Dense and Contextual Networks for Stereo Matching","authors":"Zhibo Rao, Mingyi He, Yuchao Dai, Zhidong Zhu, Bo Li, Renjie He","doi":"10.1109/APSIPAASC47483.2019.9023237","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023237","url":null,"abstract":"Disparity prediction from stereo images is essential to computer vision applications such as autonomous driving, 3D model reconstruction, and object detection. To more accurately predict disparity map, a novel deep learning architecture (called MSDC-Net) for detecting the disparity map from a rectified pair of stereo images is proposed. Our MSDC-Net contains two modules: the multi-scale fusion 2D convolution module and the multi-scale residual 3D convolution module. The multi-scale fusion 2D convolution module exploits the potential multi-scale features, which extracts and fuses the different scale features by Dense-Net. The multi-scale residual 3D convolution module learns the different scale geometry context from the cost volume which aggregated by the multi-scale fusion 2D convolution module. Experimental results on Scene Flow and KITTI datasets demonstrate that our MSDC-Net significantly outperforms other approaches in the non-occluded region.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115994841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}