{"title":"Geometric Discriminant Analysis for I-vector Based Speaker Verification","authors":"Can Xu, Xianhong Chen, Liang He, Jia Liu","doi":"10.1109/APSIPAASC47483.2019.9023338","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023338","url":null,"abstract":"Many i-vector based speaker verification use linear discriminant analysis (LDA) as a post-processing stage. LDA maximizes the arithmetic mean of the Kullback-Leibler (KL) divergences between different pairs of speakers. However, for speaker verification, speakers with small divergence are easily misjudged. LDA is not optimal because it does not emphasize on enlarging small divergences. In addition, LDA makes an assumption that the i-vectors of different speakers are well modeled by Gaussian distributions with identical class covariance. Actually, the distributions of different speakers can have different covariances. Motivated by these observations, we explore speaker verification with geometric discriminant analysis (GDA), which uses geometric mean instead of arithmetic mean when maximizing the KL divergences. It puts more emphasis on enlarging small divergences. Furthermore, we study the heteroscedastic extension of GDA (HGDA), taking different covariances into consideration. Experiments on i-vector machine learning challenge indicate that, when the number of training speakers becomes smaller, the relative performance improvement of GDA and HGDA compared with LDA becomes larger. GDA and HGDA are better choices especially when training data is limited.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134589632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-quality waveform generator from fundamental frequency, spectral envelope, and band aperiodicity","authors":"M. Morise, Takuro Shono","doi":"10.1109/APSIPAASC47483.2019.9023206","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023206","url":null,"abstract":"This paper introduces a waveform generation algorithm from three speech parameters (fundamental frequency fo, spectral envelope, and band aperiodicity). The conventional speech analysis/synthesis system based on a vocoder mainly has a waveform generator based on pitch synchronous overlap and add (PSOLA). Since it uses the fast Fourier transform (FFT) to generate the vocal cord vibration, the processing speed is proportional to the fo. The algorithm also uses the spectral representation of the aperiodicity, whereas the band aperiodicity is mainly used in speech synthesis applications such as statistical parametric speech synthesis. We propose a waveform generation algorithm that reduces the computational cost and memory usage without degrading the synthesized speech. The algorithm utilizes excitation signal generation by directly using the band aperiodicity. The computational cost in a certain period is fixed because the excitation signal is filtered and processed by the overlap-add (OLA) algorithm. We used the re-synthesized speech to perform two evaluations for the processing speed and sound quality. The results showed that the sound quality of speech synthesized was almost the same by our proposed algorithm as by the conventional algorithm. The proposed algorithm can also reduce computational cost and memory usage.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133431818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Retinex low-illumination image enhancement algorithm","authors":"Shao-Chuan Wang, D. Gao, Yangping Wang, Song Wang","doi":"10.1109/APSIPAASC47483.2019.9023017","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023017","url":null,"abstract":"Low-illumination images are generally low-quality images. The retinex algorithm can cause halo artifacts and loss of details in processing. Therefore, an improved Retinex algorithm is proposed. Firstly, the HSI color space which is more in line with the human visual characteristics is selected instead of the RGB image, that is, the luminance component I is processed. Then, the illuminance image is estimated by using a guided filter that fuses the edge detection operator, and the edge detection operator can be better positioned. At the edge, an illuminance image with rich edge information can be obtained; after obtaining the illuminance image, the reflected image can be obtained by the Retinex principle, the obtained reflected image is subjected to low-rank decomposition, and the low-rank property of the image is used to suppress the enlarged halo and the enhancement process. Noise; finally, the visual effect is further improved by local contrast enhancement. Experiments show that the algorithm can effectively improve the brightness and contrast of the image, preserve the details of the image, and also suppress the noise interference in the enhancement process. The subjective visual effect and objective evaluation results of the image have also been greatly improved.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133853842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Multi-source Information Diffusion: A Graphical Evolutionary Game Approach","authors":"Hong Hu, Yuejiang Li, H. Zhao, Yan Chen","doi":"10.1109/APSIPAASC47483.2019.9023248","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023248","url":null,"abstract":"Modeling of information diffusion over social networks is of crucial importance to better understand how the avalanche of information overflow affects our social life and economy, thus preventing the detrimental consequences caused by rumors and motivating some beneficial information spreading. However, most model-based works on information diffusion either consider the spreading of one single message or assume different diffusion processes are independent of each other. In real-world scenarios, multi-source correlated information often spreads together, which jointly influences users' decisions. In this paper, we model the multi-source information diffusion process from a graphical evolutionary game perspective. Specifically, we model users' local interactions and strategic decision making, and analyze the evolutionary dynamics of the diffusion processes of correlated information, aiming to investigate the underlying principles dominating the complex multi-source information diffusion. Simulation results on synthetic and Facebook networks are consistent with our theoretical analysis. We also test our proposed model on Weibo user forwarding data and observe a good prediction performance on real-world information spreading process, which demonstrates the effectiveness of the proposed approach.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133931863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Adaptation Neural Network for Acoustic Scene Classification in Mismatched Conditions","authors":"Rui Wang, Mou Wang, Xiao-Lei Zhang, S. Rahardja","doi":"10.1109/APSIPAASC47483.2019.9023057","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023057","url":null,"abstract":"Acoustic scene classification is a task of predicting the acoustic environment of an audio recording. Because the training and test conditions in most real world acoustic scene classification problems do not match, it is strongly necessary to develop domain adaptation methods to solve the cross-domain problem. In this paper, we propose a domain adaptation neural network (DANN) based acoustic scene classification (ASC) method. Specifically, we first extract an acoustic feature, i.e. log-Mel spectrogram, which has been proven to be effective in previous studies. Then, we train a DANN to project the training and test domains into one common space where the acoustic scenes are categorized jointly. To boost the overall performance of the proposed method, we further train an ensemble of convolutional neural network (CNN) models with different parameter settings respectively. Finally, we fuse the DANN and CNN models by averaging the outputs of the models. We have evaluated the proposed method on the subtask B of task 1 of the DCASE 2019 ASC challenge, which is a closed-set classification problem whose audio recordings were recorded by mismatched devices. Experimental results demonstrate the effectiveness of the proposed method on the acoustic scene classification problem in mismatched conditions.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123001956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking","authors":"Feng Li, Kaizhi Qian, M. Hasegawa-Johnson, M. Akagi","doi":"10.1109/APSIPAASC47483.2019.9023055","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023055","url":null,"abstract":"Monaural singing voice separation has received much attention in recent years. In this paper, we propose a novel neural network architecture for monaural singing voice separation, Fusion-Net, which is combining U-Net with the residual convolutional neural network to develop a much deeper neural network architecture with summation-based skip connections. In addition, we apply time-frequency masking to improve the separation results. Finally, we integrate the phase spectra with magnitude spectra as the post-processing to optimize the separated singing voice from the mixture music. Experimental results demonstrate that the proposed method can achieve better separation performance than the previous U-Net architecture on the ccMixter database.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127860015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Sparse Channel Estimation in Downlink NOMA System","authors":"Haohui Jia, Na Chen, T. Higashino, M. Okada","doi":"10.1109/APSIPAASC47483.2019.9023326","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023326","url":null,"abstract":"Non-orthogonal multiple access (NOMA) is regarded as one of the most important techniques for future 5G systems. In the downlink general NOMA schemes, the received NOMA signal will be analyzed via two parallel channel state information (CSI) after sparse multiple path channel fading. In this paper, by exploiting the inherent sparsity of the channel, we proposed a low-complexity joint channel estimation in a single-input and multiple-output antennas system, based on the compressed sensing to detect each layer channel state information. As a comparison, the performance of compressed sensing is better than the conventional method Least-Square (LS) and Minimum Mean Square Error (MMSE).","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128622510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An RGB Gait Anonymization Model for Low-Quality Silhouettes","authors":"Ngoc-Dung T. Tieu, H. Nguyen, Fuming Fang, J. Yamagishi, I. Echizen","doi":"10.1109/APSIPAASC47483.2019.9023188","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023188","url":null,"abstract":"Gait anonymization while maintaining naturalness is used for protecting a person's identity against gait recognition systems when a video of the person walking is uploaded to social media. There has been some research on gait anonymization, but only for high-quality silhouette gaits. We present an RGB gait anonymization model for low-quality silhouette gaits that can generate natural, seamless anonymized gaits for which the original silhouettes cannot be extracted correctly. Our model includes two main networks. The first one, a deep convolutional generative adversarial network, is used to anonymize the original gait by adding to it a random noise vector. By training on high-quality silhouette data, this network can generate a high-quality anonymized silhouette sequence from a low-quality silhouette one. Restricting its input to binary silhouette sequences instead of color gaits forces it to focus on anonymizing the gait rather than changing body color. The second main network, which follows the first one, colorizes the anonymized silhouette sequence generated by the first network by using the color of the original gait. Evaluation in terms of success rate and naturalness demonstrated that our model can anonymize gaits while maintaining naturalness.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128854346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convolutional Attention Model for Retinal Edema Segmentation","authors":"Phuong Le Thi, Tuan D. Pham, Jia-Ching Wang","doi":"10.1109/APSIPAASC47483.2019.9023282","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023282","url":null,"abstract":"Deep learning and computer vision that become popular in recent years are advantage techniques in medical diagnosis. A large database of Optical Coherence Tomography (OCT) images can be used to train a deep learning model which can support and suggest effectively illnesses and status of a patient. Therefore, semantic image segmentation is used to detect and categorize anomaly regions in OCT images. However, numerous existing approaches ignored spatial structure as well as contextual information in a given image. To overcome existing problems, this work proposes a novel method which takes advantage of the deep convolutional neural network, attention block, pyramid pooling module and auxiliary connections between layers. Attention block helps to detect the spatial structure of a given image. Beside, pyramid pooling module has a responsibility to identify the shape and margin of the anomaly region. In additional, auxiliary connections support to enrich useful information pass through one layer as well as reduce overfitting problem. Our work produces higher accuracy than state-of-the-art methods with 78.19% comparing to Deeplab_ v3 76.19% and Bisenet 76.85% in term of dice coefficient. Additionally, a number of parameters in our work is smaller than the previous approaches.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"390 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115991217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSDC-Net: Multi-Scale Dense and Contextual Networks for Stereo Matching","authors":"Zhibo Rao, Mingyi He, Yuchao Dai, Zhidong Zhu, Bo Li, Renjie He","doi":"10.1109/APSIPAASC47483.2019.9023237","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023237","url":null,"abstract":"Disparity prediction from stereo images is essential to computer vision applications such as autonomous driving, 3D model reconstruction, and object detection. To more accurately predict disparity map, a novel deep learning architecture (called MSDC-Net) for detecting the disparity map from a rectified pair of stereo images is proposed. Our MSDC-Net contains two modules: the multi-scale fusion 2D convolution module and the multi-scale residual 3D convolution module. The multi-scale fusion 2D convolution module exploits the potential multi-scale features, which extracts and fuses the different scale features by Dense-Net. The multi-scale residual 3D convolution module learns the different scale geometry context from the cost volume which aggregated by the multi-scale fusion 2D convolution module. Experimental results on Scene Flow and KITTI datasets demonstrate that our MSDC-Net significantly outperforms other approaches in the non-occluded region.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115994841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}