2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Articles

Geometric Discriminant Analysis for I-vector Based Speaker Verification
Can Xu, Xianhong Chen, Liang He, Jia Liu
{"title":"Geometric Discriminant Analysis for I-vector Based Speaker Verification","authors":"Can Xu, Xianhong Chen, Liang He, Jia Liu","doi":"10.1109/APSIPAASC47483.2019.9023338","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023338","url":null,"abstract":"Many i-vector based speaker verification systems use linear discriminant analysis (LDA) as a post-processing stage. LDA maximizes the arithmetic mean of the Kullback-Leibler (KL) divergences between different pairs of speakers. However, for speaker verification, speakers with small divergence are easily misjudged. LDA is not optimal because it does not emphasize enlarging small divergences. In addition, LDA assumes that the i-vectors of different speakers are well modeled by Gaussian distributions with identical class covariance. In practice, the distributions of different speakers can have different covariances. Motivated by these observations, we explore speaker verification with geometric discriminant analysis (GDA), which uses the geometric mean instead of the arithmetic mean when maximizing the KL divergences. It puts more emphasis on enlarging small divergences. Furthermore, we study the heteroscedastic extension of GDA (HGDA), taking different covariances into consideration. Experiments on the i-vector machine learning challenge indicate that, when the number of training speakers becomes smaller, the relative performance improvement of GDA and HGDA compared with LDA becomes larger. 
GDA and HGDA are better choices, especially when training data is limited.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134589632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
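The contrast this abstract draws between the arithmetic and the geometric mean of pairwise KL divergences can be illustrated with a small NumPy sketch. This is not the authors' GDA implementation: the three toy class means, the shared identity covariance, the two candidate projection directions, and the helper names `pairwise_kl`, `arithmetic_mean`, and `geometric_mean` are all assumptions made for illustration. For Gaussian classes with a shared covariance Sigma projected onto a direction w, the KL divergence between classes i and j is 0.5 * (w^T (mu_i - mu_j))^2 / (w^T Sigma w).

```python
import numpy as np

def pairwise_kl(means, cov, w):
    """KL divergence of every class pair after projecting onto direction w.

    Assumes Gaussian classes that share one covariance (the LDA assumption).
    """
    w = np.asarray(w, dtype=float)
    projected_var = w @ cov @ w
    projected_means = means @ w
    i, j = np.triu_indices(len(projected_means), k=1)
    return 0.5 * (projected_means[i] - projected_means[j]) ** 2 / projected_var

def arithmetic_mean(divergences):
    return float(np.mean(divergences))

def geometric_mean(divergences):
    # Computed in the log domain; a small divergence pulls the result down sharply.
    return float(np.exp(np.mean(np.log(divergences))))

# Toy data (hypothetical): classes 0 and 1 differ almost only along the second axis.
means = np.array([[0.0, 0.0], [0.1, 3.0], [4.0, 0.0]])
cov = np.eye(2)

w_a = np.array([1.0, 0.0])                 # separates class 2 well, merges classes 0 and 1
w_b = np.array([1.0, 1.0]) / np.sqrt(2.0)  # keeps every pair apart

kl_a = pairwise_kl(means, cov, w_a)
kl_b = pairwise_kl(means, cov, w_b)

print("arithmetic:", arithmetic_mean(kl_a), arithmetic_mean(kl_b))
print("geometric: ", geometric_mean(kl_a), geometric_mean(kl_b))
```

With these toy values the arithmetic mean prefers `w_a`, which leaves two classes nearly overlapping, while the geometric mean prefers `w_b`, which keeps every pair apart.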
High-quality waveform generator from fundamental frequency, spectral envelope, and band aperiodicity
M. Morise, Takuro Shono
{"title":"High-quality waveform generator from fundamental frequency, spectral envelope, and band aperiodicity","authors":"M. Morise, Takuro Shono","doi":"10.1109/APSIPAASC47483.2019.9023206","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023206","url":null,"abstract":"This paper introduces an algorithm that generates a waveform from three speech parameters (fundamental frequency fo, spectral envelope, and band aperiodicity). Conventional vocoder-based speech analysis/synthesis systems mainly use a waveform generator based on pitch synchronous overlap and add (PSOLA). Since it uses the fast Fourier transform (FFT) to generate the vocal cord vibration, the processing cost is proportional to fo. The algorithm also uses the spectral representation of the aperiodicity, whereas the band aperiodicity is mainly used in speech synthesis applications such as statistical parametric speech synthesis. We propose a waveform generation algorithm that reduces the computational cost and memory usage without degrading the synthesized speech. The algorithm generates the excitation signal directly from the band aperiodicity. The computational cost in a certain period is fixed because the excitation signal is filtered and processed by the overlap-add (OLA) algorithm. We used re-synthesized speech to evaluate processing speed and sound quality. The results showed that speech synthesized by our proposed algorithm had almost the same sound quality as speech synthesized by the conventional algorithm. 
The proposed algorithm can also reduce computational cost and memory usage.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133431818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Improved Retinex low-illumination image enhancement algorithm
Shao-Chuan Wang, D. Gao, Yangping Wang, Song Wang
{"title":"An Improved Retinex low-illumination image enhancement algorithm","authors":"Shao-Chuan Wang, D. Gao, Yangping Wang, Song Wang","doi":"10.1109/APSIPAASC47483.2019.9023017","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023017","url":null,"abstract":"Low-illumination images are generally of low quality, and the Retinex algorithm can cause halo artifacts and loss of detail when processing them. Therefore, an improved Retinex algorithm is proposed. First, the HSI color space, which is more in line with human visual characteristics, is used instead of RGB, and only the luminance component I is processed. Then, the illuminance image is estimated with a guided filter that incorporates an edge detection operator; because the edge detection operator localizes edges better, an illuminance image with rich edge information can be obtained. After the illuminance image is obtained, the reflected image is derived by the Retinex principle and subjected to low-rank decomposition, and the low-rank property of the image is used to suppress the halo and the noise amplified during the enhancement process. Finally, the visual effect is further improved by local contrast enhancement. Experiments show that the algorithm can effectively improve the brightness and contrast of the image, preserve the details of the image, and suppress noise interference in the enhancement process. 
The subjective visual effect and objective evaluation results of the image are also greatly improved.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133853842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Modeling Multi-source Information Diffusion: A Graphical Evolutionary Game Approach
Hong Hu, Yuejiang Li, H. Zhao, Yan Chen
{"title":"Modeling Multi-source Information Diffusion: A Graphical Evolutionary Game Approach","authors":"Hong Hu, Yuejiang Li, H. Zhao, Yan Chen","doi":"10.1109/APSIPAASC47483.2019.9023248","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023248","url":null,"abstract":"Modeling of information diffusion over social networks is of crucial importance to better understand how the avalanche of information overflow affects our social life and economy, thus preventing the detrimental consequences caused by rumors and motivating some beneficial information spreading. However, most model-based works on information diffusion either consider the spreading of one single message or assume different diffusion processes are independent of each other. In real-world scenarios, multi-source correlated information often spreads together, which jointly influences users' decisions. In this paper, we model the multi-source information diffusion process from a graphical evolutionary game perspective. Specifically, we model users' local interactions and strategic decision making, and analyze the evolutionary dynamics of the diffusion processes of correlated information, aiming to investigate the underlying principles dominating the complex multi-source information diffusion. Simulation results on synthetic and Facebook networks are consistent with our theoretical analysis. 
We also test our proposed model on Weibo user forwarding data and observe a good prediction performance on real-world information spreading process, which demonstrates the effectiveness of the proposed approach.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133931863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Domain Adaptation Neural Network for Acoustic Scene Classification in Mismatched Conditions
Rui Wang, Mou Wang, Xiao-Lei Zhang, S. Rahardja
{"title":"Domain Adaptation Neural Network for Acoustic Scene Classification in Mismatched Conditions","authors":"Rui Wang, Mou Wang, Xiao-Lei Zhang, S. Rahardja","doi":"10.1109/APSIPAASC47483.2019.9023057","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023057","url":null,"abstract":"Acoustic scene classification is a task of predicting the acoustic environment of an audio recording. Because the training and test conditions in most real world acoustic scene classification problems do not match, it is strongly necessary to develop domain adaptation methods to solve the cross-domain problem. In this paper, we propose a domain adaptation neural network (DANN) based acoustic scene classification (ASC) method. Specifically, we first extract an acoustic feature, i.e. log-Mel spectrogram, which has been proven to be effective in previous studies. Then, we train a DANN to project the training and test domains into one common space where the acoustic scenes are categorized jointly. To boost the overall performance of the proposed method, we further train an ensemble of convolutional neural network (CNN) models with different parameter settings respectively. Finally, we fuse the DANN and CNN models by averaging the outputs of the models. We have evaluated the proposed method on the subtask B of task 1 of the DCASE 2019 ASC challenge, which is a closed-set classification problem whose audio recordings were recorded by mismatched devices. 
Experimental results demonstrate the effectiveness of the proposed method on the acoustic scene classification problem in mismatched conditions.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123001956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking
Feng Li, Kaizhi Qian, M. Hasegawa-Johnson, M. Akagi
{"title":"Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking","authors":"Feng Li, Kaizhi Qian, M. Hasegawa-Johnson, M. Akagi","doi":"10.1109/APSIPAASC47483.2019.9023055","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023055","url":null,"abstract":"Monaural singing voice separation has received much attention in recent years. In this paper, we propose a novel neural network architecture for monaural singing voice separation, Fusion-Net, which combines U-Net with a residual convolutional neural network to form a much deeper neural network architecture with summation-based skip connections. In addition, we apply time-frequency masking to improve the separation results. Finally, we integrate the phase spectra with the magnitude spectra as a post-processing step to optimize the singing voice separated from the music mixture. Experimental results demonstrate that the proposed method can achieve better separation performance than the previous U-Net architecture on the ccMixter database.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127860015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
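The time-frequency masking step mentioned in this abstract can be sketched generically as follows. This is a minimal illustration of a soft ratio mask that reuses the mixture phase, not the Fusion-Net pipeline itself: the random arrays stand in for network magnitude estimates, and the names `soft_mask` and `apply_mask` are hypothetical.

```python
import numpy as np

def soft_mask(vocal_mag, accomp_mag, eps=1e-8):
    """Ratio mask in [0, 1]: share of each time-frequency bin attributed to the voice."""
    return vocal_mag / (vocal_mag + accomp_mag + eps)

def apply_mask(mixture_spec, mask):
    """Scale the mixture magnitude by the mask and reuse the mixture phase."""
    magnitude = np.abs(mixture_spec)
    phase = np.angle(mixture_spec)
    return mask * magnitude * np.exp(1j * phase)

rng = np.random.default_rng(0)
shape = (5, 4)  # (frequency bins, frames), toy size

# Stand-in for the complex spectrogram of the mixture.
mixture_spec = rng.normal(size=shape) + 1j * rng.normal(size=shape)
# Stand-ins for the magnitude estimates a separation network would produce.
vocal_mag_est = rng.uniform(0.0, 1.0, size=shape)
accomp_mag_est = rng.uniform(0.0, 1.0, size=shape)

mask = soft_mask(vocal_mag_est, accomp_mag_est)
vocal_spec = apply_mask(mixture_spec, mask)
accomp_spec = apply_mask(mixture_spec, 1.0 - mask)
```

Because the two masks sum to one, the separated spectrograms add back up to the mixture, which is one reason ratio masks are a common choice for this kind of post-processing.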
Joint Sparse Channel Estimation in Downlink NOMA System
Haohui Jia, Na Chen, T. Higashino, M. Okada
{"title":"Joint Sparse Channel Estimation in Downlink NOMA System","authors":"Haohui Jia, Na Chen, T. Higashino, M. Okada","doi":"10.1109/APSIPAASC47483.2019.9023326","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023326","url":null,"abstract":"Non-orthogonal multiple access (NOMA) is regarded as one of the most important techniques for future 5G systems. In general downlink NOMA schemes, the received NOMA signal is analyzed via two parallel sets of channel state information (CSI) after sparse multipath channel fading. In this paper, by exploiting the inherent sparsity of the channel, we propose a low-complexity joint channel estimation method for a single-input multiple-output antenna system, based on compressed sensing, to detect the channel state information of each layer. In comparison, compressed sensing performs better than the conventional least-squares (LS) and minimum mean square error (MMSE) methods.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128622510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
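To make the comparison between compressed sensing and least squares in this abstract concrete, the sketch below estimates a sparse channel with orthogonal matching pursuit (OMP) and with a plain minimum-norm LS fit. OMP is used here as a generic stand-in for compressed sensing; the abstract does not say which recovery algorithm the authors use. The real-valued single-link setup, the random pilot matrix, the tap positions, and the function name `omp` are all assumptions for illustration, and the NOMA layering and multiple receive antennas are left out.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: greedily pick `sparsity` columns of A to explain y."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(sparsity):
        # Column most correlated with the part of y that is still unexplained.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Least-squares refit restricted to the selected columns.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    h = np.zeros(A.shape[1])
    h[support] = coeffs
    return h

rng = np.random.default_rng(0)
n_pilots, n_taps = 56, 64                # fewer pilot observations than channel taps
A = rng.normal(size=(n_pilots, n_taps))  # hypothetical pilot (measurement) matrix
A /= np.linalg.norm(A, axis=0)           # unit-norm columns

h_true = np.zeros(n_taps)
h_true[[3, 17, 40]] = [1.0, -0.8, 0.5]   # three active taps (toy sparse channel)
y = A @ h_true                           # noise-free pilot observations

h_ls, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimum-norm LS, ignores sparsity
h_omp = omp(A, y, sparsity=3)

print("LS error: ", np.linalg.norm(h_ls - h_true))
print("OMP error:", np.linalg.norm(h_omp - h_true))
```

In this noise-free toy setting the greedy search is expected to pick out the three active taps, whereas the minimum-norm LS solution spreads energy across all taps because the system is underdetermined.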
An RGB Gait Anonymization Model for Low-Quality Silhouettes
Ngoc-Dung T. Tieu, H. Nguyen, Fuming Fang, J. Yamagishi, I. Echizen
{"title":"An RGB Gait Anonymization Model for Low-Quality Silhouettes","authors":"Ngoc-Dung T. Tieu, H. Nguyen, Fuming Fang, J. Yamagishi, I. Echizen","doi":"10.1109/APSIPAASC47483.2019.9023188","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023188","url":null,"abstract":"Gait anonymization while maintaining naturalness is used for protecting a person's identity against gait recognition systems when a video of the person walking is uploaded to social media. There has been some research on gait anonymization, but only for high-quality silhouette gaits. We present an RGB gait anonymization model for low-quality silhouette gaits that can generate natural, seamless anonymized gaits for which the original silhouettes cannot be extracted correctly. Our model includes two main networks. The first one, a deep convolutional generative adversarial network, is used to anonymize the original gait by adding to it a random noise vector. By training on high-quality silhouette data, this network can generate a high-quality anonymized silhouette sequence from a low-quality silhouette one. Restricting its input to binary silhouette sequences instead of color gaits forces it to focus on anonymizing the gait rather than changing body color. The second main network, which follows the first one, colorizes the anonymized silhouette sequence generated by the first network by using the color of the original gait. 
Evaluation in terms of success rate and naturalness demonstrated that our model can anonymize gaits while maintaining naturalness.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128854346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Convolutional Attention Model for Retinal Edema Segmentation
Phuong Le Thi, Tuan D. Pham, Jia-Ching Wang
{"title":"Convolutional Attention Model for Retinal Edema Segmentation","authors":"Phuong Le Thi, Tuan D. Pham, Jia-Ching Wang","doi":"10.1109/APSIPAASC47483.2019.9023282","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023282","url":null,"abstract":"Deep learning and computer vision, which have become popular in recent years, are advantageous techniques for medical diagnosis. A large database of Optical Coherence Tomography (OCT) images can be used to train a deep learning model that can effectively help identify a patient's illnesses and status. Therefore, semantic image segmentation is used to detect and categorize anomaly regions in OCT images. However, numerous existing approaches ignore spatial structure as well as contextual information in a given image. To overcome these problems, this work proposes a novel method that takes advantage of a deep convolutional neural network, an attention block, a pyramid pooling module, and auxiliary connections between layers. The attention block helps to detect the spatial structure of a given image. In addition, the pyramid pooling module is responsible for identifying the shape and margin of the anomaly region. Furthermore, the auxiliary connections enrich the useful information passed through a layer and reduce overfitting. Our method achieves higher accuracy than state-of-the-art methods, with a Dice coefficient of 78.19% compared with 76.19% for Deeplab_v3 and 76.85% for Bisenet. 
Additionally, the number of parameters in our model is smaller than in previous approaches.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"390 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115991217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
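The segmentation results in this abstract are reported as Dice coefficients. For readers unfamiliar with the metric, here is a minimal sketch of how it is commonly computed for a pair of binary masks. The function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice = 2 * |P and T| / (|P| + |T|) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

# Toy masks: 3 predicted pixels, 3 ground-truth pixels, 2 of them shared.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
print(f"Dice: {score:.4f}")
```

For multi-class segmentation the same function can be applied per class to the one-vs-rest masks and the scores averaged; whether the paper averages this way is not stated in the abstract.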
MSDC-Net: Multi-Scale Dense and Contextual Networks for Stereo Matching
Zhibo Rao, Mingyi He, Yuchao Dai, Zhidong Zhu, Bo Li, Renjie He
{"title":"MSDC-Net: Multi-Scale Dense and Contextual Networks for Stereo Matching","authors":"Zhibo Rao, Mingyi He, Yuchao Dai, Zhidong Zhu, Bo Li, Renjie He","doi":"10.1109/APSIPAASC47483.2019.9023237","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023237","url":null,"abstract":"Disparity prediction from stereo images is essential to computer vision applications such as autonomous driving, 3D model reconstruction, and object detection. To predict the disparity map more accurately, a novel deep learning architecture (called MSDC-Net) for detecting the disparity map from a rectified pair of stereo images is proposed. Our MSDC-Net contains two modules: the multi-scale fusion 2D convolution module and the multi-scale residual 3D convolution module. The multi-scale fusion 2D convolution module exploits the potential multi-scale features, extracting and fusing features at different scales with Dense-Net. The multi-scale residual 3D convolution module learns geometry context at different scales from the cost volume aggregated by the multi-scale fusion 2D convolution module. Experimental results on the Scene Flow and KITTI datasets demonstrate that our MSDC-Net significantly outperforms other approaches in the non-occluded region.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115994841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6