2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers, Page 9

Direction-aware target speaker extraction with a dual-channel system based on conditional variational autoencoders under underdetermined conditions

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979881

Rui Wang, Li Li, T. Toda

Abstract: In this paper, we deal with a dual-channel target speaker extraction (TSE) problem under underdetermined conditions. For the dual-channel system, the generalized sidelobe canceller (GSC) is a commonly used structure for estimating a blocking matrix (BM) to generate interference, and geometric source separation (GSS) can be used as an implementation of BM estimation utilizing directional information. However, the performance of conventional GSS methods is limited under underdetermined conditions because they lack a powerful source model. We propose a dual-channel TSE method that combines target selection based on geometric constraints, more powerful source modeling, and nonlinear postprocessing. The target directional information is used as a geometric constraint, and two conditional variational autoencoders (CVAEs) are used to model a single speaker's speech and the interference mixture speech. For the postprocessing, an ideal ratio time-frequency (T-F) mask estimated from the separated interference mixture speech is used to extract the target speaker's speech. The experimental results demonstrate that the proposed method achieves improvements of 6.24 dB and 8.37 dB over the baseline method in terms of signal-to-distortion ratio (SDR) and source-to-interference ratio (SIR), respectively, under strong reverberation (470 ms).
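As an illustration of the mask-based postprocessing mentioned in the abstract, the following Python sketch builds a ratio mask from a mixture magnitude spectrogram and an estimated interference magnitude spectrogram. It is a sketch under stated assumptions, not the authors' implementation: the function names `ratio_mask` and `apply_mask`, the nested-list spectrogram layout, and the approximation of target power as mixture power minus interference power are all hypothetical choices made for this example.

```python
def ratio_mask(mixture_mag, interference_mag, eps=1e-8):
    """Return a time-frequency ratio mask with values in [0, 1].

    Assumption (not from the paper): the target power in each T-F bin is
    approximated as mixture power minus estimated interference power,
    floored at zero, and the mask is that power divided by mixture power.
    """
    mask = []
    for mix_row, int_row in zip(mixture_mag, interference_mag):
        row = []
        for x, n in zip(mix_row, int_row):
            target_power = max(x * x - n * n, 0.0)
            # eps avoids division by zero in silent bins.
            row.append(target_power / (x * x + eps))
        mask.append(row)
    return mask


def apply_mask(mixture_mag, mask):
    """Scale each mixture bin by its mask value to estimate the target."""
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mixture_mag)
    ]


# Hypothetical one-frame, three-bin example.
mixture = [[2.0, 1.0, 1.0]]
interference = [[0.0, 1.0, 2.0]]
target_estimate = apply_mask(mixture, ratio_mask(mixture, interference))
```

Bins where the estimated interference accounts for all of the mixture energy are suppressed to zero, while bins with no estimated interference pass through almost unchanged.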

Citations: 1

DCAN: Deep Consecutive Attention Network for Video Super Resolution

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979823

Talha Saleem, Sovann Chen, S. Aramvith

Abstract: Slow motion is visually attractive in video applications and is receiving increasing attention in video super-resolution (VSR). Generating the high-resolution (HR) center frame together with its neighboring HR frames from two low-resolution (LR) frames requires two sub-tasks: video super-resolution (VSR) and video frame interpolation (VFI). However, the interpolation approach does not successfully extract low-level features to achieve acceptable space-time video super-resolution results. The restoration performance of existing systems is therefore constrained, because they rarely consider the spatial-temporal correlation and the long-term temporal context concurrently. To this end, we propose a deep consecutive attention network-based method that generates attentive features to obtain HR slow-motion frames. A channel attention module and an attentive temporal feature module are designed to improve the perceptual quality of the predicted interpolation feature frames. The experimental results show that the proposed method outperforms the state-of-the-art baseline method by 0.17 dB in average PSNR.
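The abstract reports its gain in average PSNR. For readers unfamiliar with the metric, the Python sketch below computes the standard peak signal-to-noise ratio, 10 * log10(MAX^2 / MSE), and averages it over frames. The function names `psnr` and `average_psnr`, the flattened pixel-list input, and the 8-bit peak value of 255 are assumptions made for this example; the paper's exact evaluation protocol (color space, cropping) is not stated here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened frames."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("frames must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_psnr(reference_frames, estimated_frames):
    """Mean of the per-frame PSNR values over a sequence of frames."""
    scores = [psnr(r, e) for r, e in zip(reference_frames, estimated_frames)]
    return sum(scores) / len(scores)


# Hypothetical two-pixel frame: one pixel is off by 10 gray levels.
score = psnr([10, 20], [10, 30])
```

Because PSNR is logarithmic, a difference of a fraction of a decibel between two methods corresponds to a small relative change in mean squared error.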

Citations: 0

Encrypted JPEG Image Retrieval via Huffman-code Based Self-Attention Networks

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979814

Zhixun Lu, Qihua Feng, Peiya Li

Abstract: Image retrieval has been widely used in daily life. In recent years, with the increasing awareness of privacy protection, encrypted image retrieval has also gradually developed. In this paper, we propose a new encrypted JPEG image retrieval scheme, named Huffman-code Based Self-Attention Networks (HBSAN), which can conduct image retrieval and protect image privacy effectively. Specifically, we first extract Huffman-code histograms directly from cipher-images, which are encrypted by jointly using a new orthogonal transformation, a permutation cipher, and a stream cipher during JPEG compression. We then employ self-attention neural networks to mine the deep relations and retrieve the cipher-images. In our retrieval model, we design a self-attention multi-layer perceptron module, called SAMLP, to effectively learn global dependencies within representations of cipher-images. Extensive experiments show that our encryption algorithm is compression-friendly and ensures no information leakage, and that HBSAN significantly outperforms other state-of-the-art models in retrieval performance.
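The abstract relies on self-attention to learn global dependencies across feature representations. The Python sketch below shows generic scaled dot-product self-attention in its simplest form, where queries, keys, and values are all the input vectors themselves. It is not the SAMLP module from the paper, whose internals are not described here: the function names, the absence of learned projection matrices, and the idea of treating each row as one hypothetical histogram feature vector are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Each output row is a weighted average of all input rows, so every
    position can draw on information from every other position.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs


# Hypothetical input: three 2-dimensional feature vectors.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(features)
```

A trained model would normally apply learned linear projections to obtain the queries, keys, and values; they are omitted here to keep the mechanism visible.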

Citations: 0

Multi-branch Learning for Noisy and Reverberant Monaural Speech Separation

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980244

Chao Ma, Dongmei Li

Citations: 0

Multi-resolution GPR clutter suppression method based on low-rank and sparse decomposition

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980215

Yanjie Cao, Xiaopeng Yang, T. Lan

Citations: 1

Fine-Tuning BERT for Question and Answering Using PubMed Abstract Dataset

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980097

Saeyeon Cheon, Insung Ahn

Citations: 1

Parameterization of Dominant Spectral Peak Trajectory for Whisper Speech Recognition

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980259

Chang Feng, Xiaolong Wu, Mingxing Xu, T. Zheng

Abstract: Automatic speech recognition (ASR) systems trained on normal speech generally suffer from performance degradation on whisper speech. To solve this problem, this paper concentrates on utilizing factors shared by normal and whisper speech to construct a whisper speech recognizer with normal speech data. We propose to parameterize the dominant spectral peak trajectory (Ppeak) to capture the similarities and concatenate it with the traditional Mel-Frequency Cepstral Coefficients (MFCC) and Human Factor Cepstral Coefficients (HFCC), respectively, to form new features. The proposed features improve the accuracy of whisper speech recognition. Further improvement can be achieved when the similarity is enhanced by removing low-frequency information. Experimental results show that the performance degradation between matched and mismatched scenarios was reduced by a relative 90.31% in Word Error Rate (WER) for HFCC after similarity enhancement at a cut-off frequency of 500 Hz. Furthermore, we ultimately achieved a relative WER reduction of 69.60% in the mismatched scenario compared with conventional MFCC, even without whisper speech data for training.
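The abstract states its results as relative reductions, both in WER and in the degradation between matched and mismatched scenarios. The Python sketch below shows how such figures are conventionally computed. The function names and all WER values are hypothetical numbers chosen for illustration; they are not results from the paper.

```python
def relative_reduction(baseline, improved):
    """Relative reduction in percent: 100 * (baseline - improved) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


def degradation(matched_wer, mismatched_wer):
    """Absolute WER increase when moving from matched to mismatched testing."""
    return mismatched_wer - matched_wer


# Hypothetical WERs in percent (not taken from the paper).
baseline_gap = degradation(matched_wer=10.0, mismatched_wer=30.0)  # 20.0 points
proposed_gap = degradation(matched_wer=10.0, mismatched_wer=12.0)  # 2.0 points
gap_reduction = relative_reduction(baseline_gap, proposed_gap)     # 90.0 percent
wer_reduction = relative_reduction(30.0, 12.0)                     # 60.0 percent
```

The two quantities answer different questions: the first measures how much of the matched-to-mismatched gap is closed, while the second measures how much the mismatched WER itself falls relative to a baseline feature.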

Citations: 1

Correlation Loss for MOS Prediction of Synthetic Speech

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980182

Beibei Hu, Qiang Li

Citations: 1

Restoring Edge and Color using Weighted Near-Infrared Image and Color Transmission Maps for Robust Haze Removal

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979960

Onhi Kato, Akira Kubota

Citations: 0

Camera-Based Log System for Human Physical Distance Tracking in Classroom

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980055

S. Deepaisarn, Angkoon Angkoonsawaengsuk, Charn Arunkit, Chayud Srisumarnk, Krongkan Nimmanwatthana, Nanmanas Linphrachaya, Nattapol Chiewnawintawat, Rinrada Tanthanathewin, Sivakorn Seinglek, Suphachok Buaruk, Virach Sornlertlamvanich

Citations: 0