2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

Direction-aware target speaker extraction with a dual-channel system based on conditional variational autoencoders under underdetermined conditions
Rui Wang, Li Li, T. Toda
DOI: 10.23919/APSIPAASC55919.2022.9979881 (https://doi.org/10.23919/APSIPAASC55919.2022.9979881)
Published: 2022-11-07
Abstract: In this paper, we address a dual-channel target speaker extraction (TSE) problem under underdetermined conditions. For dual-channel systems, the generalized sidelobe canceller (GSC) is a commonly used structure that estimates a blocking matrix (BM) to obtain the interference, and geometric source separation (GSS) can be used to implement BM estimation using directional information. However, the performance of conventional GSS methods is limited under underdetermined conditions because they lack a powerful source model. We propose a dual-channel TSE method that combines target selection based on geometric constraints, more powerful source modeling, and nonlinear postprocessing. The target's directional information is used as a geometric constraint, and two conditional variational autoencoders (CVAEs) are used to model single-speaker speech and interference mixture speech. For postprocessing, an ideal ratio time-frequency (T-F) mask estimated from the separated interference mixture speech is used to extract the target speaker's speech. Experimental results demonstrate that, under strong reverberation with a reverberation time of 470 ms, the proposed method improves the signal-to-distortion ratio (SDR) by 6.24 dB and the source-to-interference ratio (SIR) by 8.37 dB compared with the baseline method.
Citations: 1
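The abstract does not specify how the post-processing mask is computed. The Python sketch below is only a rough illustration. It builds a ratio time-frequency mask from an estimate of the interference power. It assumes the target power can be approximated by subtracting the interference power from the mixture power (that is, uncorrelated sources), and it uses the common square-root exponent `beta=0.5`. The function names `mask_from_interference` and `apply_mask` are hypothetical, and this is not the authors' implementation.

```python
from typing import List

Spectrogram = List[List[float]]  # indexed as [frame][frequency bin]


def mask_from_interference(mix_power: Spectrogram,
                           interf_power: Spectrogram,
                           beta: float = 0.5,
                           eps: float = 1e-12) -> Spectrogram:
    """Ratio mask M = (Ps / (Ps + Pi)) ** beta for every T-F bin.

    Assumption: target power is approximated as Ps = max(Px - Pi, 0),
    i.e. target and interference are treated as uncorrelated.
    """
    mask = []
    for row_x, row_i in zip(mix_power, interf_power):
        row = []
        for px, pi in zip(row_x, row_i):
            ps = max(px - pi, 0.0)  # estimated target power in this bin
            row.append((ps / (ps + pi + eps)) ** beta)
        mask.append(row)
    return mask


def apply_mask(mix_magnitude: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Element-wise masking of the mixture magnitude spectrogram."""
    return [[m * x for m, x in zip(rm, rx)] for rm, rx in zip(mask, mix_magnitude)]
```

Bins dominated by the estimated interference get a mask value near 0. Bins with no estimated interference get a value near 1. The masked magnitude would then be combined with the mixture phase for resynthesis, which is not shown.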
DCAN: Deep Consecutive Attention Network for Video Super Resolution
Talha Saleem, Sovann Chen, S. Aramvith
DOI: 10.23919/APSIPAASC55919.2022.9979823 (https://doi.org/10.23919/APSIPAASC55919.2022.9979823)
Published: 2022-11-07
Abstract: Slow motion is visually attractive in video applications and has received increasing attention in video super-resolution (VSR). Generating a high-resolution (HR) center frame together with its neighboring HR frames from two low-resolution (LR) frames requires two sub-tasks: video super-resolution (VSR) and video frame interpolation (VFI). However, existing interpolation approaches do not extract low-level features well enough to achieve acceptable space-time video super-resolution. In addition, the restoration performance of existing systems is limited because they rarely consider spatial-temporal correlation and long-term temporal context together. To address this, we propose a deep consecutive attention network that generates attentive features to produce HR slow-motion frames. A channel attention module and an attentive temporal feature module are designed to improve the perceptual quality of the predicted interpolated feature frames. Experimental results show that the proposed method outperforms the state-of-the-art baseline by 0.17 dB in average PSNR.
Citations: 0
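The following is a hedged illustration of what a channel attention module can look like. It is a minimal squeeze-and-excitation style sketch in plain Python: global average pooling, a bottleneck layer with ReLU, and then a sigmoid gate that rescales each channel. The abstract does not state that DCAN uses exactly this design. The bias-free layers and the weight names `w_down` and `w_up` are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W feature map


def channel_attention(features: List[Matrix], w_down: Matrix, w_up: Matrix) -> List[Matrix]:
    """Squeeze-and-excitation style channel attention (bias-free, hypothetical weights).

    features: C feature maps, each of shape H x W
    w_down:   (C // r) x C weights of the reduction layer
    w_up:     C x (C // r) weights of the expansion layer
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in features]
    # Excitation: bottleneck layer + ReLU, then expansion + sigmoid -> weights in (0, 1).
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w_up]
    # Scale: multiply every value of each channel by that channel's attention weight.
    return [[[v * a for v in row] for row in fmap] for fmap, a in zip(features, weights)]
```

The gate lets the network emphasize informative feature channels and suppress others. In a trained model, `w_down` and `w_up` would be learned rather than supplied by hand.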
Encrypted JPEG Image Retrieval via Huffman-code Based Self-Attention Networks
Zhixun Lu, Qihua Feng, Peiya Li
DOI: 10.23919/APSIPAASC55919.2022.9979814 (https://doi.org/10.23919/APSIPAASC55919.2022.9979814)
Published: 2022-11-07
Abstract: Image retrieval is widely used in daily life. In recent years, with increasing awareness of privacy protection, encrypted image retrieval has also gradually developed. In this paper, we propose a new encrypted JPEG image retrieval scheme, named Huffman-code Based Self-Attention Networks (HBSAN), which can perform image retrieval while effectively protecting image privacy. Specifically, we first extract Huffman-code histograms directly from cipher-images, which are encrypted during JPEG compression by jointly using a new orthogonal transformation, a permutation cipher, and a stream cipher. We then employ self-attention neural networks to mine deep relations and retrieve the cipher-images. In our retrieval model, we design a self-attention multi-layer perceptron module, called SAMLP, to effectively learn global dependencies within the representations of cipher-images. Extensive experiments show that our encryption algorithm is compression-friendly and ensures no information leakage, and that HBSAN significantly outperforms other state-of-the-art models in retrieval performance.
Citations: 0
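The first stage of HBSAN turns each cipher-image into a Huffman-code histogram. The sketch below assumes the Huffman codewords have already been read out of the JPEG bitstream as bit strings; real entropy-stream parsing is omitted. It normalizes their counts over a fixed codeword vocabulary. The L1 nearest-neighbour ranking is a simple stand-in for illustration, not the self-attention retrieval model (SAMLP) described in the abstract. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def huffman_histogram(codewords: Iterable[str], vocabulary: Sequence[str]) -> List[float]:
    """Normalized frequency of each Huffman codeword in a fixed vocabulary.

    Codewords outside the vocabulary are ignored.
    """
    counts = Counter(codewords)
    total = sum(counts[c] for c in vocabulary) or 1  # avoid division by zero
    return [counts[c] / total for c in vocabulary]


def rank_by_l1(query: List[float], database: Dict[str, List[float]]) -> List[str]:
    """Stand-in retrieval: database keys sorted by L1 distance to the query histogram."""
    dist = {name: sum(abs(q - h) for q, h in zip(query, hist))
            for name, hist in database.items()}
    return sorted(dist, key=dist.get)
```

The histogram can be computed without decrypting the image. In HBSAN, such histograms feed a learned model instead of a fixed distance.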
Multi-branch Learning for Noisy and Reverberant Monaural Speech Separation
Chao Ma, Dongmei Li
DOI: 10.23919/APSIPAASC55919.2022.9980244 (https://doi.org/10.23919/APSIPAASC55919.2022.9980244)
Published: 2022-11-07
Abstract: With the rapid development of deep learning approaches, much progress has been made on speech enhancement, speech dereverberation, and monaural multi-speaker speech separation to solve the cocktail party problem. Several excellent methods have been proposed for monaural speech separation in noisy and reverberant environments. However, few studies exploit the correlations between anechoic speech and reverberant speech. In this work, the structure of a popular separation system is deconstructed, and a multi-branch learning method is proposed to force the network to exploit the correlations between anechoic speech and the corresponding reverberant speech. The results show that multi-branch learning improves the separation performance of different networks, for example by 0.7 dB for Conv-TasNet on the WHAMR! dataset.
Citations: 0
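The abstract does not give the training objective. Conv-TasNet is commonly trained with a scale-invariant signal-to-noise ratio (SI-SNR) loss. One plausible way to picture multi-branch learning is therefore a loss with one SI-SNR term for an anechoic output branch and one for a reverberant output branch. The sketch below assumes that form. The branch weighting `weight` and the function names are hypothetical.

```python
import math
from typing import Sequence


def si_snr(est: Sequence[float], ref: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between an estimate and a reference signal."""
    me, mr = sum(est) / len(est), sum(ref) / len(ref)
    e = [x - me for x in est]  # zero-mean estimate
    r = [x - mr for x in ref]  # zero-mean reference
    # Project the estimate onto the reference to get the target component.
    scale = sum(a * b for a, b in zip(e, r)) / (sum(b * b for b in r) + eps)
    s_target = [scale * b for b in r]
    e_noise = [a - t for a, t in zip(e, s_target)]
    return 10.0 * math.log10((sum(t * t for t in s_target) + eps)
                             / (sum(n * n for n in e_noise) + eps))


def multi_branch_loss(est_anechoic, ref_anechoic, est_reverb, ref_reverb,
                      weight: float = 0.5) -> float:
    """Hypothetical two-branch objective: negative weighted sum of branch SI-SNRs."""
    return -(si_snr(est_anechoic, ref_anechoic)
             + weight * si_snr(est_reverb, ref_reverb))
```

Because SI-SNR ignores the overall gain of the estimate, each branch is judged only on waveform shape. The extra reverberant branch gives the shared encoder a second, related target to learn from.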
Multi-resolution GPR clutter suppression method based on low-rank and sparse decomposition
Yanjie Cao, Xiaopeng Yang, T. Lan
DOI: 10.23919/APSIPAASC55919.2022.9980215 (https://doi.org/10.23919/APSIPAASC55919.2022.9980215)
Published: 2022-11-07
Abstract: Clutter in ground-penetrating radar (GPR) data seriously affects the detection and identification of subsurface targets, and its suppression has been widely studied in recent years. This paper introduces a multi-resolution low-rank and sparse decomposition (LRSD) method. First, the raw GPR data are decomposed by the stationary wavelet transform (SWT) into different sub-bands. Then, robust non-negative matrix factorization (RNMF) is applied to the approximation and horizontal wavelet sub-bands to extract the sparse target components. Next, wavelet soft-threshold denoising is applied to the vertical and diagonal wavelet sub-bands. Finally, the inverse wavelet transform of the processed sub-bands reconstructs the target signal. The proposed method is compared with the subspace method and existing LRSD methods on both simulated and real collected data. Visual and quantitative results show that the proposed method achieves better clutter suppression performance.
Citations: 1
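Of the four steps listed in this abstract, the wavelet soft-threshold denoising of the vertical and diagonal sub-bands is the simplest to show in isolation. The sketch assumes the sub-band coefficients are already available from an SWT. It uses the Donoho-Johnstone universal threshold with a median-based noise estimate. The abstract does not say which threshold rule the authors use, and the SWT and RNMF stages are not shown.

```python
import math
import statistics
from typing import List


def soft_threshold(coeffs: List[float], thr: float) -> List[float]:
    """Shrink each coefficient toward zero: sign(c) * max(|c| - thr, 0)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def universal_threshold(coeffs: List[float]) -> float:
    """Universal threshold sigma * sqrt(2 ln N), with sigma = median(|c|) / 0.6745."""
    sigma = statistics.median(abs(c) for c in coeffs) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(coeffs)))


def denoise_subband(subband: List[List[float]]) -> List[List[float]]:
    """Soft-threshold a 2-D detail sub-band using one threshold for the whole band."""
    flat = [c for row in subband for c in row]
    thr = universal_threshold(flat)
    return [soft_threshold(row, thr) for row in subband]
```

Small detail coefficients, which are assumed to be mostly noise, are set to zero. Large ones are kept but shrunk by the threshold.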
Fine-Tuning BERT for Question and Answering Using PubMed Abstract Dataset
Saeyeon Cheon, Insung Ahn
DOI: 10.23919/APSIPAASC55919.2022.9980097 (https://doi.org/10.23919/APSIPAASC55919.2022.9980097)
Published: 2022-11-07
Abstract: The coronavirus, which first originated in China in 2019, spread worldwide and eventually became a pandemic. Amid widespread public interest, misinformation about the coronavirus has been pouring out on the Internet. To help people easily obtain accurate information, we developed a Q&A processing technique by building a dataset based on PubMed paper abstracts. We fine-tuned BioBERT, a BERT model that reached state-of-the-art (SOTA) performance on the biomedical Q&A task, and it answered questions about the coronavirus with high accuracy. In the future, we will extend our technology to handle Q&A not only in English but also in multiple languages. This work will help people who speak different languages easily obtain correct information amid confusing data.
Citations: 1
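The abstract does not describe the Q&A output format. If the fine-tuned BioBERT model is used for extractive (span-based) question answering, as in SQuAD-style setups, the answer is typically read off by choosing the start and end token positions with the highest combined score. The sketch below shows that post-processing step on plain lists of logits. `best_span` and `max_answer_len` are hypothetical names, and producing the logits with an actual model is not shown.

```python
from typing import Sequence, Tuple


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_logits[i] + end_logits[j].

    Only spans with i <= j < i + max_answer_len are considered.
    """
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_logits):
        # The end position may not precede the start or exceed the length limit.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best
```

For example, given `tokens` aligned to the logits, `i, j, _ = best_span(start, end)` yields the answer text `" ".join(tokens[i:j + 1])`. Merging WordPiece sub-tokens back into words is ignored here.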
Parameterization of Dominant Spectral Peak Trajectory for Whisper Speech Recognition
Chang Feng, Xiaolong Wu, Mingxing Xu, T. Zheng
DOI: 10.23919/APSIPAASC55919.2022.9980259 (https://doi.org/10.23919/APSIPAASC55919.2022.9980259)
Published: 2022-11-07
Abstract: Automatic speech recognition (ASR) systems trained on normal speech generally suffer performance degradation on whispered speech. To address this problem, this paper focuses on exploiting factors shared by normal and whispered speech to build a whisper speech recognizer from normal speech data. We propose to parameterize the dominant spectral peak trajectory (Ppeak) to capture these similarities and to concatenate it with the traditional Mel-Frequency Cepstral Coefficients (MFCC) and Human Factor Cepstral Coefficients (HFCC), respectively, to form new features. The proposed features improve the accuracy of whisper speech recognition. Further improvement is achieved when the similarity is enhanced by removing low-frequency information. Experimental results show that, for HFCC after similarity enhancement with a cut-off frequency of 500 Hz, the performance degradation between matched and mismatched scenarios was reduced by 90.31% relative in Word Error Rate (WER). Furthermore, we ultimately achieved a relative WER reduction of 69.60% in the mismatched scenario compared with conventional MFCC, even without whisper speech data for training.
Citations: 1
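The abstract does not specify how the Ppeak trajectory is parameterized. The sketch below shows one simple reading under stated assumptions. For every frame, it takes the frequency of the largest magnitude-spectrum bin at or above the cut-off (500 Hz, the value reported in the abstract) and computes its frame-to-frame difference. It then appends both values to the cepstral feature vector (MFCC or HFCC) of that frame. The function names and the delta parameterization are hypothetical.

```python
import math
from typing import List, Sequence


def dominant_peak_track(spectra: Sequence[Sequence[float]], sample_rate: float,
                        n_fft: int, cutoff_hz: float = 500.0) -> List[float]:
    """Frequency (Hz) of the strongest bin at or above cutoff_hz in each frame."""
    bin_hz = sample_rate / n_fft
    first_bin = math.ceil(cutoff_hz / bin_hz)  # drop low-frequency bins
    track = []
    for frame in spectra:
        band = frame[first_bin:]
        k = max(range(len(band)), key=band.__getitem__)
        track.append((first_bin + k) * bin_hz)
    return track


def append_peak_features(cepstra: Sequence[Sequence[float]],
                         track: Sequence[float]) -> List[List[float]]:
    """Concatenate [peak frequency, frame-to-frame delta] to each cepstral vector."""
    deltas = [0.0] + [b - a for a, b in zip(track, track[1:])]
    return [list(c) + [p, d] for c, p, d in zip(cepstra, track, deltas)]
```

Discarding bins below the cut-off mirrors the abstract's "removing low frequency information." In practice, the raw peak frequencies would likely be normalized before concatenation, which is not shown here.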
Correlation Loss for MOS Prediction of Synthetic Speech
Beibei Hu, Qiang Li
DOI: 10.23919/APSIPAASC55919.2022.9980182 (https://doi.org/10.23919/APSIPAASC55919.2022.9980182)
Published: 2022-11-07
Abstract: Many deep-learning-based methods have been developed for the speech mean opinion score (MOS) prediction task. System-level and utterance-level mean squared error (MSE), Linear Correlation Coefficient (LCC), Spearman Rank Correlation Coefficient (SRCC), and Kendall Tau Rank Correlation (KTAU) are generally used as evaluation metrics. However, we find that the objective functions of many MOS prediction networks are based on MAE or MSE, without an explicit correlation term. This paper investigates different correlation losses for voice MOS prediction networks. Using the datasets and the SSL-MOS baseline system provided by the VoiceMOS Challenge 2022, we employ different auxiliary correlation losses to train the MOS prediction network. Experimental results show that the proposed auxiliary correlation losses improve the performance of the SSL-MOS network on the six correlation metrics. Compared with the two best-performing systems in the VoiceMOS Challenge 2022, our approach achieves close performance on the system-level correlation metrics with a simpler system architecture.
Citations: 1
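A common form of correlation loss is 1 minus the Pearson linear correlation coefficient (LCC) between predicted and true MOS over a batch, added to a regression loss. The plain-Python sketch below computes that combination for a single batch. The weight `alpha` is hypothetical, and the abstract does not say which correlation losses were used. A real training setup would express the same formula in an autodiff framework.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float], eps: float = 1e-8) -> float:
    """Pearson linear correlation coefficient (LCC) between two score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (math.sqrt(vx * vy) + eps)


def mse_plus_corr_loss(pred: Sequence[float], target: Sequence[float],
                       alpha: float = 1.0) -> float:
    """MSE plus an auxiliary correlation term alpha * (1 - LCC)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return mse + alpha * (1.0 - pearson(pred, target))
```

Shifting every prediction by a constant raises the MSE term but leaves the correlation term unchanged. This is why an explicit correlation term targets the ranking-style metrics more directly than MSE alone.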
Restoring Edge and Color using Weighted Near-Infrared Image and Color Transmission Maps for Robust Haze Removal
Onhi Kato, Akira Kubota
DOI: 10.23919/APSIPAASC55919.2022.9979960 (https://doi.org/10.23919/APSIPAASC55919.2022.9979960)
Published: 2022-11-07
Abstract: In recent years, various haze removal methods based on atmospheric scattering models have been proposed. Most methods target images with strong haze, in which light is scattered equally in all color channels. This paper proposes a haze removal method for images with weak haze that uses near-infrared (NIR) images. The proposed method first restores the edges of the color image by fusing weighted NIR images. Second, it estimates transmission maps for all color channels based on a wavelength-dependent scattering model and restores the color of the edge-restored image using the estimated transmission maps. Finally, the edge-restored and color-restored images are blended. Qualitative and quantitative evaluations demonstrate that the proposed method restores edges and colors in weak-haze images more naturally than conventional methods.
Citations: 0
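Haze removal methods based on the atmospheric scattering model typically write the hazy observation for each color channel as I = J·t + A·(1 − t), with transmission t = exp(−β·d) for scene depth d. A wavelength-dependent model gives each channel its own β. The sketch below applies that model to a single pixel and then inverts it. The β values, the depth, the airlight, and the lower bound `t_min` are illustrative assumptions. The NIR edge fusion and blending steps are not shown.

```python
import math
from typing import List, Sequence


def transmission(depth: float, beta: float) -> float:
    """Transmission t = exp(-beta * d) for scattering coefficient beta."""
    return math.exp(-beta * depth)


def add_haze(clear: Sequence[float], airlight: Sequence[float],
             betas: Sequence[float], depth: float) -> List[float]:
    """Forward model per channel: I = J * t + A * (1 - t)."""
    out = []
    for j, a, b in zip(clear, airlight, betas):
        t = transmission(depth, b)
        out.append(j * t + a * (1.0 - t))
    return out


def remove_haze(hazy: Sequence[float], airlight: Sequence[float],
                betas: Sequence[float], depth: float,
                t_min: float = 0.1) -> List[float]:
    """Inverse model per channel: J = (I - A) / max(t, t_min) + A."""
    out = []
    for i, a, b in zip(hazy, airlight, betas):
        t = max(transmission(depth, b), t_min)  # avoid amplifying noise when t is tiny
        out.append((i - a) / t + a)
    return out
```

With per-channel β (for example, larger for shorter wavelengths), weak haze attenuates the blue channel more than the red one. This is why a single shared transmission map can distort color in the weak-haze case the paper targets.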
Camera-Based Log System for Human Physical Distance Tracking in Classroom
S. Deepaisarn, Angkoon Angkoonsawaengsuk, Charn Arunkit, Chayud Srisumarnk, Krongkan Nimmanwatthana, Nanmanas Linphrachaya, Nattapol Chiewnawintawat, Rinrada Tanthanathewin, Sivakorn Seinglek, Suphachok Buaruk, Virach Sornlertlamvanich
DOI: 10.23919/APSIPAASC55919.2022.9980055 (https://doi.org/10.23919/APSIPAASC55919.2022.9980055)
Published: 2022-11-07
Abstract: During the COVID-19 pandemic, indoor physical distancing has been one of the recommended measures for avoiding close contact and preventing infection clusters. This paper proposes an end-to-end camera-based system for recording human physical distancing in an indoor environment, specifically a classroom. The recording system aims to automatically trace the locations of persons and the directions of their movement in a classroom, including their on-seat and off-seat activities. No personal identities are kept in the log, but the location of each person at each timestamp is recorded, so that the spatial and temporal distribution can be studied further. In this paper, we present an overview of the workflow for human and seat detection and of the log system that stores human physical distancing actions.
Citations: 0
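The abstract describes a log that stores each person's location at each timestamp without storing identities. Below is a minimal Python sketch of such a log record, plus a query that flags pairs of people closer than a distance threshold at the same timestamp. The field names, the anonymous `track_id`, the use of floor-plane coordinates in metres, and the 1.0 m default threshold are all hypothetical. The detection stage that would produce these entries is not shown.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class LogEntry:
    timestamp: float
    track_id: int   # anonymous per-session index; no identity is stored
    x: float        # hypothetical floor-plane coordinates in metres
    y: float
    on_seat: bool


def close_pairs(entries: Sequence[LogEntry],
                min_distance: float = 1.0) -> List[Tuple[float, int, int, float]]:
    """Return (timestamp, id_a, id_b, distance) for pairs closer than min_distance."""
    by_time: Dict[float, List[LogEntry]] = {}
    for e in entries:
        by_time.setdefault(e.timestamp, []).append(e)
    violations = []
    for t in sorted(by_time):
        # Compare every pair of people observed at the same timestamp.
        for a, b in combinations(by_time[t], 2):
            d = math.dist((a.x, a.y), (b.x, b.y))
            if d < min_distance:
                violations.append((t, a.track_id, b.track_id, d))
    return violations
```

Keeping only anonymous positions per timestamp is enough for distance and movement analysis, while leaving identity out of the stored log.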