{"title":"Estimation of Angular Power Spectrum Using Multikernel Adaptive Filtering","authors":"Eiji Ninomiya, M. Yukawa, Renato L. G. Cavalcante, Lorenzo Miretti","doi":"10.23919/APSIPAASC55919.2022.9980067","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980067","abstract":"This paper addresses the problem of estimating the angular power spectrum (APS) of massive multiple-input multiple-output wireless channels. Estimating the APS is useful, for instance, for simplifying the downlink channel estimation problem in frequency division duplex systems. We propose an efficient online algorithm that estimates the APS from the channel spatial covariance matrix. The proposed algorithm approximates the APS as a sum of Gaussian functions and leverages the framework of multikernel adaptive filtering.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"131065022"}
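A minimal sketch of the idea named in the abstract above: approximating an unknown nonnegative function (standing in for the APS) as a weighted sum of Gaussian kernels whose weights are adapted online. The kernel grid, widths, toy target, and the normalized-LMS update rule are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

# Hypothetical sketch: fit a sum of Gaussian kernels to noisy point samples of
# a toy "APS" using an online normalized-LMS weight update, in the spirit of
# multikernel adaptive filtering. All constants below are illustrative.

rng = np.random.default_rng(0)
centers = np.linspace(-1.0, 1.0, 20)          # angular grid of kernel centers
width = 0.15                                  # shared Gaussian kernel width

def dictionary(theta):
    """Evaluate every Gaussian kernel at angle theta."""
    return np.exp(-0.5 * ((theta - centers) / width) ** 2)

def target(theta):
    """Toy 'true' APS: two angular power lobes (assumed, for illustration)."""
    return (np.exp(-0.5 * ((theta + 0.4) / 0.2) ** 2)
            + 0.5 * np.exp(-0.5 * ((theta - 0.5) / 0.2) ** 2))

w = np.zeros_like(centers)                    # kernel weights, adapted online
mu = 0.5                                      # step size
for _ in range(4000):
    theta = rng.uniform(-1.0, 1.0)            # random measurement direction
    phi = dictionary(theta)
    err = target(theta) - w @ phi             # instantaneous estimation error
    w += mu * err * phi / (phi @ phi + 1e-9)  # normalized-LMS weight update

grid = np.linspace(-1.0, 1.0, 200)
est = np.array([w @ dictionary(t) for t in grid])
mse = float(np.mean((est - np.array([target(t) for t in grid])) ** 2))
```

After a few thousand online updates the weighted Gaussian sum tracks the two-lobe target closely; the same machinery would take covariance-derived samples instead of the toy target.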
{"title":"C-CycleTransGAN: A Non-parallel Controllable Cross-gender Voice Conversion Model with CycleGAN and Transformer","authors":"Changzeng Fu, Chaoran Liu, C. Ishi, H. Ishiguro","doi":"10.23919/APSIPAASC55919.2022.9979821","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979821","abstract":"In this study, we propose a conversion-intensity controllable model for cross-gender voice conversion (VC); a demo page is available at https://cz26.github.io/DemoPage-c-CycleTransGAN-VoiceConversion/. In particular, we combine CycleGAN with a transformer module and build a condition embedding network as an intensity controller. The model is first pre-trained with self-supervised learning on the single-gender voice reconstruction task, with the condition set to male-to-male or female-to-female. After pre-training, we fine-tune the model on the cross-gender voice conversion task, with the condition set to male-to-female or female-to-male. At test time, the condition serves as a controllable parameter (scale) that adjusts the conversion intensity. The proposed method was evaluated on the Voice Conversion Challenge dataset and compared to two baselines (CycleGAN, CycleTransGAN) with objective and subjective evaluations. The results show that the proposed model adds cross-gender conversion-intensity controllability without hurting voice conversion performance.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"131218551"}
{"title":"Highly Robust Action Retrieval using View-invariant Pose Feature and Simple yet Effective Query Expansion Method","authors":"Noboru Yoshida, Jianquan Liu","doi":"10.23919/APSIPAASC55919.2022.9979865","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979865","abstract":"Action retrieval and detection using view-invariant pose-based features achieve high precision. However, they suffer from low recall because of large individual differences in how actions are performed. Query expansion (QE) methods are well known to improve recall in object detection and retrieval tasks, but little research has adapted them to action retrieval. We focus on query expansion and propose a new query generation method in which two queries with missing keypoints complement each other, enabling high-recall action retrieval. Experimental results show that our method outperforms state-of-the-art methods on a simulated dataset with annotated multi-view 2D poses and on a real-world video dataset.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"131123326"}
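The complementary-query idea in the abstract above can be sketched very simply: two pose-based query vectors, each with missing keypoints, fill in each other's gaps. The vector layout, NaN encoding of missing points, and averaging where both are present are illustrative assumptions rather than the paper's exact rule.

```python
import numpy as np

# Hypothetical sketch of complementary query generation: missing keypoints
# (NaN) in one query are filled from the other; entries present in both are
# averaged (an illustrative choice).

def merge_queries(q1, q2):
    q1, q2 = np.asarray(q1, float), np.asarray(q2, float)
    merged = np.where(np.isnan(q1), q2, q1)        # fill q1's gaps from q2
    both = ~np.isnan(q1) & ~np.isnan(q2)
    merged[both] = (q1[both] + q2[both]) / 2.0     # average where both exist
    return merged

qa = [0.2, np.nan, 0.8, 0.4]   # toy pose features with a missing keypoint
qb = [0.4, 0.6, np.nan, 0.4]
q = merge_queries(qa, qb)      # -> [0.3, 0.6, 0.8, 0.4]
```

The merged query has no missing entries, so it can be matched against the gallery where either original query alone would fail.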
{"title":"Neural Network Based Watermarking Trained with Quantized Activation Function","authors":"Shingo Yamauchi, Masaki Kawamura","doi":"10.23919/APSIPAASC55919.2022.9980204","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980204","abstract":"We propose a watermarking method that incorporates a quantized activation function to provide robustness against quantization. Zhu et al. showed that introducing a noise layer between the encoder and decoder can increase robustness against attacks. Although there are various attacks on stego-images, these images are often JPEG-compressed. As JPEG compression includes a quantization step, the watermark decoder must be able to estimate watermarks from compressed images. Hence, we propose a quantization layer that introduces a quantized activation function built from the hyperbolic tangent function. The proposed neural network is based on that of Hamamoto and Kawamura. By simulating the quantization of JPEG compression, the quantization layer is expected to improve robustness against JPEG compression. Robustness was evaluated by the bit error rate (BER), and stego-image quality was evaluated by the peak signal-to-noise ratio (PSNR). The proposed network achieved a high image quality of more than 35 dB, and it could extract watermarks with a BER of less than 0.1 for JPEG Q-values of 30 or higher. It was thus more robust against JPEG compression than Hamamoto and Kawamura's model.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"133546980"}
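A quantized activation built from hyperbolic tangents, as named in the abstract above, can be sketched as a sum of shifted tanh terms that approximates a staircase: differentiable (so trainable by backpropagation) yet close to hard quantization. The level count, step positions, and sharpness below are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of a soft quantization activation: each tanh term
# contributes one step of a staircase on [0, 1]; larger `sharpness` makes the
# steps harder. Constants are illustrative, not the paper's exact settings.

def soft_quantize(x, levels=8, sharpness=20.0):
    """Differentiable staircase on [0, 1] with `levels` output levels."""
    x = np.asarray(x, float)
    steps = (np.arange(1, levels) - 0.5) / (levels - 1)   # step positions
    y = np.zeros_like(x)
    for s in steps:                            # each tanh adds one soft step
        y += 0.5 * (1.0 + np.tanh(sharpness * (x - s)))
    return y / (levels - 1)                    # rescale back to [0, 1]

x = np.linspace(0.0, 1.0, 101)
y = soft_quantize(x)
```

Because each tanh is monotonically increasing, the whole function is monotone, and as `sharpness` grows it approaches the hard rounding performed by JPEG quantization.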
{"title":"Neural Vocoder Feature Estimation for Dry Singing Voice Separation","authors":"Jae-Yeol Im, Soonbeom Choi, Sangeon Yong, Juhan Nam","doi":"10.23919/APSIPAASC55919.2022.9980093","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980093","abstract":"Singing voice separation (SVS) separates singing voice audio from its mixture with instrumental audio. Previous SVS studies have mainly employed spectrogram masking, which requires high dimensionality in predicting the binary masks. In addition, they focused on extracting a vocal stem that retains the wet sound with reverberation, which may hinder the reusability of the isolated singing voice. This paper addresses these issues by predicting mel-spectrograms of dry singing voices from the mixed audio as neural vocoder features and synthesizing the singing voice waveforms with the neural vocoder. We experimented with two separation methods: predicting binary masks in the mel-spectrogram domain and directly predicting the mel-spectrogram. Furthermore, we add a singing voice detector to identify singing voice segments over time more explicitly. We measured model performance in terms of audio, dereverberation, separation, and overall quality. The results show that the proposed model outperforms state-of-the-art singing voice separation models in both objective and subjective evaluation, except for audio quality.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"132186695"}
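The mel-domain binary-masking variant mentioned in the abstract above reduces to a simple operation: keep a mixture bin where the vocal estimate dominates, zero it otherwise. The shapes and the random magnitudes standing in for model outputs are illustrative assumptions; a real system would feed the masked mel-spectrogram to a neural vocoder.

```python
import numpy as np

# Hypothetical sketch of binary masking in the mel-spectrogram domain. The
# "estimates" here are random stand-ins for network predictions.

rng = np.random.default_rng(1)
n_mels, n_frames = 80, 100
vocal_est = rng.random((n_mels, n_frames))      # toy vocal magnitude estimate
accomp_est = rng.random((n_mels, n_frames))     # toy accompaniment estimate
mixture = vocal_est + accomp_est                # toy mixture mel-spectrogram

mask = (vocal_est > accomp_est).astype(float)   # 1 where the vocal dominates
dry_vocal_mel = mask * mixture                  # masked mel, ready for a vocoder
```

Note the dimensionality advantage claimed in the abstract: the mask lives on an 80-bin mel grid rather than a full STFT frequency grid.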
{"title":"Syllable Analysis Data Augmentation for Khmer Ancient Palm leaf Recognition","authors":"Nimol Thuon, Jun Du, Jianshu Zhang","doi":"10.23919/APSIPAASC55919.2022.9980217","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980217","abstract":"Recognition of Khmer palm-leaf manuscripts, with their unique scripts and physical conditions, is receiving increasing attention from researchers. In the state of the art, data augmentation is commonly used for training; however, grammatical mistakes and limited data availability constrain the achievable accuracy. The two significant challenges are (1) grammar complexity and (2) word similarity. This paper therefore presents the Syllable Analysis Data Augmentation (SADA) technique, which aims to boost the accuracy of text recognition for one of Southeast Asia's historical manuscripts from Cambodia. SADA comprises two fundamental modules: (1) formulating a collection of syllables/words to structure glyph patterns, and (2) generating patterns from existing data through augmentation techniques, using flexible geometric image transformations to increase the number of similar word/text images. Image collections are first established, with datasets interpreted according to the reordered grammatical structures to construct multiple glyph images. We then run a text/word recognition experiment before tuning an attention-based encoder-decoder to improve transcription of low- and high-resolution images. Finally, the experiments cover datasets from various sources, including public datasets from the ICFHR 2018 contest and our new augmentation datasets, to demonstrate and evaluate the accuracy of the approach.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"115420373"}
{"title":"DBR: A Depth-Branch-Resorting Algorithm for Locality Exploration in Graph Processing","authors":"Lin Jiang, Ru Feng, Junjie Wang, Junyong Deng","doi":"10.23919/APSIPAASC55919.2022.9980127","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980127","abstract":"Unstructured and irregular graph data causes strong randomness and poor locality of data access in graph processing. To alleviate this problem, this paper proposes a Depth-Branch-Resorting (DBR) algorithm for locality exploration in graph processing, together with the corresponding graph data compression format DBR_DCSR. The DBR algorithm and DBR_DCSR format are tested and verified on the GraphBIG framework. The results show that the DBR algorithm with the DBR_DCSR format reduces GraphBIG execution time by 55.6% compared with the original framework, and by 71.7% and 11.46% compared with Ligra and Gemini, respectively. Compared with the original GraphBIG framework, the optimized framework in DBR_DCSR format reduces data movement by up to 87.9% and computation by up to 52.3%. Compared with Ligra and Gemini, data movement is reduced by 33.5% and 49.7%, and computation by 54.3% and 43.9%, respectively.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"116446388"}
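The general family of techniques the abstract above belongs to, locality-oriented vertex reordering, can be illustrated with a depth-first renumbering: vertices visited together get adjacent ids, so their data ends up near each other in memory. This is a generic sketch of the idea, not the paper's DBR algorithm or its DBR_DCSR format.

```python
from collections import defaultdict

# Hypothetical sketch: renumber vertices in DFS discovery order to improve
# spatial locality of neighbor accesses. Illustrative of reordering in
# general, not the DBR algorithm itself.

def dfs_reorder(edges, n):
    """Return an old->new vertex id mapping in DFS discovery order."""
    adj = defaultdict(list)
    for u, v in edges:            # build an undirected adjacency list
        adj[u].append(v)
        adj[v].append(u)
    order, seen = [], set()
    for start in range(n):        # cover disconnected components
        if start in seen:
            continue
        stack = [start]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            order.append(u)       # discovery order = new id order
            stack.extend(w for w in reversed(adj[u]) if w not in seen)
    return {old: new for new, old in enumerate(order)}

edges = [(0, 3), (3, 1), (1, 4), (2, 4)]
remap = dfs_reorder(edges, 5)     # -> {0: 0, 3: 1, 1: 2, 4: 3, 2: 4}
```

After remapping, edges would be rewritten with the new ids and the graph stored in a compressed format (such as CSR), so traversals touch mostly contiguous memory.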
{"title":"Nonlinear Residual Echo Suppression Based on Gated Dual Signal Transformation LSTM Network","authors":"Kai Xie, Ziye Yang, Jie Chen","doi":"10.23919/APSIPAASC55919.2022.9980060","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980060","abstract":"Although adaptive filters play a vital role in acoustic echo cancellation systems, multiple factors prevent them from completely eliminating the echo signal. Consequently, an additional suppression module is required and crucial for enhancing echo cancellation performance. In this work, we propose a gated dual signal transformation LSTM network (Gated DTLN) that improves upon the recently developed Dual Signal Transformation LSTM Network for AEC (DTLN-aec). Gated convolution units are inserted to enhance filtering features in the time-domain part of the model, while the echo reference signal is removed from the input of this part to reduce the complexity of the mask generator. Experimental results on datasets with different signal-to-echo ratios (SER) demonstrate the superiority of the proposed method.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"116852339"}
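A gated convolution unit of the kind the abstract above inserts can be sketched as two parallel branches, one producing features and one producing a sigmoid gate that modulates them elementwise, y = A(x) * sigmoid(B(x)). The 1-D kernels and toy input below are illustrative stand-ins for learned convolution layers.

```python
import numpy as np

# Hypothetical sketch of a gated convolution unit: the feature branch is
# scaled elementwise by a gate in (0, 1). Kernels are fixed toy values, not
# trained weights.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv1d(x, k_feat, k_gate):
    feat = np.convolve(x, k_feat, mode="same")            # feature branch
    gate = sigmoid(np.convolve(x, k_gate, mode="same"))   # gate branch in (0, 1)
    return feat * gate                                    # gated output

x = np.sin(np.linspace(0, 4 * np.pi, 64))                 # toy time-domain frame
y = gated_conv1d(x,
                 k_feat=np.array([0.25, 0.5, 0.25]),
                 k_gate=np.array([0.1, 0.2, 0.1]))
```

Because the gate lies strictly in (0, 1), the unit can attenuate regions of the signal (such as residual echo) without hard switching.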
{"title":"Adapted Spectrogram Transformer for Unsupervised Cross-Domain Acoustic Anomaly Detection","authors":"Gilles Van De Vyver, Zhaoyi Liu, Koustabh Dolui, D. Hughes, Sam Michiels","doi":"10.23919/APSIPAASC55919.2022.9980266","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980266","abstract":"Anomaly detection models can help to automatically and proactively detect faults in industrial machines. Microphones are appealing because they are generally inexpensive and, unlike visual inspection, recorded sound can convey information about the internals of a machine. However, conventional methods based on an AutoEncoder (AE) structure learned from scratch generally struggle to reconstruct samples robustly with limited available data. This paper addresses this problem with a method for unsupervised Acoustic Anomaly Detection (AAD) that adapts intermediate embeddings from a pretrained, self-attention-based spectrogram transformer. Transfer learning from a large, successful model offers a solution to learning with limited data by reusing external knowledge; for AAD, this can help to recognize subtle anomalies. This work proposes two method variants, both of which exploit Intermediate Feature Embeddings (IFEs) from the Audio Spectrogram Transformer (AST). The first fits a Gaussian Mixture Model (GMM) on the IFEs produced by intermediate layers of the AST; we call it ADIFAST: Anomaly Detection from Intermediate Features extracted from AST. The second uses the IFEs in a different, more effective way by adapting the AST to an AE structure; we call it TELD: Transformer Encoder Linear Decoder network. Evaluating TELD on task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge improves the average Area Under the Curve (AUC) score by 3.9% for binary labeling of normal and anomalous samples in the target domain.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"123919003"}
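The density-based scoring behind the ADIFAST variant described above can be sketched with a single Gaussian component (the paper fits a full Gaussian Mixture Model on intermediate AST embeddings): fit mean and covariance on embeddings of normal machine sounds, then score a new embedding by its negative log-likelihood. The random vectors below are stand-ins for AST features.

```python
import numpy as np

# Hypothetical sketch: one-component Gaussian density model over "normal"
# embeddings; high negative log-likelihood flags an anomaly. Embeddings are
# random stand-ins for intermediate AST features.

rng = np.random.default_rng(2)
normal_emb = rng.normal(0.0, 1.0, size=(500, 8))     # toy normal embeddings

mu = normal_emb.mean(axis=0)
cov = np.cov(normal_emb, rowvar=False) + 1e-6 * np.eye(8)  # regularized
cov_inv = np.linalg.inv(cov)
_, logdet = np.linalg.slogdet(cov)

def anomaly_score(e):
    """Negative Gaussian log-likelihood (up to a constant) of one embedding."""
    d = e - mu
    return 0.5 * (d @ cov_inv @ d + logdet)

normal_score = anomaly_score(rng.normal(0.0, 1.0, 8))   # in-distribution
anom_score = anomaly_score(rng.normal(4.0, 1.0, 8))     # shifted: anomalous
```

A GMM generalizes this to several components, which matters when normal operation has multiple acoustic modes.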
{"title":"Intelligibility prediction of enhanced speech using recognition accuracy of end-to-end ASR systems","authors":"Kenichi Arai, A. Ogawa, S. Araki, K. Kinoshita, T. Nakatani, Naoyuki Kamo, T. Irino","doi":"10.23919/APSIPAASC55919.2022.9980257","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980257","abstract":"We propose speech intelligibility (SI) prediction methods that use the recognition accuracy of an end-to-end (E2E) automatic speech recognition (ASR) system, whose performance has become comparable to the human auditory system thanks to recent significant progress. Such predictors will fuel the development of speech enhancement methods for human listeners. In this paper, we evaluate the prediction performance of the proposed methods on the intelligibility of enhanced noisy speech signals. Our experiments show that when ASR systems are trained with various noisy speech data, the proposed methods, which do not require clean reference signals, predict SI more accurately than the existing “intrusive” methods, short-time objective intelligibility (STOI) and extended STOI (eSTOI), and than our previously proposed methods based on deep neural network-hidden Markov model hybrid ASR systems. Our experiments also show that our method that additionally uses clean speech to determine the speech region of evaluation signals further improves prediction accuracy over the existing methods.","journal":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","publicationDate":"2022-11-07","platform":"Semanticscholar","paperid":"124487764"}
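The core measurement behind ASR-based intelligibility prediction, as in the abstract above, is recognition accuracy of the ASR transcript against a reference, typically computed from word-level edit distance. The sketch below shows that computation; the mapping from accuracy to a subjective intelligibility score would be calibrated separately and is not shown.

```python
# Hypothetical sketch: word accuracy from Levenshtein (edit) distance, the
# usual basis for ASR recognition-accuracy scores.

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance via dynamic programming."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1]

def word_accuracy(ref_text, hyp_text):
    ref, hyp = ref_text.split(), hyp_text.split()
    return 1.0 - edit_distance(ref, hyp) / max(len(ref), 1)

acc = word_accuracy("the quick brown fox", "the quick brown box")  # -> 0.75
```

Averaging such accuracies over many enhanced utterances gives a reference-free intelligibility predictor once the ASR system is fixed.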