GAN-Based Inter-Channel Amplitude Ratio Decoding in Multi-Channel Speech Coding
Jinru Zhu, C. Bao
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), published 2021-01-24
DOI: 10.1109/ISCSLP49672.2021.9362089
Abstract
This paper proposes a multi-channel speech coding method that combines down-mixing with inter-channel amplitude ratio (ICAR) decoding by a generative adversarial network (GAN). First, the inter-channel time difference (ICTD), a spatial parameter, is extracted. In the short-time Fourier transform (STFT) domain, the amplitude of the down-mixed mono signal is obtained by averaging the amplitudes of the multi-channel speech signals, and its phase is taken from the reference channel; together these give the STFT of the down-mixed mono signal, and the inverse STFT then yields the time-domain mono signal. The ICAR, the amplitude ratio between the multi-channel speech signals and the down-mixed signal, is also extracted. The down-mixed mono signal is coded with the Speex codec, and the ICTD is quantized by a uniform scalar quantizer. The ICAR does not need to be encoded: at the decoder, a trained GAN estimates it from the decoded mono signal. Finally, the multi-channel speech signals are recovered from the decoded down-mixed mono signal, the decoded ICTD, and the decoded ICAR. Experimental results show that the proposed method can recover multi-channel speech signals with spatial information.
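The encoder-side steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the per-channel STFTs are already computed and stacked in a complex NumPy array of shape (channels, frames, bins), and that the ICAR is taken bin by bin against the averaged amplitude. The function names `downmix_and_icar` and `quantize_uniform`, the `eps` guard, and the quantizer range and bit depth are all hypothetical choices; the abstract does not specify them.

```python
import numpy as np


def downmix_and_icar(stfts, ref=0, eps=1e-12):
    """Down-mix multi-channel STFTs to mono and compute amplitude ratios.

    stfts: complex array of shape (channels, frames, bins) (assumed layout).
    ref:   index of the reference channel whose phase is reused.
    eps:   small constant to avoid division by zero (an assumption).
    """
    mags = np.abs(stfts)                       # per-channel amplitudes
    mono_mag = mags.mean(axis=0)               # add and average the amplitudes
    ref_phase = np.angle(stfts[ref])           # phase of the reference channel
    mono_stft = mono_mag * np.exp(1j * ref_phase)
    icar = mags / (mono_mag + eps)             # channel-to-mono amplitude ratio
    return mono_stft, icar


def quantize_uniform(x, lo, hi, bits):
    """Uniform scalar quantizer over [lo, hi] with 2**bits levels.

    Returns the integer indices and the reconstructed values.
    """
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    idx = np.clip(np.round((np.asarray(x) - lo) / step), 0, levels - 1).astype(int)
    return idx, lo + idx * step


if __name__ == "__main__":
    # Two channels, one frame, two frequency bins (toy values).
    stfts = np.array([[[3 + 4j, 1j]],
                      [[5 + 0j, 3 + 0j]]])
    mono_stft, icar = downmix_and_icar(stfts)
    print(np.abs(mono_stft))   # averaged amplitudes per bin
    print(icar[:, 0, :])       # amplitude ratio of each channel to the mono signal

    # Quantize a hypothetical ICTD value (units and range are assumptions).
    idx, ictd_hat = quantize_uniform([0.3], lo=-1.0, hi=1.0, bits=3)
    print(idx, ictd_hat)
```

In this sketch the decoder would multiply the decoded mono amplitude by an estimated ICAR to recover each channel's amplitude; in the proposed method that estimate comes from the trained GAN rather than from transmitted side information.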