{"title":"使用独立子空间分析语音和音乐的时间尺度","authors":"R. Muralishankar, L. Kaushik, A. Ramakrishnan","doi":"10.1109/SPCOM.2004.1458408","DOIUrl":null,"url":null,"abstract":"We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single channel mixture signal is converted to a time-frequency representation such as spectrogram. The spectrogram is generated by taking Hartley or wavelet transform on overlapped frames of speech or music. We do dimensionality reduction of the autocorrelated original spectrogram using singular value decomposition. Then, we use independent component analysis to get unmixing matrix using JadeICA algorithm. It is then assumed that the overall spectrogram results from the superposition of a number of unknown statistically independent spectrograms. By using unmixing matrix, independent sources such as temporal amplitude envelopes and frequency weights can be extracted from the spectrogram. Time-scaling of speech and music is carried out by resampling the independent temporal amplitude envelopes. We then multiply the independent frequency weights with time-scaled temporal amplitude envelopes. We Sum these independent spectrograms and take inverse Hartley or wavelet transform of the sum spectrogram. The reconstructed time-domain signal is overlap-added to get the time-scaled signal. The quality of the time-scaled speech and music has been analyzed using Modified Bark spectral distortion (MBSD). From the MBSD score, one can infer that the time-scaled signal is less distorted.","PeriodicalId":424981,"journal":{"name":"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Time-scaling of speech and music using independent subspace analysis\",\"authors\":\"R. Muralishankar, L. Kaushik, A. Ramakrishnan\",\"doi\":\"10.1109/SPCOM.2004.1458408\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single channel mixture signal is converted to a time-frequency representation such as spectrogram. The spectrogram is generated by taking Hartley or wavelet transform on overlapped frames of speech or music. We do dimensionality reduction of the autocorrelated original spectrogram using singular value decomposition. Then, we use independent component analysis to get unmixing matrix using JadeICA algorithm. It is then assumed that the overall spectrogram results from the superposition of a number of unknown statistically independent spectrograms. By using unmixing matrix, independent sources such as temporal amplitude envelopes and frequency weights can be extracted from the spectrogram. Time-scaling of speech and music is carried out by resampling the independent temporal amplitude envelopes. We then multiply the independent frequency weights with time-scaled temporal amplitude envelopes. We Sum these independent spectrograms and take inverse Hartley or wavelet transform of the sum spectrogram. The reconstructed time-domain signal is overlap-added to get the time-scaled signal. The quality of the time-scaled speech and music has been analyzed using Modified Bark spectral distortion (MBSD). 
From the MBSD score, one can infer that the time-scaled signal is less distorted.\",\"PeriodicalId\":424981,\"journal\":{\"name\":\"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SPCOM.2004.1458408\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPCOM.2004.1458408","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Time-scaling of speech and music using independent subspace analysis
We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single-channel mixture signal is first converted to a time-frequency representation such as a spectrogram, generated by applying the Hartley or wavelet transform to overlapping frames of the speech or music signal. We reduce the dimensionality of the autocorrelated spectrogram using singular value decomposition and then apply independent component analysis with the JadeICA algorithm to obtain the unmixing matrix. The overall spectrogram is assumed to result from the superposition of a number of unknown, statistically independent spectrograms. Using the unmixing matrix, independent sources, namely temporal amplitude envelopes and their corresponding frequency weights, are extracted from the spectrogram. Time-scaling is carried out by resampling the independent temporal amplitude envelopes, and the independent frequency weights are then multiplied with the time-scaled envelopes. We sum the resulting independent spectrograms, take the inverse Hartley or wavelet transform of the sum, and overlap-add the reconstructed frames to obtain the time-scaled signal. The quality of the time-scaled speech and music has been evaluated using the modified Bark spectral distortion (MBSD) measure; the MBSD scores indicate that the time-scaled signal exhibits low distortion.
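To make the processing chain concrete, the following is a minimal sketch of an ISA-style time-scaling pipeline under stated assumptions; it is not the paper's implementation. An STFT magnitude spectrogram stands in for the Hartley/wavelet spectrogram, scikit-learn's FastICA (with its built-in whitening) stands in for the SVD plus JadeICA steps, and the function name, parameters, and phase handling are illustrative choices not taken from the paper.

```python
# Sketch only: STFT + FastICA stand in for the paper's Hartley/wavelet
# spectrogram and SVD + JadeICA; phase handling here is a crude assumption.
import numpy as np
from scipy.signal import stft, istft, resample
from sklearn.decomposition import FastICA

def isa_time_scale(x, fs, alpha, n_components=8, nperseg=1024):
    """Time-scale signal x by factor alpha (>1 lengthens) via an ISA-style decomposition."""
    # 1. Time-frequency representation from overlapping frames.
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)            # shapes: (freq, time)

    # 2. Dimensionality reduction + ICA: each component pairs a frequency-weight
    #    vector with a temporal amplitude envelope.
    ica = FastICA(n_components=n_components, random_state=0)
    envelopes = ica.fit_transform(mag.T)           # (time, comp) temporal envelopes
    weights = ica.mixing_                          # (freq, comp) frequency weights

    # 3. Time-scale only the temporal envelopes by resampling them.
    n_frames = mag.shape[1]
    new_len = int(round(alpha * n_frames))
    envelopes_ts = resample(envelopes, new_len, axis=0)

    # 4. Recombine: sum of independent spectrograms (frequency weights times
    #    time-scaled envelopes), clipped to non-negative magnitudes.
    mag_ts = weights @ envelopes_ts.T + ica.mean_[:, None]
    mag_ts = np.maximum(mag_ts, 0.0)

    # 5. Resample the phase to the new length and invert; istft performs the
    #    overlap-add reconstruction of the time-domain signal.
    phase_ts = resample(phase, new_len, axis=1)
    _, y = istft(mag_ts * np.exp(1j * phase_ts), fs=fs, nperseg=nperseg)
    return y
```

For example, `isa_time_scale(x, 16000, 1.5)` would stretch a 16 kHz signal to roughly 1.5 times its duration. The key point mirrored from the paper is that only the temporal amplitude envelopes are resampled, so timing changes while the frequency weights, and hence the spectral content of each component, remain unchanged.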