使用独立子空间分析语音和音乐的时间尺度

R. Muralishankar, L. Kaushik, A. Ramakrishnan
{"title":"使用独立子空间分析语音和音乐的时间尺度","authors":"R. Muralishankar, L. Kaushik, A. Ramakrishnan","doi":"10.1109/SPCOM.2004.1458408","DOIUrl":null,"url":null,"abstract":"We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single channel mixture signal is converted to a time-frequency representation such as spectrogram. The spectrogram is generated by taking Hartley or wavelet transform on overlapped frames of speech or music. We do dimensionality reduction of the autocorrelated original spectrogram using singular value decomposition. Then, we use independent component analysis to get unmixing matrix using JadeICA algorithm. It is then assumed that the overall spectrogram results from the superposition of a number of unknown statistically independent spectrograms. By using unmixing matrix, independent sources such as temporal amplitude envelopes and frequency weights can be extracted from the spectrogram. Time-scaling of speech and music is carried out by resampling the independent temporal amplitude envelopes. We then multiply the independent frequency weights with time-scaled temporal amplitude envelopes. We Sum these independent spectrograms and take inverse Hartley or wavelet transform of the sum spectrogram. The reconstructed time-domain signal is overlap-added to get the time-scaled signal. The quality of the time-scaled speech and music has been analyzed using Modified Bark spectral distortion (MBSD). From the MBSD score, one can infer that the time-scaled signal is less distorted.","PeriodicalId":424981,"journal":{"name":"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Time-scaling of speech and music using independent subspace analysis\",\"authors\":\"R. Muralishankar, L. Kaushik, A. Ramakrishnan\",\"doi\":\"10.1109/SPCOM.2004.1458408\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single channel mixture signal is converted to a time-frequency representation such as spectrogram. The spectrogram is generated by taking Hartley or wavelet transform on overlapped frames of speech or music. We do dimensionality reduction of the autocorrelated original spectrogram using singular value decomposition. Then, we use independent component analysis to get unmixing matrix using JadeICA algorithm. It is then assumed that the overall spectrogram results from the superposition of a number of unknown statistically independent spectrograms. By using unmixing matrix, independent sources such as temporal amplitude envelopes and frequency weights can be extracted from the spectrogram. Time-scaling of speech and music is carried out by resampling the independent temporal amplitude envelopes. We then multiply the independent frequency weights with time-scaled temporal amplitude envelopes. We Sum these independent spectrograms and take inverse Hartley or wavelet transform of the sum spectrogram. The reconstructed time-domain signal is overlap-added to get the time-scaled signal. The quality of the time-scaled speech and music has been analyzed using Modified Bark spectral distortion (MBSD). From the MBSD score, one can infer that the time-scaled signal is less distorted.\",\"PeriodicalId\":424981,\"journal\":{\"name\":\"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SPCOM.2004.1458408\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPCOM.2004.1458408","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

我们提出了一种利用独立子空间分析(ISA)来修改语音和音乐时间尺度的新技术。为了实现ISA,将单通道混合信号转换为诸如频谱图之类的时频表示。频谱图是通过对语音或音乐的重叠帧进行哈特利变换或小波变换生成的。利用奇异值分解对自相关原始谱图进行降维处理。然后利用JadeICA算法进行独立分量分析得到解混矩阵。然后假定整个谱图是由若干未知的统计独立谱图的叠加而成。利用解混矩阵,可以从频谱图中提取出时间振幅包络和频率权重等独立源。语音和音乐的时间尺度是通过对独立的时间振幅包络进行重采样来实现的。然后,我们将独立的频率权重与时间尺度的时间振幅包络相乘。我们对这些独立的谱图求和,并对和谱图进行哈特利逆变换或小波变换。对重构后的时域信号进行叠加,得到时间尺度信号。利用改进的巴克谱失真(MBSD)对时间尺度语音和音乐的质量进行了分析。从MBSD分数可以推断,时间尺度信号失真较小。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Time-scaling of speech and music using independent subspace analysis
We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single channel mixture signal is converted to a time-frequency representation such as spectrogram. The spectrogram is generated by taking Hartley or wavelet transform on overlapped frames of speech or music. We do dimensionality reduction of the autocorrelated original spectrogram using singular value decomposition. Then, we use independent component analysis to get unmixing matrix using JadeICA algorithm. It is then assumed that the overall spectrogram results from the superposition of a number of unknown statistically independent spectrograms. By using unmixing matrix, independent sources such as temporal amplitude envelopes and frequency weights can be extracted from the spectrogram. Time-scaling of speech and music is carried out by resampling the independent temporal amplitude envelopes. We then multiply the independent frequency weights with time-scaled temporal amplitude envelopes. We Sum these independent spectrograms and take inverse Hartley or wavelet transform of the sum spectrogram. The reconstructed time-domain signal is overlap-added to get the time-scaled signal. The quality of the time-scaled speech and music has been analyzed using Modified Bark spectral distortion (MBSD). From the MBSD score, one can infer that the time-scaled signal is less distorted.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信