{"title":"使用独立子空间分析语音和音乐的时间尺度","authors":"R. Muralishankar, L. Kaushik, A. Ramakrishnan","doi":"10.1109/SPCOM.2004.1458408","DOIUrl":null,"url":null,"abstract":"We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single channel mixture signal is converted to a time-frequency representation such as spectrogram. The spectrogram is generated by taking Hartley or wavelet transform on overlapped frames of speech or music. We do dimensionality reduction of the autocorrelated original spectrogram using singular value decomposition. Then, we use independent component analysis to get unmixing matrix using JadeICA algorithm. It is then assumed that the overall spectrogram results from the superposition of a number of unknown statistically independent spectrograms. By using unmixing matrix, independent sources such as temporal amplitude envelopes and frequency weights can be extracted from the spectrogram. Time-scaling of speech and music is carried out by resampling the independent temporal amplitude envelopes. We then multiply the independent frequency weights with time-scaled temporal amplitude envelopes. We Sum these independent spectrograms and take inverse Hartley or wavelet transform of the sum spectrogram. The reconstructed time-domain signal is overlap-added to get the time-scaled signal. The quality of the time-scaled speech and music has been analyzed using Modified Bark spectral distortion (MBSD). From the MBSD score, one can infer that the time-scaled signal is less distorted.","PeriodicalId":424981,"journal":{"name":"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Time-scaling of speech and music using independent subspace analysis\",\"authors\":\"R. Muralishankar, L. Kaushik, A. Ramakrishnan\",\"doi\":\"10.1109/SPCOM.2004.1458408\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single channel mixture signal is converted to a time-frequency representation such as spectrogram. The spectrogram is generated by taking Hartley or wavelet transform on overlapped frames of speech or music. We do dimensionality reduction of the autocorrelated original spectrogram using singular value decomposition. Then, we use independent component analysis to get unmixing matrix using JadeICA algorithm. It is then assumed that the overall spectrogram results from the superposition of a number of unknown statistically independent spectrograms. By using unmixing matrix, independent sources such as temporal amplitude envelopes and frequency weights can be extracted from the spectrogram. Time-scaling of speech and music is carried out by resampling the independent temporal amplitude envelopes. We then multiply the independent frequency weights with time-scaled temporal amplitude envelopes. We Sum these independent spectrograms and take inverse Hartley or wavelet transform of the sum spectrogram. The reconstructed time-domain signal is overlap-added to get the time-scaled signal. The quality of the time-scaled speech and music has been analyzed using Modified Bark spectral distortion (MBSD). 
From the MBSD score, one can infer that the time-scaled signal is less distorted.\",\"PeriodicalId\":424981,\"journal\":{\"name\":\"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SPCOM.2004.1458408\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPCOM.2004.1458408","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Time-scaling of speech and music using independent subspace analysis
We propose a new technique for modifying the time-scale of speech and music using independent subspace analysis (ISA). To carry out ISA, the single-channel mixture signal is first converted to a time-frequency representation such as a spectrogram, generated by applying the Hartley or wavelet transform to overlapping frames of the speech or music signal. We reduce the dimensionality of the autocorrelated spectrogram using singular value decomposition and then apply independent component analysis with the JadeICA algorithm to obtain the unmixing matrix. The overall spectrogram is assumed to result from the superposition of a number of unknown, statistically independent spectrograms. Using the unmixing matrix, independent sources, namely temporal amplitude envelopes and their corresponding frequency weights, are extracted from the spectrogram. Time-scaling is carried out by resampling the independent temporal amplitude envelopes, and the independent frequency weights are then multiplied with the time-scaled envelopes. We sum the resulting independent spectrograms, take the inverse Hartley or wavelet transform of the sum, and overlap-add the reconstructed frames to obtain the time-scaled signal. The quality of the time-scaled speech and music has been evaluated using the modified Bark spectral distortion (MBSD) measure; the MBSD scores indicate that the time-scaled signal exhibits low distortion.
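To make the processing chain concrete, the following is a minimal sketch of an ISA-style time-scaling pipeline under stated assumptions; it is not the paper's implementation. An STFT magnitude spectrogram stands in for the Hartley/wavelet spectrogram, scikit-learn's FastICA (with its built-in whitening) stands in for the SVD plus JadeICA steps, and the function name, parameters, and phase handling are illustrative choices not taken from the paper.

```python
# Sketch only: STFT + FastICA stand in for the paper's Hartley/wavelet
# spectrogram and SVD + JadeICA; phase handling here is a crude assumption.
import numpy as np
from scipy.signal import stft, istft, resample
from sklearn.decomposition import FastICA

def isa_time_scale(x, fs, alpha, n_components=8, nperseg=1024):
    """Time-scale signal x by factor alpha (>1 lengthens) via an ISA-style decomposition."""
    # 1. Time-frequency representation from overlapping frames.
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)            # shapes: (freq, time)

    # 2. Dimensionality reduction + ICA: each component pairs a frequency-weight
    #    vector with a temporal amplitude envelope.
    ica = FastICA(n_components=n_components, random_state=0)
    envelopes = ica.fit_transform(mag.T)           # (time, comp) temporal envelopes
    weights = ica.mixing_                          # (freq, comp) frequency weights

    # 3. Time-scale only the temporal envelopes by resampling them.
    n_frames = mag.shape[1]
    new_len = int(round(alpha * n_frames))
    envelopes_ts = resample(envelopes, new_len, axis=0)

    # 4. Recombine: sum of independent spectrograms (frequency weights times
    #    time-scaled envelopes), clipped to non-negative magnitudes.
    mag_ts = weights @ envelopes_ts.T + ica.mean_[:, None]
    mag_ts = np.maximum(mag_ts, 0.0)

    # 5. Resample the phase to the new length and invert; istft performs the
    #    overlap-add reconstruction of the time-domain signal.
    phase_ts = resample(phase, new_len, axis=1)
    _, y = istft(mag_ts * np.exp(1j * phase_ts), fs=fs, nperseg=nperseg)
    return y
```

For example, `isa_time_scale(x, 16000, 1.5)` would stretch a 16 kHz signal to roughly 1.5 times its duration. The key point mirrored from the paper is that only the temporal amplitude envelopes are resampled, so timing changes while the frequency weights, and hence the spectral content of each component, remain unchanged.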