Point Cloud Audio Processing

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Pub Date: 2021-05-06. DOI: 10.1109/WASPAA52581.2021.9632668

K. Subramani, P. Smaragdis

Citations: 3

Abstract

Most audio processing pipelines apply transformations to fixed-dimensional input representations of audio. For example, with the short-time Fourier transform (STFT), the DFT size fixes the dimension of the input representation. As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prevents reusing learned models on audio with different sampling rates or alternative representations. We note, however, that the intrinsic spectral information in the audio signal is invariant to the choice of input representation or sampling rate. Motivated by this, we introduce a novel way of processing audio signals by treating them as a collection of points in feature space, and we use point cloud machine learning models that give us invariance to the choice of representation parameters, such as the DFT size or the sampling rate. Additionally, we observe that these methods result in smaller models and allow us to significantly subsample the input representation with minimal effect on a trained model's performance.
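The idea of treating a spectrogram as a set of points can be illustrated with a small sketch. The abstract does not say which features each point carries, so the (time in seconds, frequency in hertz, magnitude) triple used here is an assumption, as are the helper names `stft_point_cloud`, `subsample`, and `peak_frequency`. The sketch only shows that two different DFT sizes yield point sets of different sizes that share one coordinate space; it is not the authors' model or pipeline.

```python
import cmath
import math
import random


def stft_point_cloud(signal, sample_rate, dft_size, hop=None):
    """Return a list of (time_s, freq_hz, magnitude) points, one per STFT bin.

    Coordinates are in physical units (seconds, hertz), so the points live in
    the same space whichever dft_size or sample_rate produced them.
    The point features are an illustrative assumption, not taken from the paper.
    """
    hop = hop or dft_size // 2
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / dft_size)
              for n in range(dft_size)]
    points = []
    for start in range(0, len(signal) - dft_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(dft_size)]
        time_s = (start + dft_size / 2) / sample_rate  # frame centre
        for k in range(dft_size // 2 + 1):  # non-negative frequencies only
            # Naive DFT for clarity; a real pipeline would use an FFT.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / dft_size)
                        for n, x in enumerate(frame))
            points.append((time_s, k * sample_rate / dft_size, abs(coeff)))
    return points


def subsample(points, keep, seed=0):
    """Keep a random subset of points; a set-based model accepts any count."""
    return random.Random(seed).sample(points, keep)


def peak_frequency(points):
    """Frequency coordinate of the point with the largest magnitude."""
    return max(points, key=lambda p: p[2])[1]


sample_rate = 8000
tone_hz = 1000.0
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate)
          for n in range(1024)]

cloud_small = stft_point_cloud(signal, sample_rate, dft_size=64)
cloud_large = stft_point_cloud(signal, sample_rate, dft_size=128)

# The two clouds differ in size but describe the same spectral content.
print(len(cloud_small), peak_frequency(cloud_small))
print(len(cloud_large), peak_frequency(cloud_large))
print(len(subsample(cloud_small, 100)))
```

A fixed-size model would need a separate input layer for each DFT size, whereas a model that consumes an unordered set of such points could in principle take either cloud, or a random subset of one, without any change to its input dimension.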

Source journal

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Self-citation rate

0.00%

Publication count