SiamNet: Siamese CNN Based Similarity Model for Adversarially Generated Environmental Sounds

Aswathy Madhu, S. Kumaraswamy
2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 25 October 2021. DOI: 10.1109/mlsp52302.2021.9596435

Abstract: Recently, Generative Adversarial Networks (GANs) have been used extensively in machine learning applications for the synthetic generation of image and audio samples. However, efficient methods for evaluating the quality of GAN-generated samples are not yet available. Moreover, most existing evaluation metrics were developed exclusively for images and may not work well with other types of data, such as audio. Evaluation metrics developed specifically for audio are rare, which makes generating perceptually acceptable audio with GANs difficult. In this work, we address this problem. We propose a Siamese CNN that simultaneously learns a feature representation and a similarity measure to evaluate the quality of synthetic audio generated by a GAN. The proposed method estimates the perceptual proximity between the original and generated samples. Our similarity model is trained on two standard datasets of environmental sounds. The pre-trained model is evaluated on environmental sounds generated using a GAN. The predicted mean similarity scores of SiamNet are highly correlated with human ratings at the class level. This indicates that our model successfully captures the perceptual similarity between the generated and original audio samples.
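The core Siamese idea in the abstract (a single network with shared weights embeds both the original and the generated sample, and a similarity function then compares the two embeddings) can be sketched in plain Python. The linear embedding, toy weights, and cosine metric below are illustrative assumptions for the sketch, not the CNN architecture or the learned similarity measure from the paper.

```python
import math

# Sketch of the Siamese constraint: the SAME embedding function with the
# SAME weights is applied to both inputs, and a similarity score is
# computed on the resulting embeddings. All values here are toy data.

def embed(x, weights):
    """Shared 'branch': a fixed linear map from features to an embedding."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def cosine_similarity(a, b):
    """Compare two embeddings; 1.0 means identical direction."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

# Toy feature vectors standing in for audio features of an original clip,
# a perceptually close generated clip, and an unrelated clip.
original = [1.0, 0.5, -0.2]
generated = [0.9, 0.6, -0.1]
unrelated = [-1.0, 0.3, 0.8]

# One weight matrix shared by both branches (the Siamese property).
W = [[0.2, -0.1, 0.4],
     [0.5, 0.3, -0.2]]

e_orig = embed(original, W)
score_close = cosine_similarity(e_orig, embed(generated, W))
score_far = cosine_similarity(e_orig, embed(unrelated, W))
print(score_close, score_far)  # the close pair scores higher
```

In the paper's setting, `embed` would be the trained CNN and the similarity measure would itself be learned from the two environmental-sound datasets, but the weight-sharing structure is the same.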