SiamNet: Siamese CNN Based Similarity Model for Adversarially Generated Environmental Sounds

Aswathy Madhu, S. Kumaraswamy
{"title":"SiamNet:基于Siamese CNN的相似度模型,用于对抗生成的环境声音","authors":"Aswathy Madhu, S. Kumaraswamy","doi":"10.1109/mlsp52302.2021.9596435","DOIUrl":null,"url":null,"abstract":"Recently, Generative Adversarial Networks (GANs) are being used extensively in machine learning applications for synthetic generation of image and audio samples. However efficient methods for the evaluation of the quality of GAN generated samples are not available yet. Moreover, most of the existing evaluation metrics are developed exclusively for images which may not work well with other types of data such as audio. Evaluation metrics developed specifically for audio are rare and hence the generation of perceptually acceptable audio using GAN is difficult. In this work, we address this problem. We propose Siamese CNN which simultaneously learns feature representation and similarity measure to evaluate the quality of synthetic audio generated by GAN. The proposed method estimates the perceptual proximity between the original and generated samples. Our similarity model is trained on two standard datasets of environmental sounds. The pre-trained model is evaluated on the environmental sounds generated using GAN. The predicted mean similarity score of the SiamNet are highly correlated with human ratings at the class level. This indicates that our model successfully captures the perceptual similarity between the generated and original audio samples.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SiamNet: Siamese CNN Based Similarity Model for Adversarially Generated Environmental Sounds\",\"authors\":\"Aswathy Madhu, S. Kumaraswamy\",\"doi\":\"10.1109/mlsp52302.2021.9596435\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, Generative Adversarial Networks (GANs) are being used extensively in machine learning applications for synthetic generation of image and audio samples. However efficient methods for the evaluation of the quality of GAN generated samples are not available yet. Moreover, most of the existing evaluation metrics are developed exclusively for images which may not work well with other types of data such as audio. Evaluation metrics developed specifically for audio are rare and hence the generation of perceptually acceptable audio using GAN is difficult. In this work, we address this problem. We propose Siamese CNN which simultaneously learns feature representation and similarity measure to evaluate the quality of synthetic audio generated by GAN. The proposed method estimates the perceptual proximity between the original and generated samples. Our similarity model is trained on two standard datasets of environmental sounds. The pre-trained model is evaluated on the environmental sounds generated using GAN. The predicted mean similarity score of the SiamNet are highly correlated with human ratings at the class level. 
This indicates that our model successfully captures the perceptual similarity between the generated and original audio samples.\",\"PeriodicalId\":156116,\"journal\":{\"name\":\"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)\",\"volume\":\"52 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/mlsp52302.2021.9596435\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/mlsp52302.2021.9596435","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recently, Generative Adversarial Networks (GANs) have been used extensively in machine learning applications for the synthetic generation of image and audio samples. However, efficient methods for evaluating the quality of GAN-generated samples are not yet available. Moreover, most existing evaluation metrics are developed exclusively for images and may not work well with other types of data such as audio. Evaluation metrics developed specifically for audio are rare, and hence generating perceptually acceptable audio with a GAN is difficult. In this work, we address this problem. We propose a Siamese CNN that simultaneously learns a feature representation and a similarity measure to evaluate the quality of synthetic audio generated by a GAN. The proposed method estimates the perceptual proximity between the original and generated samples. Our similarity model is trained on two standard datasets of environmental sounds. The pre-trained model is evaluated on environmental sounds generated using a GAN. The mean similarity scores predicted by SiamNet are highly correlated with human ratings at the class level. This indicates that our model successfully captures the perceptual similarity between the generated and original audio samples.
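To make the idea concrete, the sketch below shows how a Siamese similarity model of this kind can be structured in PyTorch: a single CNN encoder with shared weights maps two audio inputs (e.g., an original recording and a GAN-generated sample) to embeddings, and a similarity head scores the pair. The abstract does not disclose the paper's exact architecture, input features, or similarity function, so the layer configuration, log-mel spectrogram inputs, and cosine-based score used here are illustrative assumptions rather than the authors' implementation.

```python
# Minimal illustrative sketch of a Siamese CNN similarity model for audio.
# Assumptions (not from the paper): log-mel spectrogram inputs, two small
# conv blocks, a 128-d embedding, and a cosine similarity head mapped to [0, 1].
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseAudioCNN(nn.Module):
    """Twin CNN encoder with shared weights; scores similarity of an audio pair."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (B, 64, 1, 1)
        )
        self.fc = nn.Linear(64, embedding_dim)

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, time) log-mel spectrogram (assumed input format)
        h = self.encoder(x).flatten(1)
        return F.normalize(self.fc(h), dim=1)  # unit-norm embedding

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        # Similarity score in [0, 1]: higher means perceptually closer.
        e1, e2 = self.embed(x1), self.embed(x2)
        return 0.5 * (1.0 + F.cosine_similarity(e1, e2, dim=1))


if __name__ == "__main__":
    model = SiameseAudioCNN()
    original = torch.randn(4, 1, 64, 128)    # batch of original spectrograms
    generated = torch.randn(4, 1, 64, 128)   # batch of GAN-generated spectrograms
    print(model(original, generated))        # one similarity score per pair
```

In a setup like this, the model would typically be trained so that pairs listeners judge as similar receive high scores (for example, with a contrastive loss or regression against human ratings), which is the role a learned similarity model plays when used to evaluate GAN-generated audio.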