Laughter synthesis: A comparison between Variational autoencoder and Autoencoder

Nadia Mansouri, Z. Lachiri
{"title":"Laughter synthesis: A comparison between Variational autoencoder and Autoencoder","authors":"Nadia Mansouri, Z. Lachiri","doi":"10.1109/ATSIP49331.2020.9231607","DOIUrl":null,"url":null,"abstract":"Laughter is one of the most famous non verbal sounds that human produce since birth, it conveys messages about our emotional state. These characteristics make it an important sound that should be studied in order to improve the human-machine interactions. In this paper we investigate the audio laughter generation process from its acoustic features. This suggested process is considered as an analysis-transformation synthesis benchmark based on unsupervised dimensionality reduction techniques: The standard autoencoder (AE) and the variational autoencoder (VAE). Therefore, the laughter synthesis methodology consists of transforming the extracted high-dimensional log magnitude spectrogram into a low-dimensional latent vector. This latent vector contains the most valuable information used to reconstruct a synthetic magnitude spectrogram that will be passed through a specific vocoder to generate the laughter waveform. We systematically, exploit the VAE to create new sound (speech-laugh) based on the interpolation process. To evaluate the performance of these models two evaluation metrics were conducted: objective and subjective evaluations.","PeriodicalId":384018,"journal":{"name":"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ATSIP49331.2020.9231607","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Laughter is one of the most famous non verbal sounds that human produce since birth, it conveys messages about our emotional state. These characteristics make it an important sound that should be studied in order to improve the human-machine interactions. In this paper we investigate the audio laughter generation process from its acoustic features. This suggested process is considered as an analysis-transformation synthesis benchmark based on unsupervised dimensionality reduction techniques: The standard autoencoder (AE) and the variational autoencoder (VAE). Therefore, the laughter synthesis methodology consists of transforming the extracted high-dimensional log magnitude spectrogram into a low-dimensional latent vector. This latent vector contains the most valuable information used to reconstruct a synthetic magnitude spectrogram that will be passed through a specific vocoder to generate the laughter waveform. We systematically, exploit the VAE to create new sound (speech-laugh) based on the interpolation process. To evaluate the performance of these models two evaluation metrics were conducted: objective and subjective evaluations.
笑声合成:变分自编码器与自编码器的比较
笑声是人类自出生以来发出的最著名的非语言声音之一,它传达了我们情绪状态的信息。这些特点使它成为一个重要的声音,应该研究,以提高人机交互。本文从声笑声的声学特征出发,研究了声笑声的产生过程。该过程被认为是基于无监督降维技术的分析-转换综合基准:标准自编码器(AE)和变分自编码器(VAE)。因此,笑声合成方法包括将提取的高维对数幅度谱图转换为低维潜在向量。这个潜在向量包含最有价值的信息,用于重建一个合成的幅度谱图,该谱图将通过一个特定的声码器来生成笑声波形。我们系统地利用VAE在插值过程的基础上创造新的声音(言语-笑声)。为了评价这些模型的性能,采用了客观评价和主观评价两种评价指标。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信