Neural Style Transfer Based Voice Mimicking for Personalized Audio Stories

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery. Pub Date: 2020-10-12. DOI: 10.1145/3422839.3423063

Syeda Maryam Fatima, Marina Shehzad, Syed Sami Murtuza, S. S. Raza


Citations: 2

Abstract

This paper demonstrates CNN-based neural style transfer on audio data to make storytelling a personalized experience: users record a few sentences, which are then used to mimic their voice. The user recordings are converted to spectrograms, whose style is transferred to the spectrogram of a base voice narrating the story. This neural style transfer is similar to style transfer on images. The approach stands out because it needs only a small dataset and therefore takes less time to train the model. The project is intended specifically for children who prefer digital interaction and are increasingly leaving the storytelling culture behind, and for working parents who are unable to spend enough time with their children. By using a parent's initial recording to narrate a given story, it is designed to serve as a bridge between storytelling and screen time, engaging children's interest through the implicit ethical themes of the stories and connecting children to their loved ones while ensuring an innocuous and meaningful learning experience.
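The abstract does not specify the network or the loss, so the following is only a minimal sketch of how spectrogram-based style transfer is commonly set up by analogy with image style transfer. Everything named here is an assumption for illustration: the helper names (`log_spectrogram`, `conv_features`, `gram_matrix`, `style_loss`, `content_loss`), the STFT parameters, the use of a single random, untrained 1-D convolution layer, and the synthetic sine-wave "voices". It is not the authors' implementation.

```python
import numpy as np


def log_spectrogram(signal, n_fft=512, hop=128):
    """Log-magnitude STFT, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T


def conv_features(spec, n_filters=32, width=5, seed=0):
    """One random 1-D convolution over time (frequency bins act as input
    channels), followed by ReLU. The filters are untrained; this is an
    assumption of the sketch, not something stated in the paper."""
    rng = np.random.default_rng(seed)
    n_bins = spec.shape[0]
    kernels = rng.standard_normal((n_filters, n_bins, width))
    kernels *= np.sqrt(2.0 / (n_bins * width))
    # windows has shape (n_bins, n_frames - width + 1, width)
    windows = np.lib.stride_tricks.sliding_window_view(spec, width, axis=1)
    out = np.einsum("fbw,btw->ft", kernels, windows)
    return np.maximum(out, 0.0)


def gram_matrix(features):
    """Filter-to-filter correlations averaged over time: (filters, filters)."""
    return features @ features.T / features.shape[1]


def style_loss(generated, style):
    """Squared distance between Gram matrices (time-independent statistics)."""
    diff = gram_matrix(generated) - gram_matrix(style)
    return float(np.sum(diff ** 2))


def content_loss(generated, content):
    """Squared distance between feature maps (preserves what is said)."""
    return float(np.sum((generated - content) ** 2))


# Synthetic stand-ins for the two recordings (hypothetical data).
sr = 16000
t = np.arange(sr) / sr
base_voice = np.sin(2 * np.pi * 220 * t)  # 1 s "narrator"
noise = 0.1 * np.random.default_rng(1).standard_normal(sr // 2)
user_voice = np.sin(2 * np.pi * 440 * t[: sr // 2]) + noise  # 0.5 s "user"

content_feats = conv_features(log_spectrogram(base_voice))
style_feats = conv_features(log_spectrogram(user_voice))

# A full pipeline would optimise a generated spectrogram to lower both terms.
total = content_loss(content_feats, content_feats) + style_loss(
    content_feats, style_feats
)
```

Because the Gram matrix averages over time, the style recording can be much shorter than the narrated story, which is consistent with the abstract's point that only a few recorded sentences are needed. Converting the optimised spectrogram back to audio would need a phase-reconstruction step, which this sketch leaves out.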

