Custom Voice Cloner
Authors: Usharani K, Nandha kumaran H, Nikhilesh Pranav M.S, Nithish kumar K.K, Prasanna Krishna A.S
Journal: International Journal of Innovative Research in Engineering
Publication date: 2024-01-19
Publication type: Journal Article
DOI: 10.59256/ijire.20240501002 (https://doi.org/10.59256/ijire.20240501002)
Citations: 0
Abstract
The Custom Voice Cloner is based on a voice-signal speech synthesizer, a technology that converts text into audible speech while simulating human speech characteristics such as pitch and tone. It finds applications in virtual assistants, navigation systems, and accessibility tools. Building one in Python typically involves text-to-speech (TTS) libraries such as gTTS and pyttsx3, or platform-specific options for Windows and macOS, which offer easy text-to-speech conversion. However, TTS libraries may lack the customization and voice quality needed for advanced projects. For more sophisticated applications, custom voice synthesizers can be built using deep learning techniques such as Tacotron and WaveNet. These models learn speech nuances to produce more natural output. Creating a custom voice synthesizer is challenging: it requires high-quality training data, machine learning expertise, and substantial computational resources. It goes beyond generating speech to conveying emotions and nuances in pronunciation, yielding natural and expressive voices.
Keywords: voice signal speech synthesizer, text-to-speech conversion, deep learning, TTS, gTTS, pyttsx3.
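As an illustration of the library-based approach the abstract mentions, the following is a minimal sketch of offline text-to-speech with pyttsx3. It is not taken from the paper. The function name `speak`, its default values, and the optional `engine` parameter are assumptions introduced here for illustration; only `pyttsx3.init`, `setProperty`, `say`, and `runAndWait` belong to the library.

```python
def speak(text, rate=150, volume=1.0, engine=None):
    """Speak `text` aloud through a pyttsx3-style engine.

    `speak` and its `engine` parameter are hypothetical helpers for this
    sketch: passing an engine lets callers reuse one instance, and leaving
    it as None creates a default pyttsx3 engine on demand.
    """
    if engine is None:
        import pyttsx3  # third-party package: pip install pyttsx3
        engine = pyttsx3.init()

    # The 'rate' and 'volume' properties give the limited control over
    # the voice that off-the-shelf TTS libraries typically expose.
    engine.setProperty("rate", rate)      # speaking rate in words per minute
    engine.setProperty("volume", volume)  # loudness from 0.0 to 1.0

    engine.say(text)       # queue the utterance
    engine.runAndWait()    # block until the queued speech has been played
    return engine
```

Calling `speak("Hello from a basic text-to-speech sketch.")` on a machine with pyttsx3 and a system speech backend installed should read the sentence aloud. The small set of adjustable properties shows the kind of customization limit that motivates the deep learning approaches named in the abstract.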