VoiceFind
Authors: Irtaza Shahid, Y. Bai, Nakul Garg, Nirupam Roy
Published in: Proceedings of the 1st ACM International Workshop on Intelligent Acoustic Systems and Applications
Publication date: 2022-06-25
DOI: 10.1145/3539490.3539600 (https://doi.org/10.1145/3539490.3539600)
Citations: 6
Abstract
Robust speech enhancement is a key requirement for many emerging applications. Recovering clear speech on commodity devices is challenging, especially in noisy real-world scenarios. In this paper, we propose VoiceFind, which uses only two microphones to spatially filter the desired speech from all interference. Furthermore, to improve the intelligibility of the filtered speech, we design a Conditional Generative Adversarial Network (cGAN) to reconstruct the desired speech from environmental noise and interfering speech. This is an early attempt to explore this direction. Results from simulation and real-world experiments show promise.