R. Sheelvant, Bidisha Sharma, Maulik C. Madhavi, Rohan Kumar Das, S. Prasanna, Haizhou Li
{"title":"RSL2019: A Realistic Speech Localization Corpus","authors":"R. Sheelvant, Bidisha Sharma, Maulik C. Madhavi, Rohan Kumar Das, S. Prasanna, Haizhou Li","doi":"10.1109/O-COCOSDA46868.2019.9060842","DOIUrl":null,"url":null,"abstract":"In this work, we present the development of a new database for speech localization that we refer to as Realistic Speech Localization 2019 (RSL2019) corpus. The corpus is designed for the study of sound source localization in real-world applications. The RSL2019 corpus is a continuing effort, which presently contains 22.60 hours of speech data, recorded using a four channel microphone array, and played over a loudspeaker from different directions of arrival (DOA). We consider 180 speech utterances spoken by 6 speakers, selected from RSR2015 database, which are played over the loudspeaker positioned at different angles and distances from the microphone array. We vary the DOA from 0 to 360 degree angle at an interval of 5 degree, at 1 metre and 1.5 metre distance. From each position and DOA, we also record white noise to study the robustness, and time stretched pulse to generate the transfer function for speech localization algorithm. Furthermore, we present the experimental results and analysis on state-of-the-art sound source localization algorithm using the open source HARK toolkit on the created RSL2019 database. This database will be provided for research purpose upon request to the authors.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/O-COCOSDA46868.2019.9060842","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
In this work, we present the development of a new database for speech localization that we refer to as Realistic Speech Localization 2019 (RSL2019) corpus. The corpus is designed for the study of sound source localization in real-world applications. The RSL2019 corpus is a continuing effort, which presently contains 22.60 hours of speech data, recorded using a four channel microphone array, and played over a loudspeaker from different directions of arrival (DOA). We consider 180 speech utterances spoken by 6 speakers, selected from RSR2015 database, which are played over the loudspeaker positioned at different angles and distances from the microphone array. We vary the DOA from 0 to 360 degree angle at an interval of 5 degree, at 1 metre and 1.5 metre distance. From each position and DOA, we also record white noise to study the robustness, and time stretched pulse to generate the transfer function for speech localization algorithm. Furthermore, we present the experimental results and analysis on state-of-the-art sound source localization algorithm using the open source HARK toolkit on the created RSL2019 database. This database will be provided for research purpose upon request to the authors.