K. Nakadai, Hiroshi G. Okuno, H. Nakajima, Yuji Hasegawa, H. Tsujino
Title: An open source software system for robot audition HARK and its evaluation
Venue: Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots
Published: 2008-12-01
DOI: 10.1109/ICHR.2008.4756031 (https://doi.org/10.1109/ICHR.2008.4756031)
Citations: 72
Abstract
The capability of a robot to listen to several things at once with its own ears, that is, robot audition, is important for improving human-robot interaction. The critical issue in robot audition is real-time processing in noisy environments, with enough flexibility to support various kinds of robots and hardware configurations. This paper presents open-source robot audition software, called "HARK", which includes sound source localization, separation, and automatic speech recognition (ASR). Since separated sounds suffer from spectral distortion caused by the separation process, HARK generates a time-frequency map of reliability, called a "missing feature mask", for the features of the separated sounds. The separated sounds are then recognized by a Missing-Feature-Theory (MFT) based ASR using these masks. HARK is implemented on the middleware called "FlowDesigner", which shares intermediate audio data among processing modules and thereby enables real-time processing. HARK's performance in recognizing noisy/simultaneous speech is demonstrated on three humanoid robots with different microphone layouts: Honda ASIMO, SIG2, and Robovie.
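The missing-feature idea described above can be illustrated with a minimal sketch: each time-frequency bin of a separated signal gets a reliability score, unreliable bins are masked out, and the masked features are what an MFT-based recognizer would marginalise over. This is not HARK's actual API; the function names, the 0.5 threshold, and the toy data below are all hypothetical, chosen only to show the masking step.

```python
# Illustrative sketch of a missing feature mask (hypothetical, not HARK's API).
# Assumption: per-bin reliability scores in [0, 1] are already available
# from the separation stage.

def missing_feature_mask(reliability, threshold=0.5):
    """Return a binary mask: 1 for reliable time-frequency bins, 0 otherwise."""
    return [[1 if r >= threshold else 0 for r in frame] for frame in reliability]

def masked_features(features, mask):
    """Replace unreliable feature values with None so an MFT-based
    recognizer can marginalise them out instead of trusting them."""
    return [[f if m == 1 else None for f, m in zip(frame, mframe)]
            for frame, mframe in zip(features, mask)]

# Toy example: 2 frames x 3 frequency bins of separated-sound features.
reliability = [[0.9, 0.2, 0.8],
               [0.1, 0.7, 0.6]]
features = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]

mask = missing_feature_mask(reliability)
print(masked_features(features, mask))
# [[1.0, None, 3.0], [None, 5.0, 6.0]]
```

In a real MFT-based ASR, the acoustic-model likelihood is computed only over the reliable (unmasked) dimensions, which is what makes recognition robust to the spectral distortion introduced by source separation.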