K. Nakadai, Hiroshi G. Okuno, H. Nakajima, Yuji Hasegawa, H. Tsujino
Title: An open source software system for robot audition HARK and its evaluation
Venue: Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots
Published: 2008-12-01
DOI: 10.1109/ICHR.2008.4756031 (https://doi.org/10.1109/ICHR.2008.4756031)
Citations: 72
Abstract
The capability of a robot to listen to several things at once with its own ears, that is, robot audition, is important for improving human-robot interaction. The critical issue in robot audition is real-time processing in noisy environments, with enough flexibility to support various kinds of robots and hardware configurations. This paper presents open-source robot audition software, called "HARK", which includes sound source localization, separation, and automatic speech recognition (ASR). Since separated sounds suffer from spectral distortion caused by the separation process, HARK generates a time-frequency map of reliability, called a "missing feature mask", for the features of the separated sounds. The separated sounds are then recognized by a Missing-Feature-Theory (MFT) based ASR using these masks. HARK is implemented on the middleware called "FlowDesigner", which shares intermediate audio data among processing modules and thereby enables real-time processing. HARK's performance in recognizing noisy/simultaneous speech is demonstrated on three humanoid robots with different microphone layouts: Honda ASIMO, SIG2, and Robovie.
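The missing-feature idea described above can be illustrated with a minimal sketch: each time-frequency bin of a separated signal gets a reliability score, unreliable bins are masked out, and the masked features are what an MFT-based recognizer would marginalise over. This is not HARK's actual API; the function names, the 0.5 threshold, and the toy data below are all hypothetical, chosen only to show the masking step.

```python
# Illustrative sketch of a missing feature mask (hypothetical, not HARK's API).
# Assumption: per-bin reliability scores in [0, 1] are already available
# from the separation stage.

def missing_feature_mask(reliability, threshold=0.5):
    """Return a binary mask: 1 for reliable time-frequency bins, 0 otherwise."""
    return [[1 if r >= threshold else 0 for r in frame] for frame in reliability]

def masked_features(features, mask):
    """Replace unreliable feature values with None so an MFT-based
    recognizer can marginalise them out instead of trusting them."""
    return [[f if m == 1 else None for f, m in zip(frame, mframe)]
            for frame, mframe in zip(features, mask)]

# Toy example: 2 frames x 3 frequency bins of separated-sound features.
reliability = [[0.9, 0.2, 0.8],
               [0.1, 0.7, 0.6]]
features = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]

mask = missing_feature_mask(reliability)
print(masked_features(features, mask))
# [[1.0, None, 3.0], [None, 5.0, 6.0]]
```

In a real MFT-based ASR, the acoustic-model likelihood is computed only over the reliable (unmasked) dimensions, which is what makes recognition robust to the spectral distortion introduced by source separation.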