Multi-modal translation system and its evaluation

S. Morishima, Satoshi Nakamura
DOI: 10.1109/ICMI.2002.1167000
Published in: Proceedings. Fourth IEEE International Conference on Multimodal Interfaces
Publication date: 2002-10-14
Citations: 2

Abstract

Speech-to-speech translation has been studied to realize natural human communication beyond language barriers. Toward further multi-modal natural communication, visual information such as face and lip movements will be necessary. We introduce a multi-modal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion while synchronizing it to the translated speech. To retain the speaker's facial expression, we replace only the image of the speech organs with a synthesized one, generated by a three-dimensional wire-frame model that is adaptable to any speaker. Our approach enables image synthesis and translation with an extremely small database. We conduct subjective evaluation using a connected-digit discrimination test on data with and without audio-visual lip-synchronization. The results confirm the significant quality of the proposed audio-visual translation system and the importance of lip-synchronization.
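The synchronization the abstract describes can be pictured as driving mouth-shape parameters of a wire-frame model from the phoneme timing of the translated speech. The sketch below is purely illustrative and not the authors' implementation: the function name, the phoneme-to-viseme table, and the frame rate are all hypothetical, chosen only to show how translated-speech timing could be turned into a per-frame lip-sync control track.

```python
# Hypothetical phoneme -> mouth-opening parameter (0 = closed, 1 = fully open).
# A real system would use a far richer viseme inventory.
VISEME_OPENING = {"a": 1.0, "i": 0.4, "u": 0.3, "e": 0.6, "o": 0.8,
                  "m": 0.0, "sil": 0.0}

def lip_sync_track(phonemes, fps=30):
    """phonemes: list of (label, duration_sec) from the translated speech.

    Returns one mouth-opening value per video frame, the kind of control
    signal a 3-D wire-frame mouth model could be driven with so that the
    synthesized mouth image stays synchronized with the translated audio.
    """
    frames = []
    for label, duration in phonemes:
        target = VISEME_OPENING.get(label, 0.5)  # default for unknown phonemes
        n = max(1, round(duration * fps))        # at least one frame per phoneme
        frames.extend([target] * n)
    return frames

# Example: a short translated utterance "silence, /a/, /m/"
track = lip_sync_track([("sil", 0.1), ("a", 0.2), ("m", 0.1)])
```

Because the track is derived from the translated speech's own phoneme durations rather than the source video, the mouth region can be re-synthesized to match the target language while the rest of the face is kept from the original footage, as the abstract outlines.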