Toward Multi-Features Emphasis Speech Translation: Assessment of Human Emphasis Production and Perception with Speech and Text Clues
Quoc Truong Do, S. Sakti, Satoshi Nakamura
2018 IEEE Spoken Language Technology Workshop (SLT), December 2018
DOI: https://doi.org/10.1109/SLT.2018.8639641
Emphasis is an important factor of human speech that helps convey emotion and the focused information of utterances. Recent studies on speech-to-speech translation have sought to preserve emphasis information from the source language in the target language. However, since different cultures express emphasis in different ways, considering only acoustic-to-acoustic emphasis translation may not always reflect the experiences of users. Moreover, emphasis can be expressed at various levels in both text and speech. It remains unclear how emphasis is communicated in different forms (acoustic or linguistic) at different levels, and whether people can perceive the difference between emphasis levels or recognize the same emphasis level across text and speech. In this paper, we analyze human perception of emphasis with both speech and text clues through crowd-sourced evaluations. The results indicate that although participants can distinguish among emphasis levels and perceive the same emphasis level in speech and text, many ambiguities still exist at certain emphasis levels. Thus, our results provide insight into what needs to be handled during the emphasis translation process.