{"title":"Syllable sequence of /a/+/ta/ can be heard as /atta/ in Japanese with visual or tactile cues","authors":"T. Arai, Miho Yamada, Megumi Okusawa","doi":"10.21437/interspeech.2022-10099","DOIUrl":null,"url":null,"abstract":"In our previous work, we reported that the word /atta/ with a geminate consonant differs from the syllable sequence /a/+pause+/ta/ in Japanese; specifically, there are formant transitions at the end of the first syllable in /atta/ but not in /a/+pause+/ta/. We also showed that native Japanese speakers perceived /atta/ when a facial video of /atta/ was synchronously played with an audio signal of /a/+pause+/ta/. In that study, we utilized two video clips for the two utterances in which the speaker was asked to control only the timing of the articulatory closing. In that case, there was no guarantee that the videos would be the exactly same except for the timing. Therefore, in the current study, we use a physical model of the human vocal tract with a miniature robot hand unit to achieve articulatory movements for visual cues. We also provide tactile cues to the listener’s finger because we want to test whether cues of another modality affect this perception in the same framework. Our findings showed that when either visual or tactile cues were presented with an audio stimulus, listeners more frequently responded that they heard /atta/ compared to audio-only presentations.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"302 ","pages":"3083-3087"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-10099","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In our previous work, we reported that the word /atta/ with a geminate consonant differs from the syllable sequence /a/+pause+/ta/ in Japanese; specifically, there are formant transitions at the end of the first syllable in /atta/ but not in /a/+pause+/ta/. We also showed that native Japanese speakers perceived /atta/ when a facial video of /atta/ was played in synchrony with an audio signal of /a/+pause+/ta/. In that study, we used two video clips, one for each utterance, in which the speaker was asked to vary only the timing of the articulatory closure. In that case, there was no guarantee that the videos would be exactly the same except for the timing. Therefore, in the current study, we use a physical model of the human vocal tract with a miniature robot hand unit to produce the articulatory movements that serve as visual cues. We also deliver tactile cues to the listener's finger, because we want to test, within the same framework, whether cues from another modality affect this perception. Our findings showed that when either visual or tactile cues were presented together with the audio stimulus, listeners responded that they heard /atta/ more frequently than with audio-only presentations.
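The abstract contrasts the spliced /a/+pause+/ta/ stimulus with a natural geminate /atta/. The following is a minimal sketch, not the authors' stimulus pipeline, of how such a spliced stimulus could be constructed; the file names, pause duration, and sample rate are illustrative assumptions.

```python
# Hypothetical sketch: splice a silent pause between recordings of /a/ and /ta/.
# Unlike a naturally produced /atta/, the spliced version lacks formant
# transitions at the end of the first syllable.
import numpy as np
import soundfile as sf

a, sr = sf.read("a.wav")        # assumed recording of the isolated vowel /a/
ta, sr_ta = sf.read("ta.wav")   # assumed recording of the syllable /ta/
assert sr == sr_ta, "both syllables should share one sample rate"

pause_ms = 150                  # assumed closure/pause duration in milliseconds
pause = np.zeros(int(sr * pause_ms / 1000), dtype=a.dtype)

stimulus = np.concatenate([a, pause, ta])
sf.write("a_pause_ta.wav", stimulus, sr)
```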