Authors: Takao Obi, Kotaro Funakoshi
DOI: 10.1145/3577190.3614154 (https://doi.org/10.1145/3577190.3614154)
Published: 2023-10-09 in Companion Publication of the 2020 International Conference on Multimodal Interaction
Video-based Respiratory Waveform Estimation in Dialogue: A Novel Task and Dataset for Human-Machine Interaction
Respiration is closely related to speech, so respiratory information is useful for improving human-machine multimodal spoken interaction from various perspectives. A machine-learning task is presented for multimodal interactive systems to improve the compatibility of the systems and promote smooth interaction with them. This “video-based respiration waveform estimation (VRWE)” task consists of two subtasks: waveform amplitude estimation and waveform gradient estimation. A dataset consisting of respiratory data for 30 participants was created for this task, and a strong baseline method based on 3DCNN-ConvLSTM was evaluated on the dataset. Finally, VRWE, especially gradient estimation, was shown to be effective in predicting user voice activity after 200 ms. These results suggest that VRWE is effective for improving human-machine multimodal interaction.
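To make the two subtasks concrete, the following minimal sketch derives an amplitude target and a gradient target from a one-dimensional respiration signal. It is illustrative only and is not the authors' preprocessing: the function name `make_targets`, the min-max normalization, the 20 Hz sampling rate, and the synthetic sine-wave signal are all assumptions introduced here.

```python
import numpy as np

def make_targets(waveform, sample_rate_hz):
    """Derive hypothetical amplitude and gradient targets from a respiration waveform."""
    w = np.asarray(waveform, dtype=float)
    # Amplitude target: min-max scale to [0, 1] (an assumed normalization).
    span = w.max() - w.min()
    amplitude = (w - w.min()) / span if span > 0 else np.zeros_like(w)
    # Gradient target: time derivative of the normalized amplitude, per second.
    gradient = np.gradient(amplitude, 1.0 / sample_rate_hz)
    return amplitude, gradient

# Synthetic 0.25 Hz "breathing" signal sampled at 20 Hz for 8 seconds (assumed values).
sample_rate_hz = 20
t = np.arange(0, 8, 1.0 / sample_rate_hz)
sensor_signal = np.sin(2 * np.pi * 0.25 * t)

amplitude, gradient = make_targets(sensor_signal, sample_rate_hz)
```

In this sketch a positive gradient corresponds to a rising waveform and a negative gradient to a falling one. A video-based estimator would be trained to predict `amplitude` and `gradient` from frames rather than from a sensor signal.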