Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features of a Video

Kai Zhang, Lawrence H. Kim, Yipeng Guo, Sean Follmer
{"title":"通过分析视频的交叉模态特征自动生成空间触觉效果","authors":"Kai Zhang, Lawrence H. Kim, Yipeng Guo, Sean Follmer","doi":"10.1145/3385959.3418459","DOIUrl":null,"url":null,"abstract":"Tactile effects can enhance user experience of multimedia content. However, generating appropriate tactile stimuli without any human intervention remains a challenge. While visual or audio information has been used to automatically generate tactile effects, utilizing cross-modal information may further improve the spatiotemporal synchronization and user experience of the tactile effects. In this paper, we present a pipeline for automatic generation of vibrotactile effects through the extraction of both the visual and audio features from a video. Two neural network models are used to extract the diegetic audio content, and localize a sounding object in the scene. These models are then used to determine the spatial distribution and the intensity of the tactile effects. To evaluate the performance of our method, we conducted a user study to compare the videos with tactile effects generated by our method to both the original videos without any tactile stimuli and videos with tactile effects generated based on visual features only. The study results demonstrate that our cross-modal method creates tactile effects with better spatiotemporal synchronization than the existing visual-based method and provides a more immersive user experience.","PeriodicalId":157249,"journal":{"name":"Proceedings of the 2020 ACM Symposium on Spatial User Interaction","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features of a Video\",\"authors\":\"Kai Zhang, Lawrence H. Kim, Yipeng Guo, Sean Follmer\",\"doi\":\"10.1145/3385959.3418459\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Tactile effects can enhance user experience of multimedia content. However, generating appropriate tactile stimuli without any human intervention remains a challenge. While visual or audio information has been used to automatically generate tactile effects, utilizing cross-modal information may further improve the spatiotemporal synchronization and user experience of the tactile effects. In this paper, we present a pipeline for automatic generation of vibrotactile effects through the extraction of both the visual and audio features from a video. Two neural network models are used to extract the diegetic audio content, and localize a sounding object in the scene. These models are then used to determine the spatial distribution and the intensity of the tactile effects. To evaluate the performance of our method, we conducted a user study to compare the videos with tactile effects generated by our method to both the original videos without any tactile stimuli and videos with tactile effects generated based on visual features only. 
The study results demonstrate that our cross-modal method creates tactile effects with better spatiotemporal synchronization than the existing visual-based method and provides a more immersive user experience.\",\"PeriodicalId\":157249,\"journal\":{\"name\":\"Proceedings of the 2020 ACM Symposium on Spatial User Interaction\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2020 ACM Symposium on Spatial User Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3385959.3418459\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM Symposium on Spatial User Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3385959.3418459","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Tactile effects can enhance user experience of multimedia content. However, generating appropriate tactile stimuli without any human intervention remains a challenge. While visual or audio information has been used to automatically generate tactile effects, utilizing cross-modal information may further improve the spatiotemporal synchronization and user experience of the tactile effects. In this paper, we present a pipeline for automatic generation of vibrotactile effects through the extraction of both the visual and audio features from a video. Two neural network models are used to extract the diegetic audio content, and localize a sounding object in the scene. These models are then used to determine the spatial distribution and the intensity of the tactile effects. To evaluate the performance of our method, we conducted a user study to compare the videos with tactile effects generated by our method to both the original videos without any tactile stimuli and videos with tactile effects generated based on visual features only. The study results demonstrate that our cross-modal method creates tactile effects with better spatiotemporal synchronization than the existing visual-based method and provides a more immersive user experience.
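The abstract gives no implementation details, but the final mapping stage it describes — diegetic audio loudness driving effect intensity, and the localized sounding object driving the spatial distribution — can be sketched. Below is a minimal illustrative sketch in Python, assuming a 2D vibrotactile actuator grid, a per-frame RMS loudness from the diegetic-audio model, and a normalized object location from the localization model; the function name, grid size, and Gaussian falloff are hypothetical and not the paper's actual method.

```python
import numpy as np

def tactile_frame(audio_rms: float, obj_xy: tuple[float, float],
                  grid_shape=(4, 4), sigma=0.25) -> np.ndarray:
    """Map one frame's cross-modal features to actuator amplitudes.

    audio_rms : diegetic-audio loudness in [0, 1] (sets overall intensity)
    obj_xy    : sounding object's location in normalized image coords [0, 1]^2
                (sets the spatial distribution of the effect)
    Returns a (rows, cols) grid of vibration amplitudes in [0, 1].
    """
    rows, cols = grid_shape
    # Actuator centers laid out in the same normalized coordinate frame.
    ys, xs = np.meshgrid(np.linspace(0, 1, rows), np.linspace(0, 1, cols),
                         indexing="ij")
    # Gaussian falloff centered on the localized sound source.
    d2 = (xs - obj_xy[0]) ** 2 + (ys - obj_xy[1]) ** 2
    spatial = np.exp(-d2 / (2 * sigma ** 2))
    # Audio energy scales the whole pattern.
    return np.clip(audio_rms * spatial, 0.0, 1.0)

# Example: a loud source in the upper-left of the frame.
amps = tactile_frame(audio_rms=0.8, obj_xy=(0.2, 0.2))
```

Running such a mapping per video frame, with amplitudes smoothed between frames, is one plausible way to obtain the spatiotemporally synchronized effects the user study evaluates.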