Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features of a Video

Kai Zhang, Lawrence H. Kim, Yipeng Guo, Sean Follmer
{"title":"通过分析视频的交叉模态特征自动生成空间触觉效果","authors":"Kai Zhang, Lawrence H. Kim, Yipeng Guo, Sean Follmer","doi":"10.1145/3385959.3418459","DOIUrl":null,"url":null,"abstract":"Tactile effects can enhance user experience of multimedia content. However, generating appropriate tactile stimuli without any human intervention remains a challenge. While visual or audio information has been used to automatically generate tactile effects, utilizing cross-modal information may further improve the spatiotemporal synchronization and user experience of the tactile effects. In this paper, we present a pipeline for automatic generation of vibrotactile effects through the extraction of both the visual and audio features from a video. Two neural network models are used to extract the diegetic audio content, and localize a sounding object in the scene. These models are then used to determine the spatial distribution and the intensity of the tactile effects. To evaluate the performance of our method, we conducted a user study to compare the videos with tactile effects generated by our method to both the original videos without any tactile stimuli and videos with tactile effects generated based on visual features only. The study results demonstrate that our cross-modal method creates tactile effects with better spatiotemporal synchronization than the existing visual-based method and provides a more immersive user experience.","PeriodicalId":157249,"journal":{"name":"Proceedings of the 2020 ACM Symposium on Spatial User Interaction","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features of a Video\",\"authors\":\"Kai Zhang, Lawrence H. Kim, Yipeng Guo, Sean Follmer\",\"doi\":\"10.1145/3385959.3418459\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Tactile effects can enhance user experience of multimedia content. However, generating appropriate tactile stimuli without any human intervention remains a challenge. While visual or audio information has been used to automatically generate tactile effects, utilizing cross-modal information may further improve the spatiotemporal synchronization and user experience of the tactile effects. In this paper, we present a pipeline for automatic generation of vibrotactile effects through the extraction of both the visual and audio features from a video. Two neural network models are used to extract the diegetic audio content, and localize a sounding object in the scene. These models are then used to determine the spatial distribution and the intensity of the tactile effects. To evaluate the performance of our method, we conducted a user study to compare the videos with tactile effects generated by our method to both the original videos without any tactile stimuli and videos with tactile effects generated based on visual features only. 
The study results demonstrate that our cross-modal method creates tactile effects with better spatiotemporal synchronization than the existing visual-based method and provides a more immersive user experience.\",\"PeriodicalId\":157249,\"journal\":{\"name\":\"Proceedings of the 2020 ACM Symposium on Spatial User Interaction\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2020 ACM Symposium on Spatial User Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3385959.3418459\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM Symposium on Spatial User Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3385959.3418459","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Tactile effects can enhance user experience of multimedia content. However, generating appropriate tactile stimuli without any human intervention remains a challenge. While visual or audio information has been used to automatically generate tactile effects, utilizing cross-modal information may further improve the spatiotemporal synchronization and user experience of the tactile effects. In this paper, we present a pipeline for automatic generation of vibrotactile effects through the extraction of both the visual and audio features from a video. Two neural network models are used to extract the diegetic audio content, and localize a sounding object in the scene. These models are then used to determine the spatial distribution and the intensity of the tactile effects. To evaluate the performance of our method, we conducted a user study to compare the videos with tactile effects generated by our method to both the original videos without any tactile stimuli and videos with tactile effects generated based on visual features only. The study results demonstrate that our cross-modal method creates tactile effects with better spatiotemporal synchronization than the existing visual-based method and provides a more immersive user experience.
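The abstract gives no implementation details, but the final mapping stage it describes — diegetic audio loudness driving effect intensity, and the localized sounding object driving the spatial distribution — can be sketched. Below is a minimal illustrative sketch in Python, assuming a 2D vibrotactile actuator grid, a per-frame RMS loudness from the diegetic-audio model, and a normalized object location from the localization model; the function name, grid size, and Gaussian falloff are hypothetical and not the paper's actual method.

```python
import numpy as np

def tactile_frame(audio_rms: float, obj_xy: tuple[float, float],
                  grid_shape=(4, 4), sigma=0.25) -> np.ndarray:
    """Map one frame's cross-modal features to actuator amplitudes.

    audio_rms : diegetic-audio loudness in [0, 1] (sets overall intensity)
    obj_xy    : sounding object's location in normalized image coords [0, 1]^2
                (sets the spatial distribution of the effect)
    Returns a (rows, cols) grid of vibration amplitudes in [0, 1].
    """
    rows, cols = grid_shape
    # Actuator centers laid out in the same normalized coordinate frame.
    ys, xs = np.meshgrid(np.linspace(0, 1, rows), np.linspace(0, 1, cols),
                         indexing="ij")
    # Gaussian falloff centered on the localized sound source.
    d2 = (xs - obj_xy[0]) ** 2 + (ys - obj_xy[1]) ** 2
    spatial = np.exp(-d2 / (2 * sigma ** 2))
    # Audio energy scales the whole pattern.
    return np.clip(audio_rms * spatial, 0.0, 1.0)

# Example: a loud source in the upper-left of the frame.
amps = tactile_frame(audio_rms=0.8, obj_xy=(0.2, 0.2))
```

Running such a mapping per video frame, with amplitudes smoothed between frames, is one plausible way to obtain the spatiotemporally synchronized effects the user study evaluates.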