{"title":"面向多维触觉渲染的时间力触觉数据跨模态生成算法","authors":"Rui Song;Guohong Liu;Yan Zhang;Xiaoying Sun","doi":"10.1109/TMM.2025.3590907","DOIUrl":null,"url":null,"abstract":"Exploiting the correlation between multimodal data to generate tactile data has become a preferred approach to enhance tactile rendering fidelity. Nevertheless, existing studies have often overlooked the temporal dynamics of force tactile data. To fill this gap in the literature, this paper introduces a joint visual-audio approach to generate a temporal tactile data (VA2T) algorithm, focusing on the temporal and long-term dependencies of force tactile data. VA2T uses a feature extraction network to extract audio and image features and then uses an attention mechanism and decoder to fuse these features. The tactile reconstructor generates temporal friction and a normal force, with dilated causal convolution securing the temporal dependencies in the force tactile data. Simulation experiments on the LMT dataset demonstrate that compared with the transformer and audio-visual-aided haptic signal reconstruction (AVHR) algorithms, the VA2T algorithm reduces the RMSE for generated friction by 29.44% and 32.37%, respectively, and for normal forces by 23.30% and 35.43%, respectively. In addition, we developed a haptic rendering approach that combines electrovibration and mechanical vibration to render the generated friction and normal force. The subjective experimental results showed that the rendering fidelity of the data generated using the VA2T method was significantly higher than that of the data generated using the transformer and AVHR methods.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"5092-5102"},"PeriodicalIF":9.7000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Cross-Modal Generation Algorithm for Temporal Force Tactile Data for Multidimensional Haptic Rendering\",\"authors\":\"Rui Song;Guohong Liu;Yan Zhang;Xiaoying Sun\",\"doi\":\"10.1109/TMM.2025.3590907\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Exploiting the correlation between multimodal data to generate tactile data has become a preferred approach to enhance tactile rendering fidelity. Nevertheless, existing studies have often overlooked the temporal dynamics of force tactile data. To fill this gap in the literature, this paper introduces a joint visual-audio approach to generate a temporal tactile data (VA2T) algorithm, focusing on the temporal and long-term dependencies of force tactile data. VA2T uses a feature extraction network to extract audio and image features and then uses an attention mechanism and decoder to fuse these features. The tactile reconstructor generates temporal friction and a normal force, with dilated causal convolution securing the temporal dependencies in the force tactile data. Simulation experiments on the LMT dataset demonstrate that compared with the transformer and audio-visual-aided haptic signal reconstruction (AVHR) algorithms, the VA2T algorithm reduces the RMSE for generated friction by 29.44% and 32.37%, respectively, and for normal forces by 23.30% and 35.43%, respectively. In addition, we developed a haptic rendering approach that combines electrovibration and mechanical vibration to render the generated friction and normal force. 
The subjective experimental results showed that the rendering fidelity of the data generated using the VA2T method was significantly higher than that of the data generated using the transformer and AVHR methods.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"27 \",\"pages\":\"5092-5102\"},\"PeriodicalIF\":9.7000,\"publicationDate\":\"2025-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11086402/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11086402/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
A Cross-Modal Generation Algorithm for Temporal Force Tactile Data for Multidimensional Haptic Rendering
Exploiting the correlation between multimodal data to generate tactile data has become a preferred approach to enhance tactile rendering fidelity. Nevertheless, existing studies have often overlooked the temporal dynamics of force tactile data. To fill this gap in the literature, this paper introduces a joint visual-audio approach for generating temporal tactile data (VA2T), focusing on the temporal and long-term dependencies of force tactile data. VA2T uses a feature extraction network to extract audio and image features and then uses an attention mechanism and decoder to fuse these features. The tactile reconstructor generates temporal friction and normal force signals, with dilated causal convolutions capturing the temporal dependencies in the force tactile data. Simulation experiments on the LMT dataset demonstrate that, compared with the transformer and audio-visual-aided haptic signal reconstruction (AVHR) algorithms, the VA2T algorithm reduces the RMSE of the generated friction by 29.44% and 32.37%, respectively, and of the generated normal force by 23.30% and 35.43%, respectively. In addition, we developed a haptic rendering approach that combines electrovibration and mechanical vibration to render the generated friction and normal force. The subjective experimental results showed that the rendering fidelity of the data generated using the VA2T method was significantly higher than that of the data generated using the transformer and AVHR methods.
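
The pipeline described in the abstract (per-frame audio and image features, attention-based fusion, and a reconstructor built on dilated causal convolutions) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names, feature dimensions, dilation schedule, and the choice of audio-queries-attend-to-image-features cross-attention are assumptions made only to show how causal (past-only) convolutions and attention fusion fit together.

    # Illustrative VA2T-style sketch; all sizes and module choices are assumptions.
    import torch
    import torch.nn as nn

    class CausalConv1d(nn.Module):
        """Dilated 1-D convolution whose output at time t depends only on inputs <= t."""
        def __init__(self, in_ch, out_ch, kernel_size, dilation):
            super().__init__()
            self.pad = (kernel_size - 1) * dilation  # left padding keeps the conv causal
            self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

        def forward(self, x):                        # x: (batch, channels, time)
            x = nn.functional.pad(x, (self.pad, 0))  # pad the past side only
            return self.conv(x)

    class VA2TSketch(nn.Module):
        """Fuses audio and image features with cross-attention, then decodes
        temporal friction and normal force traces with dilated causal convolutions."""
        def __init__(self, audio_dim=128, image_dim=512, d_model=256):
            super().__init__()
            self.audio_proj = nn.Linear(audio_dim, d_model)
            self.image_proj = nn.Linear(image_dim, d_model)
            self.fuse = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
            self.decoder = nn.Sequential(
                CausalConv1d(d_model, d_model, kernel_size=3, dilation=1), nn.ReLU(),
                CausalConv1d(d_model, d_model, kernel_size=3, dilation=2), nn.ReLU(),
                CausalConv1d(d_model, d_model, kernel_size=3, dilation=4), nn.ReLU(),
            )
            self.head = nn.Conv1d(d_model, 2, kernel_size=1)  # channel 0: friction, channel 1: normal force

        def forward(self, audio_feat, image_feat):
            # audio_feat: (batch, time, audio_dim); image_feat: (batch, time, image_dim)
            q = self.audio_proj(audio_feat)
            kv = self.image_proj(image_feat)
            fused, _ = self.fuse(q, kv, kv)          # audio queries attend to image features
            h = self.decoder(fused.transpose(1, 2))  # (batch, d_model, time)
            return self.head(h)                      # (batch, 2, time)

    # Example: 100 time steps of paired features -> friction and normal force traces
    model = VA2TSketch()
    out = model(torch.randn(2, 100, 128), torch.randn(2, 100, 512))
    print(out.shape)  # torch.Size([2, 2, 100])

The left-only padding is the point of the causal design: each generated force sample can depend on current and past audio-visual context but never on future frames, while stacking increasing dilations widens the temporal receptive field without losing that property.
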
Journal Introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.