VTuckeR: Multimodal Tucker Fusion for Scene Graph Generation
Geng Jia, Yunhui Yi, Wentao Kan
Proceedings of the 2022 9th International Conference on Wireless Communication and Sensor Networks, 2022-01-11
DOI: 10.1145/3514105.3514111 (https://doi.org/10.1145/3514105.3514111)
Scene Graph Generation (SGG) grew out of object segmentation and word vector representations and has become a complex task built on numerous research results. Today's SGG is still far from practical, which we believe is due to unfair matching strategies between images and object regions. Inspired by the success of Tucker decomposition in visual question answering (VQA), we propose VTuckeR, a relatively straightforward but powerful linear model based on a Tucker decomposition of the tensor representation of image and object features. VTuckeR controls the complexity of the merging scheme while retaining good interpretability. We show that our model can improve multiple types of SGG models on the Visual Genome dataset in the PredCls mode. Moreover, a more accurate scene graph may in the future aid the prediction of wireless channel dynamics, a direction referred to as V2C.
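
To make the idea of a Tucker-factorized fusion concrete, here is a minimal sketch of how two feature vectors can be merged through a small core tensor instead of a full three-way weight tensor. It is not the authors' implementation: the function names, the dimensions, the random weights, and the choice of a single image vector and a single object vector are all assumptions made for illustration, since the abstract does not specify the architecture.

```python
import random


def matvec(x, W):
    """Return x @ W for a vector x (length n) and a matrix W (n rows)."""
    return [sum(x[a] * W[a][i] for a in range(len(x))) for i in range(len(W[0]))]


def tucker_fusion(x_img, x_obj, W_img, W_obj, core, W_out):
    """Fuse two feature vectors with a Tucker-factorized bilinear map.

    Hypothetical helper: W_img and W_obj are the factor matrices,
    core[i][j][k] is the core tensor, W_out maps to the output space.
    """
    # Project each modality into its own low-dimensional factor space.
    z_img = matvec(x_img, W_img)
    z_obj = matvec(x_obj, W_obj)
    # Contract both projections with the core tensor (the bilinear interaction).
    r_out = len(core[0][0])
    z = [
        sum(
            z_img[i] * z_obj[j] * core[i][j][k]
            for i in range(len(z_img))
            for j in range(len(z_obj))
        )
        for k in range(r_out)
    ]
    # Map the fused code to the output space (for example, predicate scores).
    return matvec(z, W_out)


def rand_matrix(rng, rows, cols):
    """Random Gaussian matrix as a list of rows (stand-in for learned weights)."""
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_img, d_obj, d_out = 8, 6, 5   # assumed input/output sizes
r_img, r_obj, r_out = 4, 3, 2   # assumed Tucker ranks

W_img = rand_matrix(rng, d_img, r_img)
W_obj = rand_matrix(rng, d_obj, r_obj)
W_out = rand_matrix(rng, r_out, d_out)
core = [rand_matrix(rng, r_obj, r_out) for _ in range(r_img)]

x_img = [rng.gauss(0, 1) for _ in range(d_img)]
x_obj = [rng.gauss(0, 1) for _ in range(d_obj)]

fused = tucker_fusion(x_img, x_obj, W_img, W_obj, core, W_out)

# Parameter count of an unfactorized bilinear tensor vs. the Tucker form.
full_params = d_img * d_obj * d_out
tucker_params = d_img * r_img + d_obj * r_obj + r_img * r_obj * r_out + r_out * d_out
print(len(fused), full_params, tucker_params)  # prints: 5 240 84
```

With these assumed sizes the factorized form needs 84 weights instead of 240, which illustrates how the ranks of the core tensor can bound the complexity of the merging scheme. The map is linear in each input separately, so the role of each factor stays easy to inspect.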