{"title":"Visual Relationship Detection With A Deep Convolutional Relationship Network","authors":"Yao Peng, D. Chen, Lanfen Lin","doi":"10.1109/ICIP40778.2020.9190642","DOIUrl":null,"url":null,"abstract":"Visual relationship is crucial to image understanding and can be applied to many tasks (e.g., image caption and visual question answering). Despite great progress on many vision tasks, relationship detection remains a challenging problem due to the complexity of modeling the widely spread and imbalanced distribution of {subject – predicate – object} triplets. In this paper, we propose a new framework to capture the relative positions and sizes of the subject and object in the feature map and add a new branch to filter out some object pairs that are unlikely to have relationships. In addition, an activation function is trained to increase the probability of some feature maps given an object pair. Experiments on two large datasets, the Visual Relationship Detection (VRD) and Visual Genome (VG) datasets, demonstrate the superiority of our new approach over state-of-the-art methods. Further, ablation study verifies the effectiveness of our techniques.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Image Processing (ICIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIP40778.2020.9190642","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Visual relationships are crucial to image understanding and can be applied to many tasks (e.g., image captioning and visual question answering). Despite great progress on many vision tasks, relationship detection remains a challenging problem due to the complexity of modeling the widely spread and imbalanced distribution of {subject – predicate – object} triplets. In this paper, we propose a new framework that captures the relative positions and sizes of the subject and object in the feature map, and adds a new branch to filter out object pairs that are unlikely to have a relationship. In addition, an activation function is trained to increase the probability of certain feature maps given an object pair. Experiments on two large datasets, the Visual Relationship Detection (VRD) and Visual Genome (VG) datasets, demonstrate the superiority of our new approach over state-of-the-art methods. Further, an ablation study verifies the effectiveness of our techniques.
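To make the two spatial ideas in the abstract concrete, below is a minimal sketch (my own illustration, not the authors' released code) of how relative position/size features for a (subject, object) box pair might be encoded, and how a lightweight pair-filter branch could score whether the pair is likely to hold a relationship at all. The box format, feature parameterization, and network sizes are all assumptions for illustration.

```python
# Hypothetical sketch of the two ideas named in the abstract:
# (1) encoding relative position/size of subject and object boxes,
# (2) a small branch that filters out unlikely object pairs.
# Not the paper's actual architecture; details are assumed.
import torch
import torch.nn as nn


def spatial_pair_features(subj: torch.Tensor, obj: torch.Tensor) -> torch.Tensor:
    """Encode relative position and size of subject/object boxes.

    Both inputs are (N, 4) tensors in (x1, y1, x2, y2) format.
    """
    sw, sh = subj[:, 2] - subj[:, 0], subj[:, 3] - subj[:, 1]
    ow, oh = obj[:, 2] - obj[:, 0], obj[:, 3] - obj[:, 1]
    # Offset of the object relative to the subject, normalized by subject size.
    dx = (obj[:, 0] - subj[:, 0]) / sw
    dy = (obj[:, 1] - subj[:, 1]) / sh
    # Log-scale relative size, a common box parameterization in detection.
    dw = torch.log(ow / sw)
    dh = torch.log(oh / sh)
    return torch.stack([dx, dy, dw, dh], dim=1)  # (N, 4)


class PairFilter(nn.Module):
    """Tiny MLP that scores how likely a subject-object pair has a relationship.

    Pairs scoring below a threshold would be discarded before predicate
    classification, reducing the number of candidate triplets.
    """

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, subj_boxes: torch.Tensor, obj_boxes: torch.Tensor) -> torch.Tensor:
        feats = spatial_pair_features(subj_boxes, obj_boxes)
        return torch.sigmoid(self.mlp(feats)).squeeze(-1)  # (N,) keep-probabilities
```

In a full pipeline, such a filter would typically be trained with a binary label (pair annotated with any predicate vs. not) and applied before the more expensive relationship classifier; the paper's actual branch may differ in inputs and supervision.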