ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation
Wenxing Ma, Tianxiang Hou, Qianji Di, Zhongang Qi, Ying Shan, Hanzi Wang
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2023-06-04
DOI: 10.1109/ICASSP49357.2023.10094727 (https://doi.org/10.1109/ICASSP49357.2023.10094727)
The scene graph generation (SGG) task has attracted increasing attention in recent years. The goal of SGG is to predict relations between pairs of objects within an image. Due to the long-tailed distribution of the dataset annotations, the performance of SGG is still far from satisfactory. To address the long-tailed problem, existing methods try various ways to conduct unbiased learning. However, we argue that the essence of the long-tailed problem in SGG is that the classifier is seriously affected by the long-tailed data. To handle this issue, we propose a novel network named ERBNet, which contains a relation feature fusion (RFF) encoder to construct effective representations of relations between objects, and a nearest class mean (NCM) classifier to conduct relation prediction based on relation feature similarities. Extensive experimental results show that the proposed ERBNet outperforms several state-of-the-art methods on the challenging Visual Genome dataset.
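To make the nearest class mean (NCM) idea concrete, here is a minimal sketch of a generic NCM classifier. It is not the authors' ERBNet implementation: the function names `fit_class_means` and `predict_ncm`, the use of cosine similarity, and the toy 2-D features are all assumptions made for illustration, and the RFF encoder that would produce the relation features is not modeled here.

```python
import numpy as np


def fit_class_means(features, labels, num_classes):
    """Average the feature vectors of each class, giving one mean per class.

    Hypothetical helper: a class with no samples keeps an all-zero mean.
    """
    means = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        members = features[labels == c]
        if len(members) > 0:
            means[c] = members.mean(axis=0)
    return means


def predict_ncm(features, means):
    """Assign each feature to the class whose mean is most similar.

    Cosine similarity is an assumption; the abstract only says that
    prediction is based on relation feature similarities.
    """
    eps = 1e-12  # avoids division by zero for all-zero vectors
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    m = means / (np.linalg.norm(means, axis=1, keepdims=True) + eps)
    return np.argmax(f @ m.T, axis=1)


# Toy long-tailed data: class 0 has three samples, class 1 has only one.
train_features = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.0, 1.0]])
train_labels = np.array([0, 0, 0, 1])
class_means = fit_class_means(train_features, train_labels, num_classes=2)

queries = np.array([[0.8, 0.1], [0.1, 0.7]])
predictions = predict_ncm(queries, class_means)
```

Each class contributes exactly one mean vector regardless of how many training samples it has, so the decision rule itself has no learned per-class weights or biases that could be skewed toward head classes. This is the general motivation for using NCM on long-tailed data; how ERBNet trains the features that feed the classifier is described in the paper, not here.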