Zhi-Peng Li , Wen-Jian Liu , Xin Sun , Yi-Jie Pan , Valeriya Gribova , Vladimir Fedorovich Filaretov , Anthony G. Cohn , De-Shuang Huang
Neurocomputing, Volume 647, Article 130729. Published 2025-06-05. DOI: 10.1016/j.neucom.2025.130729.
GPPT: Graph pyramid pooling transformer for visual scene
In the field of computer vision, network architecture is critical to task performance. The Vision Graph Neural Network (ViG) has shown remarkable results in handling various vision tasks thanks to its unique characteristics. However, the lack of multi-scale information in ViG limits its expressive capability. To address this challenge, we propose the Graph Pyramid Pooling Transformer (GPPT), which enhances model performance by introducing multi-scale feature learning. The core advantage of GPPT is its ability to effectively capture and fuse feature information at different scales. Specifically, it first generates multi-level pooled graphs using a graph pyramid pooling structure. Next, it encodes the features at each scale with a weight-shared Graph Convolutional Network (GCN). Then, it enhances information exchange across scales through a cross-scale feature fusion mechanism. Finally, it captures long-range node dependencies with a transformer module. Experimental results demonstrate that GPPT achieves exceptional performance across various visual scenes, including image classification and object detection, highlighting its generality and validity.
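The four-stage pipeline in the abstract (pyramid pooling, weight-shared GCN encoding, cross-scale fusion, transformer attention) can be illustrated with a minimal NumPy sketch. Everything here is a hypothetical toy — the node counts, the pairwise pooling, the k-NN graph construction, and the single attention head are illustrative assumptions, not the authors' actual GPPT implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_adjacency(x, k=3):
    # Build a symmetric k-nearest-neighbor graph over node features,
    # add self-loops, and row-normalize for mean aggregation.
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    a = np.zeros((len(x), len(x)))
    for i, nbrs in enumerate(idx):
        a[i, nbrs] = 1.0
    a = np.maximum(a, a.T)
    a += np.eye(len(x))
    return a / a.sum(1, keepdims=True)

def gcn_layer(x, a, w):
    # One graph-convolution step: aggregate neighbors, project, ReLU.
    return np.maximum(a @ x @ w, 0.0)

def pool(x, stride=2):
    # Average-pool consecutive node pairs to form a coarser graph level.
    n = (len(x) // stride) * stride
    return x[:n].reshape(-1, stride, x.shape[1]).mean(1)

def attention(x):
    # Single-head self-attention to capture long-range node dependencies.
    scores = x @ x.T / np.sqrt(x.shape[1])
    p = np.exp(scores - scores.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    return p @ x

# Toy input: 8 nodes (e.g. image patches) with 4-dim features.
x0 = rng.standard_normal((8, 4))
w = rng.standard_normal((4, 4)) * 0.1   # one weight matrix shared across scales

# Graph pyramid: the full graph plus one pooled level.
levels = [x0, pool(x0)]
encoded = [gcn_layer(x, knn_adjacency(x), w) for x in levels]

# Cross-scale fusion: upsample coarse features and add them to fine ones.
fused = encoded[0] + np.repeat(encoded[1], 2, axis=0)

out = attention(fused)
print(out.shape)   # (8, 4)
```

The key design point this sketch mirrors is the weight sharing: the same projection `w` encodes every pyramid level, so coarse and fine graphs live in a common feature space before fusion.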
Journal overview:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.