Transformer-Based Object Detection with Deep Feature Fusion Using Carafe Operator in Remote Sensing Image
Authors: Shenao Chen, Bingqi Wang, Chaoliang Zhong
Journal: EAI Endorsed Transactions on Energy Web (Journal Article; JCR Q3, Engineering)
Volume/issue: 10 1
Published: 2023-08-23
DOI: 10.4108/ew.3404 (https://doi.org/10.4108/ew.3404)
Citation count: 0
Abstract
Optical remote sensing images (ORSI) are widely used in urban planning, military mapping, field surveys, and similar applications, and target detection is one of the most important tasks performed on them. Over the past few years, deep learning has driven a breakthrough in CNN-based target detection. However, targets in ORSI vary widely in orientation and size, so detectors designed for ordinary optical images perform poorly when applied to ORSI directly. Improving detection performance on ORSI therefore remains a difficult problem. To address it, this paper builds on the one-stage detector RetinaNet and proposes a more efficient and accurate network, the Transformer-Based Network with Deep Feature Fusion Using Carafe Operator (TRCNet). First, the backbone adopts a transformer-based PVT2 structure, whose multi-head attention mechanism captures global information in optical images with complex backgrounds; the backbone depth is also increased to extract features more effectively. Second, the CARAFE operator is introduced into the FPN structure of the neck so that high-level semantics are fused with low-level ones more efficiently, further improving detection performance. Experiments on the well-known public datasets NWPU-VHR-10 and RSOD show mAP increases of 8.4% and 1.7%, respectively. Comparison with other advanced networks further indicates that the proposed network is effective.
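As an illustration of the content-aware upsampling idea behind the CARAFE operator mentioned in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It covers only the reassembly step: each upsampled location takes a softmax-normalised weighted sum over a k_up x k_up neighbourhood of its source location. In CARAFE the weights come from a small learned kernel-prediction branch; here they are passed in as `kernel_logits`, which is an assumption made for illustration. The function names `softmax` and `carafe_reassemble` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def carafe_reassemble(features, kernel_logits, scale=2, k_up=3):
    """Content-aware upsampling of a C x H x W feature map (nested lists).

    kernel_logits[i2][j2] holds k_up * k_up unnormalised scores for the
    output location (i2, j2). A learned branch would predict these in a
    real network; here they are supplied by the caller (assumption).
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    radius = k_up // 2
    out = [[[0.0] * (width * scale) for _ in range(height * scale)]
           for _ in range(channels)]

    for i2 in range(height * scale):
        for j2 in range(width * scale):
            # Each output location maps back to one source location.
            i, j = i2 // scale, j2 // scale
            # Normalise so the reassembly weights sum to 1.
            weights = softmax(kernel_logits[i2][j2])
            for c in range(channels):
                acc = 0.0
                for n in range(-radius, radius + 1):
                    for m in range(-radius, radius + 1):
                        y, x = i + n, j + m
                        # Zero padding outside the feature map.
                        if 0 <= y < height and 0 <= x < width:
                            w = weights[(n + radius) * k_up + (m + radius)]
                            acc += w * features[c][y][x]
                out[c][i2][j2] = acc
    return out
```

When every kernel puts all of its weight on the centre position, the sketch reduces to nearest-neighbour upsampling. With uniform weights it becomes a local average. A learned predictor can move between such behaviours per location, which is what makes the upsampling content-aware.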
About the journal:
With ICT pervading everyday objects and infrastructures, the ‘Future Internet’ is envisioned to undergo a radical transformation from how we know it today (a mere communication highway) into a vast hybrid network seamlessly integrating knowledge, people and machines into techno-social ecosystems whose behaviour transcends the boundaries of today’s engineering science. As the internet of things continues to grow, billions and trillions of data bytes need to be moved, stored and shared. The energy thus consumed and the climate impact of data centers are increasing dramatically, thereby becoming significant contributors to global warming and climate change. As reported recently, the combined electricity consumption of the world’s data centers has already exceeded that of some of the world’s top ten economies. In the ensuing process of integrating traditional and renewable energy, monitoring and managing various energy sources, and processing and transferring technological information through various channels, IT will undoubtedly play an ever-increasing and central role. Several technologies are currently racing to production to meet this challenge, from ‘smart dust’ to hybrid networks capable of controlling the emergence of dependable and reliable green and energy-efficient ecosystems – which we generically term the ‘energy web’ – calling for major paradigm shifts highly disruptive of the ways the energy sector functions today. The EAI Transactions on Energy Web are positioned at the forefront of these efforts and provide a forum for the most forward-looking, state-of-the-art research bringing together the cross section of IT and Energy communities. The journal will publish original works reporting on prominent advances that challenge traditional thinking.