A Diverse Knowledge Perception and Fusion network for detecting targets and key parts in UAV images

Impact Factor 5.5 · Zone 2 (Computer Science) · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Hanyu Wang, Qiang Shen, Zilong Deng
DOI: 10.1016/j.neucom.2024.128748
Journal: Neurocomputing
Published: 2024-10-13
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S0925231224015194
Citations: 0

Abstract

Detecting targets and their key parts in UAV images is crucial for both military and civilian applications, including optimizing damage assessment, evaluating infrastructure, and facilitating disaster response efforts. Traditional top-down approaches impose excessive constraints that struggle to address challenges such as variable definitions and quantities of key parts, potential target occlusion, and model redundancy. Conversely, end-to-end approaches often overlook the relationships between targets and key parts, resulting in low detection accuracy. Inspired by the remarkable human reasoning process, we propose the Diverse Knowledge Perception and Fusion (DKPF) network, which skillfully balances the trade-offs between stringent constraints and unconstrained methods while ensuring both detection precision and real-time performance. Specifically, our model integrates reasoning guided by three distinct forms of knowledge: contextual knowledge at the image level in an unsupervised manner; explicit semantic knowledge regarding the interactions between targets and key parts at the instance level; and implicit comprehensive knowledge about the relationships among different types of targets or key parts, such as shape similarity. These specific knowledge forms are extracted through a novel adaptive fusion strategy for multi-scale features, a binary region-to-region semantic knowledge graph, and a data-driven self-attention architecture, respectively. Experiments conducted on both simulated and real-world datasets reveal that our method significantly outperforms state-of-the-art techniques, regardless of the number of key parts in the target. Furthermore, extensive ablation studies and visualization analyses validate both the efficacy of our approach and the interpretability of the generated features.
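The abstract names three mechanisms without giving formulas: adaptive fusion of multi-scale features, a binary region-to-region knowledge graph, and self-attention. The sketch below only illustrates what such building blocks commonly look like; it is not the DKPF implementation. All function names, the softmax weighting, the neighbour averaging, and the identity projections in the attention step are assumptions introduced here for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(scale_features, scale_logits):
    """Fuse same-length feature vectors from several scales.

    Hypothetical stand-in for an adaptive fusion strategy: each scale gets a
    softmax weight. In a trained model the logits would be learned; here they
    are plain inputs.
    """
    weights = softmax(scale_logits)
    dim = len(scale_features[0])
    return [
        sum(w * f[i] for w, f in zip(weights, scale_features))
        for i in range(dim)
    ]


def graph_propagate(region_features, adjacency):
    """Average each region's features with those of its linked regions.

    adjacency[i][j] is 1 when regions i and j are assumed to be related
    (for example a target and one of its key parts) and 0 otherwise.
    Every region is also treated as linked to itself.
    """
    out = []
    for i, row in enumerate(adjacency):
        neighbours = [j for j, linked in enumerate(row) if linked or j == i]
        dim = len(region_features[i])
        out.append([
            sum(region_features[j][d] for j in neighbours) / len(neighbours)
            for d in range(dim)
        ])
    return out


def self_attention(region_features):
    """Scaled dot-product self-attention with identity projections (Q = K = V).

    A real model would apply learned query, key, and value projections first.
    """
    dim = len(region_features[0])
    out = []
    for q in region_features:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in region_features
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, region_features))
            for d in range(dim)
        ])
    return out
```

As a usage example, `graph_propagate([[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]], [[0, 1, 0], [1, 0, 0], [0, 0, 0]])` mixes the first two regions, which are linked, and leaves the third, unlinked region unchanged. A binary adjacency matrix is used here because the abstract describes the knowledge graph as binary; how the paper actually builds and applies that graph is not stated in this listing.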
Source journal: Neurocomputing (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Publication volume: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.