A UHD Aerial Photograph Categorization System by Learning a Noise-Tolerant Topology Kernel

IF 10.2 | Zone 1 (Computer Science) | JCR Q1: Computer Science, Artificial Intelligence
Luming Zhang;Guifeng Wang;Ming Chen;Ling Shao
DOI: 10.1109/TNNLS.2024.3355928
Journal: IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 5, pp. 9699-9708
Published: 2025-03-25
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10938689/
Citations: 0

Abstract

With thousands of observation satellites orbiting the Earth, massive-scale ultrahigh-definition (UHD) images are captured daily, covering vast areas of land, often extending across millions of square kilometers. These images commonly feature a wide range of ground objects, such as vehicles and rooftops, numbering from tens to hundreds. The ability to categorize the diverse types of objects in UHD aerial photographs is essential for a variety of real-world applications, including intelligent transportation systems, disaster prediction, and precision agriculture. In this study, we introduce a novel framework for categorizing UHD aerial photographs. The core of our approach is to represent the spatial configurations of ground objects topologically and encode these layouts using a binary matrix factorization (MF) technique that robustly addresses the challenge of noisy image-level labels. Specifically, for each UHD aerial photograph, we identify visually and semantically important object patches. These patches are then connected spatially to form graphlets, small graphs that capture the layout and relations between adjacent objects. To enhance the understanding of these graphlets, we propose a binary MF approach that captures their semantic content. The method integrates four key components: 1) learning binary hash codes; 2) refining noisy labels; 3) incorporating deep image-level semantics; and 4) adaptively updating the data graph. The binary MF is solved iteratively, with each graphlet being transformed into a set of discrete hash codes. These hash codes, which represent the spatial and semantic information of the graphlets, are subsequently encoded into a feature vector using a kernel machine, enabling multilabel categorization of the aerial photographs. For validation, we compiled a large-scale dataset of UHD aerial photographs, sourced from 100 of the top-ranked cities worldwide. 
Experimental results demonstrate that: 1) our method excels in learning categorization models from imperfect labels and 2) the integration of the four proposed attributes enables effective encoding of the graphlets into hash codes, providing a powerful representation of the UHD aerial photographs.
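
The abstract describes a pipeline of patches, graphlets, hash codes, and a kernel, without giving implementation details. The following is a minimal illustrative sketch of that flow, not the authors' method. All function names (`build_adjacency`, `extract_graphlets`, `hash_graphlet`, `hamming_similarity`) and parameters (`radius`, `projections`) are hypothetical. In particular, the hashing step here uses the sign of fixed linear projections as a simple stand-in for the learned binary MF, and the noisy-label refinement, deep semantics, and adaptive graph update are omitted entirely.

```python
import itertools
import math


def build_adjacency(centers, radius):
    """Link two patches when their centers are within `radius` (assumed rule)."""
    adj = {i: set() for i in range(len(centers))}
    for i, j in itertools.combinations(range(len(centers)), 2):
        if math.dist(centers[i], centers[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def is_connected(nodes, adj):
    """Depth-first search restricted to `nodes`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def extract_graphlets(adj, size):
    """Brute-force list of connected patch subsets with exactly `size` nodes."""
    candidates = itertools.combinations(sorted(adj), size)
    return [c for c in candidates if is_connected(c, adj)]


def hash_graphlet(features, projections):
    """Stand-in hash: mean-pool patch features, then take projection signs."""
    dim = len(features[0])
    pooled = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    return tuple(
        1 if sum(w * x for w, x in zip(row, pooled)) >= 0 else 0
        for row in projections
    )


def hamming_similarity(code_a, code_b):
    """Fraction of matching bits between two equal-length binary codes."""
    return sum(a == b for a, b in zip(code_a, code_b)) / len(code_a)


# Toy data: four patch centers; the last one is far from the others.
centers = [(0, 0), (1, 0), (2, 0), (10, 10)]
adj = build_adjacency(centers, radius=1.5)
pairs = extract_graphlets(adj, size=2)    # [(0, 1), (1, 2)]
triples = extract_graphlets(adj, size=3)  # [(0, 1, 2)]

# Hypothetical 2-D patch features and four fixed projection rows.
projections = [[1, 0], [0, 1], [1, 1], [-1, -1]]
code = hash_graphlet([[1.0, -1.0], [3.0, -1.0]], projections)  # (1, 0, 1, 0)
```

Brute-force enumeration is only practical for small patch counts and graphlet sizes; it is used here to keep the sketch short. A kernel between two photographs could then be assembled by aggregating `hamming_similarity` over their graphlet codes, though the abstract does not specify how the kernel machine is constructed.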
Source journal
IEEE Transactions on Neural Networks and Learning Systems
Categories: Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture
CiteScore: 23.80
Self-citation rate: 9.60%
Articles published: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.