A UHD Aerial Photograph Categorization System by Learning a Noise-Tolerant Topology Kernel
Authors: Luming Zhang; Guifeng Wang; Ming Chen; Ling Shao
Journal: IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 5, pp. 9699-9708 (impact factor 10.2; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-03-25
DOI: 10.1109/TNNLS.2024.3355928
URL: https://ieeexplore.ieee.org/document/10938689/
With thousands of observation satellites orbiting the Earth, massive-scale ultrahigh-definition (UHD) images are captured daily, covering vast areas of land, often extending across millions of square kilometers. These images commonly feature a wide range of ground objects, such as vehicles and rooftops, numbering from tens to hundreds. The ability to categorize the diverse types of objects in UHD aerial photographs is essential for a variety of real-world applications, including intelligent transportation systems, disaster prediction, and precision agriculture. In this study, we introduce a novel framework for categorizing UHD aerial photographs. The core of our approach is to represent the spatial configurations of ground objects topologically and encode these layouts using a binary matrix factorization (MF) technique that robustly addresses the challenge of noisy image-level labels. Specifically, for each UHD aerial photograph, we identify visually and semantically important object patches. These patches are then connected spatially to form graphlets, small graphs that capture the layout and relations between adjacent objects. To enhance the understanding of these graphlets, we propose a binary MF approach that captures their semantic content. The method integrates four key components: 1) learning binary hash codes; 2) refining noisy labels; 3) incorporating deep image-level semantics; and 4) adaptively updating the data graph. The binary MF is solved iteratively, with each graphlet being transformed into a set of discrete hash codes. These hash codes, which represent the spatial and semantic information of the graphlets, are subsequently encoded into a feature vector using a kernel machine, enabling multilabel categorization of the aerial photographs. For validation, we compiled a large-scale dataset of UHD aerial photographs, sourced from 100 of the top-ranked cities worldwide. 
Experimental results demonstrate that: 1) our method excels at learning categorization models from imperfect labels, and 2) integrating the four proposed components enables effective encoding of the graphlets into hash codes, providing a powerful representation of the UHD aerial photographs.
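
The abstract describes the pipeline only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It shows two of the described steps under stated assumptions: linking nearby object patches into graphlets (small connected subgraphs), and comparing two photographs through the binary hash codes of their graphlets. The distance threshold `radius`, the function names, and the bit-matching kernel are all hypothetical choices made for illustration. The binary MF step that would actually learn the hash codes is not shown, so the codes below are made-up inputs.

```python
from itertools import combinations
from math import dist


def build_adjacency(centers, radius):
    """Link two patches when their centers lie within `radius` (assumed rule)."""
    adj = {i: set() for i in range(len(centers))}
    for i, j in combinations(range(len(centers)), 2):
        if dist(centers[i], centers[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def enumerate_graphlets(adj, size):
    """Return every connected node subset with `size` nodes as a sorted tuple."""
    frontier = {frozenset([v]) for v in adj}
    for _ in range(size - 1):
        grown = set()
        for graphlet in frontier:
            for v in graphlet:
                for u in adj[v]:
                    if u not in graphlet:
                        # Extend the graphlet by one spatially adjacent patch.
                        grown.add(graphlet | {u})
        frontier = grown
    return sorted(tuple(sorted(g)) for g in frontier)


def hamming_kernel(code_a, code_b):
    """Fraction of matching bits between two equal-length binary codes."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(a == b for a, b in zip(code_a, code_b)) / len(code_a)


def image_kernel(codes_x, codes_y):
    """Average pairwise graphlet-code similarity between two photographs."""
    total = sum(hamming_kernel(a, b) for a in codes_x for b in codes_y)
    return total / (len(codes_x) * len(codes_y))


# Hypothetical patch centers (pixel coordinates) for one photograph.
centers = [(0, 0), (1, 0), (2, 0), (10, 10)]
adjacency = build_adjacency(centers, radius=1.5)
pairs = enumerate_graphlets(adjacency, size=2)
triples = enumerate_graphlets(adjacency, size=3)

# Hypothetical 2-bit hash codes standing in for the learned ones.
similarity = image_kernel([(1, 0), (0, 0)], [(1, 0)])
```

In this toy layout the isolated patch at (10, 10) joins no graphlet of size two or more, which reflects the idea that graphlets encode relations between adjacent objects only. A kernel matrix built from `image_kernel` over all photograph pairs could then be passed to any kernel classifier; which kernel machine the paper uses is not stated in the abstract.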
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.