SRGTNet: Subregion-Guided Transformer Hash Network for Fine-Grained Image Retrieval
Hongchun Lu, Songlin He, Xue Li, Min Han, Chase Wu
IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2388-2400 (published 2025-01-27)
DOI: 10.1109/TBDATA.2025.3533916
https://ieeexplore.ieee.org/document/10854876/
Fine-grained image retrieval (FGIR) is a crucial task in computer vision, with broad applications in areas such as biodiversity monitoring, e-commerce, and medical diagnostics. However, capturing discriminative features from which to generate binary codes is difficult because fine-grained categories exhibit high intraclass variance and low interclass variance. To address this challenge, we make four contributions. (i) We build a novel, highly reliable fine-grained deep hash learning framework for more accurate retrieval of fine-grained images. (ii) We propose a method that erases part of the significant regions of an image, forcing the network to generate compact binary codes. (iii) We introduce a CNN-guided Transformer structure for fine-grained retrieval that effectively captures contextual relationships among the features of fine-grained images in order to mine more discriminative regional features. (iv) We design a multistage mixture loss to optimize network training and enhance feature representation. Experiments on three publicly available fine-grained datasets show that our method effectively improves fine-grained image retrieval performance.
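The abstract does not describe implementation details, so the following is only a generic illustration of how deep hashing methods use binary codes at retrieval time. It is not SRGTNet's architecture, erasure method, or loss. It assumes a network has already produced real-valued feature vectors, binarizes them with the sign function into +1/-1 codes, and ranks database items by Hamming distance to the query. The function names to_hash_code, hamming_distance, and retrieve, as well as the 8-dimensional example features, are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_hash_code(features: Sequence[float]) -> Tuple[int, ...]:
    """Binarize a real-valued feature vector with the sign function (+1 / -1)."""
    # Assumption for illustration: zero maps to +1 so every bit is defined.
    return tuple(1 if x >= 0 else -1 for x in features)


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length +1/-1 codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def retrieve(query: Sequence[float],
             database: List[Sequence[float]],
             top_k: int = 3) -> List[int]:
    """Return indices of the top_k database items closest to the query in Hamming space."""
    q = to_hash_code(query)
    codes = [to_hash_code(f) for f in database]
    # sorted() is stable, so items at equal distance keep their database order.
    ranked = sorted(range(len(codes)), key=lambda i: hamming_distance(q, codes[i]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical 8-dimensional features standing in for network outputs.
    database = [
        [0.9, -0.2, 0.4, -0.7, 0.1, 0.3, -0.5, 0.8],    # item 0
        [-0.6, 0.5, -0.3, 0.2, -0.9, -0.1, 0.7, -0.4],  # item 1
        [0.8, -0.1, 0.5, -0.6, 0.2, -0.3, -0.4, 0.9],   # item 2
    ]
    query = [0.7, -0.3, 0.6, -0.8, 0.05, 0.2, -0.6, 0.5]
    print(retrieve(query, database, top_k=2))  # prints [0, 2]
```

Here item 0 has the same code as the query (distance 0), item 2 differs in one bit, and item 1 differs in all eight bits. For +1/-1 codes of length K, the Hamming distance also equals (K - a·b)/2, which is why many deep hashing objectives are written in terms of inner products between codes.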
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.