Spatial Pyramid Attention Enhanced Visual Descriptors for Landmark Retrieval

JCR Quartile: Q3 (Computer Science)
Luepol Pipanmekaporn, Suwatchai Kamonsantiroj, Chiabwoot Ratanavilisagul, Sathit Prasomphan
DOI: 10.18178/joig.11.4.359-366 | Publication date: 2023-12-01 | Journal Article
Citations: 0

Abstract

Landmark retrieval, which aims to search for landmark images similar to a query photo within a massive image database, has received considerable attention for many years. Despite this, finding landmarks quickly and accurately still presents some unique challenges. To tackle these challenges, we present a deep learning model, called the Spatial-Pyramid Attention network (SPA). This network is an end-to-end convolutional network, incorporating a spatial-pyramid attention layer that encodes the input image, leveraging the spatial pyramid structure to highlight regional features based on their relative spatial distinctiveness. An image descriptor is then generated by aggregating these regional features. According to our experiments on benchmark datasets including Oxford5k, Paris6k, and Landmark-100, our proposed model, SPA, achieves mean Average Precision (mAP) accuracy of 85.3% on the Oxford dataset, 89.6% on the Paris dataset, and 80.4% on the Landmark-100 dataset, outperforming existing state-of-the-art deep image retrieval models.
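The pipeline the abstract describes, pooling regional features over a spatial pyramid, weighting regions by their relative distinctiveness, and aggregating the weighted features into a single descriptor, can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the pyramid levels, the feature-norm distinctiveness score, and the softmax weighting are assumptions made for the example.

```python
import numpy as np

def spatial_pyramid_descriptor(feat, levels=(1, 2, 4)):
    """Aggregate a C x H x W conv feature map into one global descriptor.

    Sketch of the idea in the abstract (hypothetical details): each pyramid
    level n splits the map into an n x n grid, each cell is average-pooled
    into a C-dim regional feature, regions get softmax attention weights
    from an L2-norm "distinctiveness" score, and the weighted sum is
    L2-normalized to form the image descriptor.
    """
    C, H, W = feat.shape
    regions = []
    for n in levels:                               # pyramid level n -> n x n grid
        hs = np.linspace(0, H, n + 1, dtype=int)   # cell boundaries along height
        ws = np.linspace(0, W, n + 1, dtype=int)   # cell boundaries along width
        for i in range(n):
            for j in range(n):
                cell = feat[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                regions.append(cell.mean(axis=(1, 2)))   # C-dim regional feature
    R = np.stack(regions)                          # (num_regions, C)
    score = np.linalg.norm(R, axis=1)              # distinctiveness proxy
    attn = np.exp(score - score.max())
    attn /= attn.sum()                             # softmax attention over regions
    desc = (attn[:, None] * R).sum(axis=0)         # attention-weighted aggregation
    return desc / (np.linalg.norm(desc) + 1e-12)   # L2-normalized descriptor

# Example: a 512-channel feature map yields a 512-dim unit-norm descriptor;
# two such descriptors would be compared by cosine similarity (dot product).
feat = np.random.rand(512, 32, 32).astype(np.float32)
d = spatial_pyramid_descriptor(feat)
print(d.shape)
```

With levels (1, 2, 4) the pyramid produces 1 + 4 + 16 = 21 regions; L2-normalizing the final descriptor makes retrieval a simple inner-product ranking over the database.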
Source journal
Journal of Image and Graphics (中国图象图形学报), Computer Science: Computer Graphics and Computer-Aided Design
CiteScore: 1.20 | Self-citation rate: 0.00% | Articles published: 6776