Hariprasath Govindarajan, P. Lindskog, Dennis Lundström, Amanda Olmin, Jacob Roll, F. Lindsten
Title: Self-Supervised Representation Learning for Content Based Image Retrieval of Complex Scenes
Venue: 2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops)
Publication date: 2021-07-11
DOI: https://doi.org/10.1109/ivworkshops54471.2021.9669246
Citations: 1
Abstract
Although Content Based Image Retrieval (CBIR) is an active research field, its application to images that simultaneously contain multiple objects has received limited research interest. For such complex images, it is difficult to precisely convey the query intention, to encode all the image aspects into one compact global feature representation, and to unambiguously define label similarity or dissimilarity. Motivated by the recent success of self-supervised learning on many visual benchmark tasks, we propose a self-supervised method to train a feature representation learning model. We propose using multiple query images, together with an attention-based architecture that extracts features from diverse image aspects and benefits from the additional queries. The method shows promising performance on road scene datasets and consistently improves when multiple query images are used instead of a single query image.