{"title":"Image retrieval based on multimodality neural network and local sensitive hash","authors":"Chen Chen","doi":"10.1117/12.2680159","DOIUrl":null,"url":null,"abstract":"With the rapid development of deep convolutional neural networks, the use of deep convolutional neural networks to extract features instead of manual features has become one of the current research hotspots. However, deep convolutional neural network can not understand image features well, and there is a “semantic gap”. Contrastive Language-Image PreTraining (CLIP) model is a pre-training neural network model based on matching image and text. Use the pre-trained CLIP model to extract the high-dimensional feature vector of the image data set to be retrieved, and the Local Sensitive Hash (LSH) algorithm was used to extract the retrieval speed to complete the retrieval task based on the image content and text. Experimental results show that compared with other content-based image retrieval algorithms, the proposed algorithm can also understand the text information in the image to complete the retrieval task, and has a wider retrieval range.","PeriodicalId":201466,"journal":{"name":"Symposium on Advances in Electrical, Electronics and Computer Engineering","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Symposium on Advances in Electrical, Electronics and Computer Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2680159","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
With the rapid development of deep convolutional neural networks, using deep convolutional neural networks to extract features in place of hand-crafted features has become a current research hotspot. However, deep convolutional neural networks cannot fully understand image semantics, leaving a "semantic gap". The Contrastive Language-Image Pre-Training (CLIP) model is a neural network pre-trained to match images with text. A pre-trained CLIP model is used to extract high-dimensional feature vectors from the image data set to be retrieved, and the Local Sensitive Hash (LSH) algorithm is used to accelerate retrieval, so that the retrieval task can be completed based on both image content and text. Experimental results show that, compared with other content-based image retrieval algorithms, the proposed algorithm can also understand the text information in images to complete the retrieval task, and it has a wider retrieval range.
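The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: embedding images and a text query with a pre-trained CLIP model and using a random-hyperplane LSH index to narrow the candidate set before ranking by cosine similarity. The package choice (OpenAI's `clip`), the model name `ViT-B/32`, the number of hash bits, and the helper names are all assumptions for illustration, not the paper's actual configuration.

```python
# Minimal sketch, assuming OpenAI's "clip" package, torch, numpy, and Pillow are installed.
# This is not the paper's implementation; names and parameters below are illustrative.
import numpy as np
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # assumed model variant

def embed_images(image_paths):
    """Encode images into L2-normalised CLIP feature vectors."""
    feats = []
    with torch.no_grad():
        for p in image_paths:
            img = preprocess(Image.open(p)).unsqueeze(0).to(device)
            f = model.encode_image(img)
            feats.append((f / f.norm(dim=-1, keepdim=True)).cpu().numpy()[0])
    return np.stack(feats)

def embed_text(query):
    """Encode a free-text query into an L2-normalised CLIP feature vector."""
    with torch.no_grad():
        tok = clip.tokenize([query]).to(device)
        f = model.encode_text(tok)
        return (f / f.norm(dim=-1, keepdim=True)).cpu().numpy()[0]

class RandomHyperplaneLSH:
    """Random-hyperplane LSH: vectors whose bit signatures land in the same
    bucket are treated as candidate neighbours under cosine similarity."""
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))
        self.buckets = {}

    def _signature(self, vec):
        return tuple((self.planes @ vec > 0).astype(int))

    def index(self, vectors):
        for i, v in enumerate(vectors):
            self.buckets.setdefault(self._signature(v), []).append(i)

    def query(self, vec, vectors, top_k=5):
        # Rank only the candidates in the query's bucket; fall back to all items
        # if the bucket is empty.
        cand = self.buckets.get(self._signature(vec), range(len(vectors)))
        scored = sorted(cand, key=lambda i: -float(vectors[i] @ vec))
        return scored[:top_k]

# Usage sketch: index an image gallery, then retrieve with a text query.
# (File names below are hypothetical.)
# gallery = embed_images(["img0.jpg", "img1.jpg"])
# lsh = RandomHyperplaneLSH(dim=gallery.shape[1]); lsh.index(gallery)
# print(lsh.query(embed_text("a red bus on the street"), gallery))
```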