Deep Learning System for Image Retrieval

S. S. Rao, Shahid Ikram, Parashara Ramesh
DOI: 10.1109/Indo-TaiwanICAN48429.2020.9181344
Published in: 2020 Indo – Taiwan 2nd International Conference on Computing, Analytics and Networks (Indo-Taiwan ICAN)
Publication date: 2020-02-01
Citations: 0

Abstract

In the modern era of digital photography and smartphones, millions of images are generated every day, and they capture precious moments and events of our lives. As we keep adding images to our digital storehouses, managing and accessing them becomes a daunting task, and without proper organization we lose track of them. There is a clear need for a tool that can fetch images based on a word or a description. In this paper, we build a solution that retrieves relevant images from a pool based on a textual description by examining the content of each image. The model is based on a deep neural network architecture that attends to the relevant parts of the image. The algorithm takes a sentence or a word as input and returns the top-ranked images relevant to that caption. We map the sentence and the image to representations in a higher-dimensional latent space, which lets us compare the two and measure their similarity to decide relevance. We conducted various experiments to improve the latent-space representations of the image and the caption so that they correlate better: for example, bidirectional sequence models for a better textual representation, and various baseline convolution-based stacks for a better image representation. We also tried incorporating an attention mechanism that focuses on only the relevant parts of the image and the sentence, thereby strengthening the correlation between the two spaces.
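The abstract describes ranking images by comparing sentence and image representations in a shared latent space, but it does not name the similarity measure or the encoders. The sketch below assumes cosine similarity over precomputed embedding vectors; the function names `cosine_similarity` and `top_k_images`, the toy vectors, and the image IDs are hypothetical, and the text and image encoders themselves (e.g. a bidirectional sequence model and a CNN stack) are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_images(
    query_vec: Sequence[float],
    image_index: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every image embedding against the query embedding and keep the k best.

    Assumes both embeddings already live in the same latent space
    (hypothetical: produced by separately trained text and image encoders).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    scored = [(image_id, cosine_similarity(query_vec, vec)) for image_id, vec in image_index.items()]
    # Highest similarity first; Python's sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-d embeddings standing in for encoder outputs (hypothetical values).
    index = {
        "beach.jpg": [0.9, 0.1, 0.0],
        "dog.jpg": [0.1, 0.9, 0.2],
        "sunset.jpg": [0.8, 0.0, 0.3],
    }
    query = [1.0, 0.0, 0.1]  # pretend embedding of a query such as "sunny beach"
    for image_id, score in top_k_images(query, index, k=2):
        print(f"{image_id}: {score:.3f}")
```

This is a brute-force scan over the whole pool, which is adequate for a personal photo collection; a much larger pool would typically swap the linear scan for an approximate nearest-neighbor index over the same embeddings.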
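The abstract mentions attending to the relevant parts of the image but does not say how the attention is computed. As one possible reading, the sketch below assumes scaled dot-product attention in which a text query vector weights a set of image region features (for instance, the cells of a CNN feature map) and pools them into a single query-aware image vector. The names `softmax` and `attend` and the toy inputs are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    regions: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one text query over image region features.

    Returns (attention weights over regions, attended image vector).
    Assumption: regions and query already share the same dimension d.
    """
    d = len(query)
    if not regions or any(len(r) != d for r in regions):
        raise ValueError("need at least one region, each with the query's dimension")
    scale = 1.0 / math.sqrt(d)
    # Relevance score of each region to the query, scaled by 1/sqrt(d).
    scores = [scale * sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    weights = softmax(scores)
    # Weighted sum of region features: regions relevant to the query dominate.
    attended = [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(d)]
    return weights, attended


if __name__ == "__main__":
    # Three toy 2-d region features and a query that points along the first axis.
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, image_vec = attend(regions, [2.0, 0.0])
    print("weights:", [round(w, 3) for w in weights])
    print("attended image vector:", [round(x, 3) for x in image_vec])
```

The attended vector can then be compared with the sentence embedding in the latent space, so that the similarity score reflects the image regions the query actually refers to rather than the whole image uniformly.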