Cross-Modal Sound-Image Retrieval Based on Deep Collaborative Hashing

2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT). Pub Date: 2020-11-01. DOI: 10.1109/ISCTT51595.2020.00041

Hanxiao Xu


Citations: 1

Abstract



In recent years, with the development of deep learning, cross-modal sound-image retrieval has made some progress. However, existing cross-modal audio-image retrieval methods still face two bottlenecks: (1) how to establish an effective correlation between speech and images so as to improve retrieval accuracy; and (2) how to reduce the storage required for large-scale cross-modal data and speed up retrieval. To address these problems, this paper proposes a new deep collaborative hashing method for cross-modal audio-image retrieval (ssch), which generates hash codes that require little storage and support fast retrieval. In particular, ssch uses the similarity of deep features to establish the semantic relationship between speech and images, and it fuses the image feature vector and the audio feature vector to learn hash codes collaboratively. In addition, during hash code learning, the method tries to preserve semantic similarity in the binary codes and to reduce the information loss caused by binarization. Experimental results show that ssch achieves better retrieval performance than other advanced cross-modal retrieval methods.
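To make the hashing idea in the abstract concrete, the following is a minimal sketch of how binary codes can be derived from fused features and searched by Hamming distance. It is not the ssch implementation: the abstract does not describe the network or the fusion rule, so concatenation as the fusion step, a fixed projection matrix standing in for learned layers, and a squared-error quantization penalty are all assumptions, and every function name here is hypothetical.

```python
from typing import List, Sequence


def fuse(image_feat: Sequence[float], audio_feat: Sequence[float]) -> List[float]:
    """Fuse two feature vectors by concatenation (an assumed fusion rule)."""
    return list(image_feat) + list(audio_feat)


def project(features: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """One real-valued output per hash bit (stand-in for a learned hash layer)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def binarize(outputs: Sequence[float]) -> List[int]:
    """Sign of each output: +1 if non-negative, otherwise -1."""
    return [1 if v >= 0 else -1 for v in outputs]


def quantization_loss(outputs: Sequence[float], code: Sequence[int]) -> float:
    """Squared gap between the real-valued outputs and their +/-1 code."""
    return sum((v - b) ** 2 for v, b in zip(outputs, code))


def pack(code: Sequence[int]) -> int:
    """Pack a +/-1 code into an integer, one bit per entry."""
    packed = 0
    for b in code:
        packed = (packed << 1) | (1 if b > 0 else 0)
    return packed


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def rank(query: int, database: Sequence[int]) -> List[int]:
    """Database indices sorted by Hamming distance (ties keep index order)."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Illustrative data only: a 2-D image feature, a 2-D audio feature,
# and an identity matrix as the hypothetical 4-bit projection.
WEIGHTS = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
outputs = project(fuse([0.9, -0.8], [0.7, -1.1]), WEIGHTS)
code = binarize(outputs)               # [1, -1, 1, -1]
query = pack(code)                     # 0b1010
database = [0b0101, 0b1010, 0b1110]    # packed codes of three stored items
order = rank(query, database)          # [1, 2, 0]
```

The storage and speed claims follow from this representation: a 64-bit code occupies 8 bytes, whereas a 512-dimensional float32 feature vector (dimensions chosen here purely for illustration) occupies 2048 bytes, and comparing two codes takes one XOR and a bit count. The quantization penalty shrinks as the real-valued outputs move toward +1 or -1, which is one common way to limit the information lost when codes are binarized.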

