Author: Hanxiao Xu
Published in: 2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT), 2020-11-01
DOI: https://doi.org/10.1109/ISCTT51595.2020.00041
Cross-Modal Sound-Image Retrieval Based on Deep Collaborative Hashing
In recent years, advances in deep learning have driven progress in cross-modal sound-image retrieval. However, existing cross-modal audio-image retrieval methods still face two bottlenecks: (1) how to establish an effective correlation between speech and images to improve retrieval accuracy; and (2) how to reduce the storage cost of large-scale cross-modal data and accelerate retrieval. To address these problems, this paper proposes a new deep collaborative hashing method for cross-modal audio-image retrieval (SSCH), which generates hash codes that require little storage and support fast retrieval. Specifically, SSCH uses the similarity of deep features to establish the semantic relationship between speech and images, and it fuses the image feature vector and the audio feature vector to learn hash codes collaboratively. In addition, during hash code learning, the method tries to preserve semantic similarity in the binary codes and to reduce the information loss caused by binarization. Experimental results show that SSCH achieves better retrieval performance than other advanced cross-modal retrieval methods.
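
To illustrate why hash codes give low storage and fast retrieval, the following minimal sketch binarizes real-valued network outputs by sign, packs them into integers, and ranks database items by Hamming distance. It is not the paper's method: the abstract does not specify the network, the fusion step, or the loss, so the feature values, the 8-bit code length, and the helper names `binarize`, `hamming`, and `rank_by_hamming` are all assumptions made for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack sign(features) into an int: bit i is 1 when features[i] > 0."""
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits in which two packed codes differ (XOR, then popcount)."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical 8-dimensional outputs of an image network (assumed values).
image_outputs = [
    [0.9, -0.2, 0.4, -0.7, 0.1, -0.5, 0.8, -0.3],
    [-0.6, 0.3, -0.8, 0.5, -0.2, 0.7, -0.1, 0.4],
    [0.7, -0.4, 0.2, 0.6, 0.3, -0.9, 0.5, -0.1],
]
# Hypothetical output of an audio network for one spoken query (assumed values).
audio_query_output = [0.8, -0.1, 0.3, -0.6, 0.2, -0.4, 0.9, 0.2]

image_codes = [binarize(v) for v in image_outputs]
query_code = binarize(audio_query_output)

# Each item is now a single small integer, and comparison is one XOR plus a bit count.
ranking = rank_by_hamming(query_code, image_codes)
print(ranking)  # → [0, 2, 1]
```

In this toy setting each image is stored as 8 bits instead of 8 floating-point numbers, and the audio query is matched against images without any floating-point distance computation. The sign step is also where the information loss mentioned above arises, since the magnitudes of the outputs are discarded.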