Image Recall on Image-Text Intertwined Lifelogs
Tzu-Hsuan Chu, Hen-Hsen Huang, Hsin-Hsi Chen
2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2019-10-01
DOI: 10.1145/3350546.3352555 (https://doi.org/10.1145/3350546.3352555)
Citations: 6
Abstract
People engage in lifelogging by taking photos with cameras and cellphones anytime and anywhere, sharing those photos, intertwined with captions or descriptions, on social media platforms. The image-text intertwined data provides richer information for image recall. When an image alone cannot preserve the complete experience, the accompanying text complements it by describing the life experience behind the photo. This work proposes a multimodal retrieval model for image recall in image-text intertwined lifelogs. Our Attentive Image-Story model combines an Image model, which maps visual and textual information into a single representation space, with a Story model, which captures text-based contextual information, using an attention mechanism to reduce the semantic gap between visual and textual information. Experimental results show that our model outperforms a state-of-the-art image-based retrieval model and an image/text hybrid system.

CCS CONCEPTS: • Information systems → Multimedia and multimodal retrieval; • Computing methodologies → Natural language processing; Image representations.
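To make the fusion idea concrete, here is a minimal sketch of query-conditioned attention over a lifelog entry's text, combined with its image embedding into one retrieval score. This is an illustration under stated assumptions, not the authors' implementation: the function name `attentive_image_story_score`, the equal image/story mixing weight, and the use of dot-product attention with cosine scoring are all assumptions of this sketch; the paper's actual Image and Story models learn the shared embedding space.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attentive_image_story_score(query_vec, image_vec, story_vecs):
    """Score one lifelog entry for a text query (illustrative sketch).

    query_vec:  query embedding in the shared space, shape (d,)
    image_vec:  image embedding projected into the same space, shape (d,)
    story_vecs: embeddings of the surrounding text (the "story"), shape (n, d)
    """
    # Attention: weight each story sentence by its similarity to the query,
    # so the most relevant context dominates the attended representation.
    weights = softmax(story_vecs @ query_vec)   # shape (n,)
    story_vec = weights @ story_vecs            # shape (d,)

    # Fuse image and attended story representations.
    # The 0.5/0.5 mixing weight is an assumption of this sketch.
    entry_vec = 0.5 * image_vec + 0.5 * story_vec

    # Cosine similarity between the fused entry and the query.
    denom = np.linalg.norm(entry_vec) * np.linalg.norm(query_vec) + 1e-9
    return float(entry_vec @ query_vec / denom)
```

At retrieval time, every lifelog entry would be scored this way against the query and the top-scoring images returned; entries whose captions align with the query receive higher attention mass and therefore rank higher even when the image embedding alone is ambiguous.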