Proceedings of the 2020 International Conference on Multimedia Retrieval: Latest Publications

Visual Story Ordering with a Bidirectional Writer
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3390735
Wei-Rou Lin, Hen-Hsen Huang, Hsin-Hsi Chen
{"title":"Visual Story Ordering with a Bidirectional Writer","authors":"Wei-Rou Lin, Hen-Hsen Huang, Hsin-Hsi Chen","doi":"10.1145/3372278.3390735","DOIUrl":"https://doi.org/10.1145/3372278.3390735","url":null,"abstract":"This paper introduces visual story ordering, a challenging task in which images and text are ordered in a visual story jointly. We propose a neural network model based on the reader-processor-writer architecture with a self-attention mechanism. A novel bidirectional decoder is further proposed with bidirectional beam search. Experimental results show the effectiveness of the approach. The information gained from multimodal learning is presented and discussed. We also find that the proposed embedding narrows the distance between images and their corresponding story sentences, even though we do not align the two modalities explicitly. As it addresses a general issue in generative models, the proposed bidirectional inference mechanism applies to a variety of applications.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125163424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
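As a rough illustration of the bidirectional decoding idea described in the abstract above, the following sketch orders story elements with a forward beam search and then rescores the surviving candidates with a backward scorer. The `score_forward` and `score_backward` functions are hypothetical stand-ins for a forward and a backward decoder; this is only an approximation of the paper's bidirectional beam search, not the authors' implementation.

```python
def bidirectional_order(items, score_forward, score_backward, beam_size=3):
    """Toy bidirectional ordering: forward beam search plus backward rescoring.

    score_forward(prefix, nxt) and score_backward(suffix, prev) are assumed
    to return log-probabilities for appending/prepending an element.
    """
    beams = [((), 0.0)]
    for _ in range(len(items)):
        candidates = []
        for prefix, score in beams:
            for nxt in items:
                if nxt in prefix:
                    continue
                candidates.append((prefix + (nxt,), score + score_forward(prefix, nxt)))
        # Keep only the highest-scoring partial orderings.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]

    def backward_total(order):
        # Score the same ordering right-to-left, mimicking the backward decoder.
        total, suffix = 0.0, ()
        for prev in reversed(order):
            total += score_backward(suffix, prev)
            suffix = (prev,) + suffix
        return total

    # Pick the ordering that both directions agree on.
    return max(beams, key=lambda c: c[1] + backward_total(c[0]))[0]
```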
An Interactive Multimodal Retrieval System for Memory Assistant and Life Organized Support
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3391934
Van-Luon Tran, Anh-Vu Mai-Nguyen, Trong-Dat Phan, Anh-Khoa Vo, Minh-Son Dao, K. Zettsu
{"title":"An Interactive Multimodal Retrieval System for Memory Assistant and Life Organized Support","authors":"Van-Luon Tran, Anh-Vu Mai-Nguyen, Trong-Dat Phan, Anh-Khoa Vo, Minh-Son Dao, K. Zettsu","doi":"10.1145/3372278.3391934","DOIUrl":"https://doi.org/10.1145/3372278.3391934","url":null,"abstract":"Lifelogging is known as the new trend of writing diary digitally where both the surrounding environment and personal physiological data and cognition are collected at the same time under the first perspective. Exploring and exploiting these lifelog (i.e., data created by lifelogging) can provide useful insights for human beings, including healthcare, work, entertainment, and family, to name a few. Unfortunately, having a valuable tool working on lifelog to discover these insights is still a tough challenge. To meet this requirement, we introduce an interactive multimodal retrieval system that aims to provide people with two functions, memory assistant and life organized support, with a friendly and easy-to-use web UI. The output of the former function is a video with footages expressing all instances of events people want to recall. The latter function generates a statistical report of each event so that people can have more information to balance their lifestyle. The system relies on two major algorithms that try to match keywords/phrases to images and to run a cluster-based query using a watershed-based approach.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116899089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
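A minimal sketch of the keyword-to-image matching step mentioned in the abstract above, assuming keyword/phrase and image embeddings have already been produced by some text and image encoders; the function name and the cosine-similarity choice are assumptions, not the authors' algorithm.

```python
import numpy as np

def match_keywords_to_images(keyword_vecs, image_vecs, top_k=10):
    """Rank lifelog images for a keyword/phrase query by cosine similarity."""
    query = keyword_vecs.mean(axis=0)                      # pool multi-word queries
    query = query / np.linalg.norm(query)
    images = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    similarities = images @ query                          # one score per image
    return np.argsort(-similarities)[:top_k]               # indices of best matches
```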
Image Synthesis from Locally Related Texts
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3390684
Tianrui Niu, Fangxiang Feng, Lingxuan Li, Xiaojie Wang
{"title":"Image Synthesis from Locally Related Texts","authors":"Tianrui Niu, Fangxiang Feng, Lingxuan Li, Xiaojie Wang","doi":"10.1145/3372278.3390684","DOIUrl":"https://doi.org/10.1145/3372278.3390684","url":null,"abstract":"Text-to-image synthesis refers to generating photo-realistic images from text descriptions. Recent works focus on generating images with complex scenes and multiple objects. However, the text inputs to these models are the only captions that always describe the most apparent object or feature of the image and detailed information (e.g. visual attributes) for regions and objects are often missing. Quantitative evaluation of generation performances is still an unsolved problem, where traditional image classification- or retrieval-based metrics fail at evaluating complex images. To address these problems, we propose to generate images conditioned on locally-related texts, i.e., descriptions of local image regions or objects instead of the whole image. Specifically, questions and answers (QAs) are chosen as locally-related texts, which makes it possible to use VQA accuracy as a new evaluation metric. The intuition is simple: higher image quality and image-text consistency (both globally and locally) can help a VQA model answer questions more correctly. We purposed VQA-GAN model with three key modules: hierarchical QA encoder, QA-conditional GAN and external VQA loss. These modules help leverage the new inputs effectively. Thorough experiments on two public VQA datasets demonstrate the effectiveness of the model and the newly proposed metric.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128360945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
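The abstract above proposes VQA accuracy as an evaluation metric for text-to-image synthesis. A hedged sketch of how such a metric could be computed is shown below; `generator` and `vqa_model` are placeholders for a trained QA-conditional generator and a pretrained VQA model, not the paper's exact evaluation protocol.

```python
def vqa_accuracy(generator, vqa_model, qa_sets):
    """Average per-image VQA accuracy over images synthesized from QA pairs."""
    total = 0.0
    for questions, answers in qa_sets:                # one locally-related QA set per image
        image = generator(questions)                  # synthesize an image from the QAs
        predictions = [vqa_model(image, q) for q in questions]
        total += sum(p == a for p, a in zip(predictions, answers)) / len(answers)
    return total / len(qa_sets)
```

The design intuition follows the abstract directly: a generated image that is sharper and more consistent with its conditioning QAs should let the VQA model answer more of those questions correctly.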
Reducing Response Time for Multimedia Event Processing using Domain Adaptation
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3390722
Asra Aslam, E. Curry
{"title":"Reducing Response Time for Multimedia Event Processing using Domain Adaptation","authors":"Asra Aslam, E. Curry","doi":"10.1145/3372278.3390722","DOIUrl":"https://doi.org/10.1145/3372278.3390722","url":null,"abstract":"The Internet of Multimedia Things (IoMT) is an emerging concept due to the large amount of multimedia data produced by sensing devices. Existing event-based systems mainly focus on scalar data, and multimedia event-based solutions are domain-specific. Multiple applications may require handling of numerous known/unknown concepts which may belong to the same/different domains with an unbounded vocabulary. Although deep neural network-based techniques are effective for image recognition, the limitation of having to train classifiers for unseen concepts will lead to an increase in the overall response-time for users. Since it is not practical to have all trained classifiers available, it is necessary to address the problem of training of classifiers on demand for unbounded vocabulary. By exploiting transfer learning based techniques, evaluations showed that the proposed framework can answer within ~0.01 min to ~30 min of response-time with accuracy ranges from 95.14% to 98.53%, even when all subscriptions are new/unknown.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121319415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
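The on-demand training described above rests on transfer learning: reuse a pretrained backbone and fit only a new classification head for each newly subscribed concept, which is what keeps the response time low. A minimal PyTorch sketch of that pattern follows; the backbone choice, hyperparameters, and function name are illustrative assumptions, not the paper's configuration.

```python
import torch
import torchvision

def train_on_demand_classifier(loader, num_classes, epochs=3, device="cuda"):
    """Fit a classifier for a new concept by training only a fresh head."""
    model = torchvision.models.resnet18(pretrained=True)
    for p in model.parameters():          # freeze the pretrained backbone
        p.requires_grad = False
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)  # new head
    model = model.to(device)
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```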
Trajectory Prediction Network for Future Anticipation of Ships
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3390676
Pim Dijt, P. Mettes
{"title":"Trajectory Prediction Network for Future Anticipation of Ships","authors":"Pim Dijt, P. Mettes","doi":"10.1145/3372278.3390676","DOIUrl":"https://doi.org/10.1145/3372278.3390676","url":null,"abstract":"This work investigates the anticipation of future ship locations based on multimodal sensors. Predicting future trajectories of ships is an important component for the development of safe autonomous sailing ships on water. A core challenge towards future trajectory prediction is making sense of multiple modalities from vastly different sensors, including GPS coordinates, radar images, and charts specifying water and land regions. To that end, we propose a Trajectory Prediction Network, an end-to-end approach for trajectory anticipation based on multimodal sensors. Our approach is framed as a multi-task sequence-to-sequence network, with network components for coordinate sequences and radar images. In the network, water/land segmentations from charts are integrated as an auxiliary training objective. Since future anticipation of ships has not previously been studied from such a multimodal perspective, we introduce the Inland Shipping Dataset (ISD), a novel dataset for future anticipation of ships. Experimental evaluation on ISD shows the potential of our approach, outperforming single-modal variants and baselines from related anticipation tasks.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121333827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
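As a sketch of the sequence-to-sequence component for coordinate sequences described above, the following PyTorch module encodes an observed track and autoregressively decodes future positions. The radar-image branch and the water/land auxiliary objective are omitted, and all layer sizes are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CoordSeq2Seq(nn.Module):
    """Stripped-down seq2seq model over (x, y) ship coordinates."""
    def __init__(self, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(2, hidden, batch_first=True)
        self.decoder = nn.GRU(2, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)

    def forward(self, past, horizon=10):
        _, h = self.encoder(past)                 # encode the observed track
        step = past[:, -1:, :]                    # start from the last observed position
        predictions = []
        for _ in range(horizon):                  # autoregressive decoding
            dec, h = self.decoder(step, h)
            step = self.out(dec)
            predictions.append(step)
        return torch.cat(predictions, dim=1)      # (batch, horizon, 2)
```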
Multi-level Recognition on Falls from Activities of Daily Living
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3390702
Jiawei Li, Shutao Xia, Qianggang Ding
{"title":"Multi-level Recognition on Falls from Activities of Daily Living","authors":"Jiawei Li, Shutao Xia, Qianggang Ding","doi":"10.1145/3372278.3390702","DOIUrl":"https://doi.org/10.1145/3372278.3390702","url":null,"abstract":"The falling accident is one of the largest threats to human health, which leads to broken bones, head injury, or even death. Therefore, automatic human fall recognition is vital for the Activities of Daily Living (ADL). In this paper, we try to define multi-level computer vision tasks for the visually observed fall recognition problem and study the methods and pipeline. We make frame-level labels for the fall action on several ADL datasets to test the methods and support the analysis. While current deep-learning fall recognition methods usually work on the sequence-level input, we propose a novel Dynamic Pose Motion (DPM) representation to go a step further, which can be captured by a flexible motion extraction module. Besides, a sequence-level fall recognition pipeline is proposed, which has an explicit two-branch structure for the appearance and motion feature, and has canonical LSTM to make temporal modeling and fall prediction. Finally, while current research only makes a binary classification on the fall and ADL, we further study how to detect the start time and the end time of a fall action in a video-level task. We conduct analysis experiments and ablation studies on both the simulated and real-life fall datasets. The relabelled datasets and extensive experiments form a new baseline on the recognition of falls and ADL.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134159680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
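A minimal sketch of the two-branch appearance/motion structure with LSTM temporal modeling mentioned above; the feature extractors, dimensions, and fusion scheme are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class TwoBranchFallNet(nn.Module):
    """Fuse per-frame appearance and motion features via two LSTMs."""
    def __init__(self, app_dim=512, mot_dim=64, hidden=128, num_classes=2):
        super().__init__()
        self.app_lstm = nn.LSTM(app_dim, hidden, batch_first=True)
        self.mot_lstm = nn.LSTM(mot_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, appearance, motion):
        # appearance: (batch, T, app_dim), motion: (batch, T, mot_dim)
        _, (h_app, _) = self.app_lstm(appearance)
        _, (h_mot, _) = self.mot_lstm(motion)
        fused = torch.cat([h_app[-1], h_mot[-1]], dim=-1)  # last hidden states
        return self.classifier(fused)                      # fall / ADL logits
```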
Forward and Backward Multimodal NMT for Improved Monolingual and Multilingual Cross-Modal Retrieval
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3390674
Po-Yao (Bernie) Huang, Xiaojun Chang, Alexander Hauptmann, E. Hovy
{"title":"Forward and Backward Multimodal NMT for Improved Monolingual and Multilingual Cross-Modal Retrieval","authors":"Po-Yao (Bernie) Huang, Xiaojun Chang, Alexander Hauptmann, E. Hovy","doi":"10.1145/3372278.3390674","DOIUrl":"https://doi.org/10.1145/3372278.3390674","url":null,"abstract":"We explore methods to enrich the diversity of captions associated with pictures for learning improved visual-semantic embeddings (VSE) in cross-modal retrieval. In the spirit of \"A picture is worth a thousand words\", it would take dozens of sentences to parallel each picture's content adequately. But in fact, real-world multimodal datasets tend to provide only a few (typically, five) descriptions per image. For cross-modal retrieval, the resulting lack of diversity and coverage prevents systems from capturing the fine-grained inter-modal dependencies and intra-modal diversities in the shared VSE space. Using the fact that the encoder-decoder architectures in neural machine translation (NMT) have the capacity to enrich both monolingual and multilingual textual diversity, we propose a novel framework leveraging multimodal neural machine translation (MMT) to perform forward and backward translations based on salient visual objects to generate additional text-image pairs which enables training improved monolingual cross-modal retrieval (English-Image) and multilingual cross-modal retrieval (English-Image and German-Image) models. Experimental results show that the proposed framework can substantially and consistently improve the performance of state-of-the-art models on multiple datasets. The results also suggest that the models with multilingual VSE outperform the models with monolingual VSE.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133031306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
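The core augmentation idea above, forward and backward (multimodal) translation to create extra text-image pairs, can be sketched as follows. The two translation functions are placeholders for the MMT models, and the way outputs are grouped is an assumption rather than the authors' pipeline.

```python
def augment_caption_pairs(dataset, translate_en_de, translate_de_en):
    """Round-trip each English caption through translation to enrich diversity."""
    augmented = []
    for image, caption_en in dataset:
        caption_de = translate_en_de(caption_en, image)      # forward translation
        caption_en_bt = translate_de_en(caption_de, image)    # backward translation
        augmented.append((image, caption_en))                 # original pair
        augmented.append((image, caption_de))                 # multilingual pair (German-Image)
        augmented.append((image, caption_en_bt))               # paraphrased pair (English-Image)
    return augmented
```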
Urban Movie Map for Walkers: Route View Synthesis using 360° Videos
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3390707
Naoki Sugimoto, Toru Okubo, K. Aizawa
{"title":"Urban Movie Map for Walkers: Route View Synthesis using 360° Videos","authors":"Naoki Sugimoto, Toru Okubo, K. Aizawa","doi":"10.1145/3372278.3390707","DOIUrl":"https://doi.org/10.1145/3372278.3390707","url":null,"abstract":"We propose a movie map for walkers based on synthesized street walking views along routes in a particular area. From the perspectives of walkers, we captured a number of omnidirectional videos along streets in the target area (1km2 around Kyoto Station). We captured a separate video for each street. We then performed simultaneous localization and mapping to obtain camera poses from key video frames in all of the videos and adjusted the coordinates based on a map of the area using reference points. To join one video to another smoothly at intersections, we identified frames of video intersection based on camera locations and visual feature matching. Finally, we generated moving route views by connecting the omnidirectional videos based on the alignment of the cameras. To improve smoothness at intersections, we generated rotational views by mixing video intersection frames from two videos. The results demonstrate that our method can precisely identify intersection frames and generate smooth connections between videos at intersections.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115241774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
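The intersection-frame identification above relies on visual feature matching between frames of two street videos. A rough OpenCV sketch of that kind of matching (ORB features plus Lowe's ratio test) is given below; the paper combines such matching with SLAM camera poses, and the thresholds here are illustrative, not the authors' settings.

```python
import cv2

def count_good_matches(frame_a, frame_b, ratio=0.75):
    """Count ORB matches between two frames that pass Lowe's ratio test."""
    def to_gray(img):
        return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img

    orb = cv2.ORB_create(nfeatures=1000)
    _, des_a = orb.detectAndCompute(to_gray(frame_a), None)
    _, des_b = orb.detectAndCompute(to_gray(frame_b), None)
    if des_a is None or des_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)
```

A frame pair with a high count of good matches is a plausible candidate for the intersection frame at which the two street videos should be joined.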
ICDAR'20: Intelligent Cross-Data Analysis and Retrieval
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3388041
Minh-Son Dao, M. Fjeld, F. Biljecki, U. Yavanoglu, M. Dong
{"title":"ICDAR'20: Intelligent Cross-Data Analysis and Retrieval","authors":"Minh-Son Dao, M. Fjeld, F. Biljecki, U. Yavanoglu, M. Dong","doi":"10.1145/3372278.3388041","DOIUrl":"https://doi.org/10.1145/3372278.3388041","url":null,"abstract":"The First International Workshop on \"Intelligence Cross-Data Analytics and Retrieval\" (ICDAR'20) welcomes any theoretical and practical works on intelligence cross-data analytics and retrieval to bring the smart-sustainable society to human beings. We have witnessed the era of big data where almost any event that happens is recorded and stored either distributedly or centrally. The utmost requirement here is that data came from different sources, and various domains must be harmonically analyzed to get their insights immediately towards giving the ability to be retrieved thoroughly. These emerging requirements lead to the need for interdisciplinary and multidisciplinary contributions that address different aspects of the problem, such as data collection, storage, protection, processing, and transmission, as well as knowledge discovery, retrieval, and security and privacy. Hence, the goal of the workshop is to attract researchers and experts in the areas of multimedia information retrieval, machine learning, AI, data science, event-based processing and analysis, multimodal multimedia content analysis, lifelog data analysis, urban computing, environmental science, atmospheric science, and security and privacy to tackle the issues as mentioned earlier.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115913621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
surgXplore: Interactive Video Exploration for Endoscopy
Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-06-08. DOI: 10.1145/3372278.3391930
Andreas Leibetseder, Klaus Schöffmann
{"title":"surgXplore: Interactive Video Exploration for Endoscopy","authors":"Andreas Leibetseder, Klaus Schöffmann","doi":"10.1145/3372278.3391930","DOIUrl":"https://doi.org/10.1145/3372278.3391930","url":null,"abstract":"Accumulating recordings of daily conducted surgical interventions such as endoscopic procedures for the long term generates very large video archives that are both difficult to search and explore. Since physicians utilize this kind of media routinely for documentation, treatment planning or education and training, it can be considered a crucial task to make said archives manageable in regards to discovering or retrieving relevant content. We present an interactive tool including a multitude of modalities for browsing, searching and filtering medical content, demonstrating its usefulness on over 140 hours of pre-processed laparoscopic surgery videos.","PeriodicalId":158014,"journal":{"name":"Proceedings of the 2020 International Conference on Multimedia Retrieval","volume":"257 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122138640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0