Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval: Latest Publications

Industrial Applications of Image Recognition and Retrieval Technologies for Public Safety and IT Services
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3210492
Tomokazu Murakami
Abstract: Hitachi develops a wide variety of technologies ranging from infrastructure systems to IT platforms: railway management systems, water supply operation systems, manufacturing management systems for factories, surveillance cameras and monitoring systems, rolling stock, power plants, servers, storage, data centers, and various IT systems for governments and companies. Hitachi's research and development group is developing video analytics and other media processing techniques and, together with its business divisions, applying them to products and solutions for public safety, factory productivity improvement, and other IT applications. This talk introduces some of Hitachi's products, solutions, and research topics in which video analytics and image retrieval techniques are applied, including an image search system for retrieving publicly registered design graphics, a person detection and tracking function for video surveillance systems, and our activities and results in TRECVID 2017. In each case, we integrated our original high-speed image search database with deep-learning-based image recognition. Through these use cases, I show how image recognition and retrieval technologies are put to practical use in industrial products and solutions and contribute to the improvement of social welfare.
Citations: 1
A Simple Score Following System for Music Ensembles Using Chroma and Dynamic Time Warping
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206090
Po-Wen Chou, Fu-Neng Lin, Keh-Ning Chang, Herng-Yow Chen
Abstract: Turning the pages of sheet music is disruptive for instrumentalists while they are playing. This study investigates how real-time music score alignment can serve as a computer-assisted page turner. We propose a simple system that can be set up easily and quickly. The framework has two parts: an offline preprocessing stage and an online alignment stage. In the first stage, the system extracts chroma feature vectors from a reference recording. In the second stage, the system receives the audio signal of the live performance and extracts chroma feature vectors from it. Finally, the system uses Dynamic Time Warping (DTW) to align the two sets of chroma feature vectors and mark the current measure of the score. The prototype was evaluated by musicians in ensembles such as a string quartet and an orchestra. Most musicians agreed that the system is helpful and correctly indicates the current measure of a live performance; some, however, felt that the system did not turn the page at the right time. The user survey showed that the best page-turning timing is user-dependent, because it depends strongly on a musician's sight-reading skill and speed.
Citations: 1
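The two-stage pipeline the abstract describes (offline chroma extraction from a reference recording, then DTW alignment against the live signal) can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the chroma matrices are assumed precomputed (frames × 12 pitch classes), and cosine distance is one common choice of frame cost.

```python
import numpy as np

def dtw_align(reference, live):
    """Align two sequences of chroma vectors (frames x 12) with classic
    dynamic time warping; returns the warping path as a list of
    (reference_frame, live_frame) index pairs."""
    n, m = len(reference), len(live)
    # Pairwise cosine distances between chroma frames.
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    liv = live / np.linalg.norm(live, axis=1, keepdims=True)
    dist = 1.0 - ref @ liv.T
    # Accumulated cost matrix with an extra inf border row/column.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack the minimum-cost path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

In a score follower, the last element of the path for the audio received so far indicates which reference frame (and hence which measure) the performance has reached.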
Cross-Modal Retrieval Using Deep De-correlated Subspace Ranking Hashing
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206066
Kevin Joslyn, Kai Li, K. Hua
Abstract: Cross-modal hashing has become a popular research topic in recent years because compact binary codes make storing and retrieving high-dimensional multimodal data efficient. While most cross-modal hash functions use binary space-partitioning functions (e.g., the sign function), our method uses ranking-based hashing, which is built on numerically stable and scale-invariant rank correlation measures. In this paper, we propose a novel deep learning architecture called Deep De-correlated Subspace Ranking Hashing (DDSRH) that uses feature-ranking methods to determine the hash codes for the image and text modalities in a common Hamming space. Specifically, DDSRH learns a set of de-correlated nonlinear subspaces onto which to project the original features, so that the hash code is determined by the relative ordering of the projected feature values in a given optimized subspace. The network relies on a pre-trained deep feature learning network for each modality and a hashing network responsible for optimizing the hash codes based on the known similarity of the training image-text pairs. Our proposed method includes both architectural and mathematical techniques designed specifically for ranking-based hashing in order to achieve de-correlation between the bits, bit balancing, and quantization. Finally, through extensive experimental studies on two widely used multimodal datasets, we show that the combination of these techniques achieves state-of-the-art performance on several benchmarks.
Citations: 5
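The core idea of ranking-based hashing, deriving a code from the relative ordering of projected feature values rather than from their signs, can be illustrated with a toy sketch. The random projection matrices here are stand-ins; DDSRH instead learns de-correlated nonlinear subspaces, so this shows only the ranking principle and its scale invariance.

```python
import numpy as np

def rank_hash(features, projections):
    """Toy ranking-based hash: project a feature vector onto each
    subspace and emit, per subspace, the index of the largest projected
    value.  Because the code depends only on the relative ordering of
    the projections, it is invariant to positive rescaling of the
    input, unlike sign-based codes near the decision boundary."""
    codes = []
    for W in projections:      # one projection matrix per subspace
        z = features @ W       # projected feature values
        codes.append(int(np.argmax(z)))
    return codes
```

Two items are then compared by a rank correlation (or Hamming-style match) over their code vectors instead of raw feature distance.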
Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206033
Xing Xu, Jingkuan Song, Huimin Lu, Yang Yang, Fumin Shen, Zi Huang
Abstract: Cross-modal retrieval, e.g., using an image query to search for related text and vice versa, has become a highlighted research topic that provides a flexible retrieval experience across multimodal data. Existing approaches usually consider the so-called non-extendable cross-modal retrieval task: they learn a common latent subspace from a source set containing labeled image-text pairs and then generate common representations for the instances of a target set to perform cross-modal matching. However, these methods may not generalize well when the target set contains unseen classes, since the non-extendable task assumes that the source and target sets share the same range of classes. In this paper, we consider the more practical extendable cross-modal retrieval task, in which the source and target sets have disjoint classes. We propose a novel framework, termed Modal-adversarial Semantic Learning Network (MASLN), to tackle the limitations of existing methods on this task. The proposed MASLN consists of two subnetworks: cross-modal reconstruction and modal-adversarial semantic learning. The former minimizes the cross-modal distribution discrepancy by mutually reconstructing each modality's data, guided by class embeddings as side information in the reconstruction procedure. The latter generates a semantic representation that is indiscriminative across modalities, while an adversarial learning mechanism tries to distinguish the modalities from the common representation. The two subnetworks are trained jointly to enhance cross-modal semantic consistency in the learned common subspace and knowledge transfer to instances in the target set. Comprehensive experiments on three widely used multimodal datasets show its effectiveness and robustness on both the non-extendable and the extendable cross-modal retrieval task.
Citations: 32
Session details: Demonstration Session
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3252934
K. Wu
Citations: 0
Face Retrieval Framework Relying on User's Visual Memory
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206038
Yugo Sato, Tsukasa Fukusato, S. Morishima
Abstract: This paper presents an interactive face retrieval framework for clarifying the image representation a user envisions. Our system is designed for situations in which the user wishes to find a person but has only a visual memory of that person. We address the critical challenge of retrieving images from the user's inputs: instead of providing target-specific information, the user selects one or more images that resemble their impression of the target person. Based on the user's selections, the system automatically updates a deep convolutional neural network. By repeating this process interactively (human-in-the-loop optimization), the system reduces the gap between human-judged and computer-computed similarities and estimates the target image representation. We ran user studies with 10 subjects on a public database and confirmed that the proposed framework clarifies the image representation envisioned by the user easily and quickly.
Citations: 4
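At its simplest, a human-in-the-loop retrieval update can be approximated by nudging a query embedding toward the user's selections each round. This Rocchio-style sketch is a hypothetical simplification of the paper's network fine-tuning, not its method; `alpha` is an assumed step size and the embeddings are stand-ins.

```python
import numpy as np

def update_query(query, selected, alpha=0.5):
    """Move the current query embedding toward the centroid of the
    embeddings of user-selected images.  alpha controls how strongly
    one round of feedback pulls the query (0 = ignore, 1 = jump to
    the centroid)."""
    centroid = np.mean(selected, axis=0)
    return (1 - alpha) * query + alpha * centroid
```

Repeating this per feedback round converges the query toward the region of embedding space the user's memory points at, which is the intuition behind the paper's iterative network updates.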
Objects, Relationships, and Context in Visual Data
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3210496
Hanwang Zhang, Qianru Sun
Abstract: For decades, we have been interested in detecting objects and classifying them into a fixed lexicon. With the maturity of these low-level vision solutions, we hunger for a higher-level representation of visual data, so as to extract visual knowledge rather than mere bags of visual entities, allowing machines to reason about human-level decision-making and even manipulate visual data at the pixel level. In this tutorial, we introduce a variety of machine learning techniques for modeling visual relationships (e.g., subject-predicate-object triplet detection) and contextual generative models (e.g., generating photo-realistic images with conditional generative adversarial networks). In particular, we start from fundamental theories of object detection, relationship detection, and generative adversarial networks, and move on to more advanced topics: referring-expression visual grounding, pose-guided person image generation, and context-based image inpainting.
Citations: 0
VisLoiter+
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206091
Maguell L. T. L. Sandifort, Jianquan Liu, Shoji Nishimura, Wolfgang Hürst
Abstract: It is very difficult to fully automate the detection of loitering behavior in video surveillance, so human monitoring is often still required. Alternatively, we can provide a list of potential loiterer candidates for a final yes/no judgment by a human operator. Our system, VisLoiter+, realizes this idea with a unique, user-friendly interface and an entropy model for improved loitering analysis. Rather than using frequency of appearance alone, we expand the loitering analysis with new methods that measure the amount of person movement across multiple camera views. The interface gives an overview of loiterer candidates to show their behavior at a glance, complemented by lightweight video playback for further detail about why a candidate was selected. We demonstrate that our system outperforms state-of-the-art solutions on real-life data sets.
Citations: 3
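One plausible reading of "an entropy model over appearances across camera views" (the paper's actual formulation may differ) is the Shannon entropy of a person's appearance distribution over cameras: someone seen often but concentrated in one view spreads little entropy, a pattern consistent with loitering.

```python
import math

def appearance_entropy(counts):
    """Shannon entropy (bits) of a person's appearance distribution
    over camera views, given per-camera appearance counts.  0 means
    all appearances fall in a single view; log2(k) means appearances
    are spread evenly over k views."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A candidate ranker could then combine low entropy with high appearance frequency to surface likely loiterers for the operator's yes/no judgment.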
Image Selection in Photo Albums
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206077
Dmitry Kuzovkin, T. Pouli, R. Cozot, O. Meur, J. Kervec, K. Bouatouch
Abstract: Selecting the best photos in a personal album is a task photographers often face, and it becomes laborious when the collection is large and contains many similar photos. Recent advances in image aesthetics and photo importance evaluation have led to different metrics for automatically assessing a given image. However, these metrics are intended for assessing an image independently, without the context implicitly present within a photo album. In this work, we perform a user study assessing how users select photos when given a complete photo album, a task that better reflects how users review their personal photos and collections. Using the data from our study, we evaluate how existing state-of-the-art photo assessment methods perform relative to user selection, focusing in particular on deep-learning-based approaches. Finally, we explore a recent framework for adapting independent image scores to collections and evaluate the scenarios in which such adaptation proves beneficial.
Citations: 10
Considering Documents in Lifelog Information Retrieval
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206081
Rashmi Gupta
Abstract: Lifelogging is a research topic receiving increasing attention, and although lifelog research has progressed in recent years, the concept of what constitutes a document in lifelog retrieval has not yet been sufficiently explored. The generation of multimodal lifelog documents is therefore a fundamental concept that must be addressed. In this paper, I introduce my general perspective on generating documents in lifelogging and reflect on lessons from collecting multimodal lifelog data from a number of participants in a study on lifelog data organization. In addition, I present the main motivation behind document generation and discuss in detail the challenges faced while collecting data and generating documents. Finally, I propose a process for organizing documents in lifelog data retrieval, which I intend to follow in my PhD research.
Citations: 3