Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval: Latest Publications

Industrial Applications of Image Recognition and Retrieval Technologies for Public Safety and IT Services
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3210492
Tomokazu Murakami
Abstract: Hitachi develops a wide variety of technologies ranging from infrastructure systems to IT platforms: railway management systems, water supply operation systems, manufacturing management systems for factories, surveillance cameras and monitoring systems, rolling stock, power plants, servers, storage, data centers, and various IT systems for governments and companies. Hitachi's research and development group is developing video analytics and other media processing techniques and, together with its business divisions, applying them to products and solutions for public safety, factory productivity improvement, and other IT applications. This talk introduces some of Hitachi's products, solutions, and research topics in which video analytics and image retrieval techniques are applied, including an image search system for retrieving publicly registered design graphics, a person detection and tracking function for video surveillance systems, and our activities and results in TRECVID 2017. In each case, we integrated our original high-speed image search database with deep-learning-based image recognition. Through these use cases, I show how image recognition and retrieval technologies are put to practical use in industrial products and solutions and contribute to the improvement of social welfare.
Citations: 1
A Simple Score Following System for Music Ensembles Using Chroma and Dynamic Time Warping
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206090
Po-Wen Chou, Fu-Neng Lin, Keh-Ning Chang, Herng-Yow Chen
Abstract: Turning the pages of sheet music is disruptive for instrumentalists while they are playing. This study investigates how real-time music score alignment can serve as a computer-assisted page turner. We propose a simple system that can be set up easily and quickly. The framework has two parts: an offline preprocessing stage and an online alignment stage. In the first stage, the system extracts chroma feature vectors from a reference recording. In the second stage, the system receives the audio signal of the live performance and extracts chroma feature vectors from it. Finally, the system uses Dynamic Time Warping (DTW) to align the two sets of chroma feature vectors and mark the current measure of the score. The prototype was evaluated by musicians in ensembles such as a string quartet and an orchestra. Most musicians agreed that the system is helpful and correctly indicates the current measure of a live performance; some, however, felt that the system did not turn the page at the right time. The user survey showed that the best page-turning timing is user-dependent, because it depends strongly on a musician's sight-reading skill and speed.
Citations: 1
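The two-stage pipeline the abstract describes (offline chroma extraction from a reference recording, then DTW alignment against the live signal) can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: the chroma matrices are assumed precomputed (frames × 12 pitch classes), and cosine distance is one common choice of frame cost.

```python
import numpy as np

def dtw_align(reference, live):
    """Align two sequences of chroma vectors (frames x 12) with classic
    dynamic time warping; returns the warping path as a list of
    (reference_frame, live_frame) index pairs."""
    n, m = len(reference), len(live)
    # Pairwise cosine distances between chroma frames.
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    liv = live / np.linalg.norm(live, axis=1, keepdims=True)
    dist = 1.0 - ref @ liv.T
    # Accumulated cost matrix with an extra inf border row/column.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack the minimum-cost path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

In a score follower, the last element of the path for the audio received so far indicates which reference frame (and hence which measure) the performance has reached.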
Cross-Modal Retrieval Using Deep De-correlated Subspace Ranking Hashing
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206066
Kevin Joslyn, Kai Li, K. Hua
Abstract: Cross-modal hashing has become a popular research topic in recent years because compact binary codes make storing and retrieving high-dimensional multimodal data efficient. While most cross-modal hash functions use binary space-partitioning functions (e.g., the sign function), our method uses ranking-based hashing, which is built on numerically stable and scale-invariant rank correlation measures. In this paper, we propose a novel deep learning architecture called Deep De-correlated Subspace Ranking Hashing (DDSRH) that uses feature-ranking methods to determine the hash codes for the image and text modalities in a common Hamming space. Specifically, DDSRH learns a set of de-correlated nonlinear subspaces onto which to project the original features, so that the hash code is determined by the relative ordering of the projected feature values in a given optimized subspace. The network relies on a pre-trained deep feature learning network for each modality and a hashing network responsible for optimizing the hash codes based on the known similarity of the training image-text pairs. Our proposed method includes both architectural and mathematical techniques designed specifically for ranking-based hashing in order to achieve de-correlation between the bits, bit balancing, and quantization. Finally, through extensive experimental studies on two widely used multimodal datasets, we show that the combination of these techniques achieves state-of-the-art performance on several benchmarks.
Citations: 5
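The core idea of ranking-based hashing, deriving a code from the relative ordering of projected feature values rather than from their signs, can be illustrated with a toy sketch. The random projection matrices here are stand-ins; DDSRH instead learns de-correlated nonlinear subspaces, so this shows only the ranking principle and its scale invariance.

```python
import numpy as np

def rank_hash(features, projections):
    """Toy ranking-based hash: project a feature vector onto each
    subspace and emit, per subspace, the index of the largest projected
    value.  Because the code depends only on the relative ordering of
    the projections, it is invariant to positive rescaling of the
    input, unlike sign-based codes near the decision boundary."""
    codes = []
    for W in projections:      # one projection matrix per subspace
        z = features @ W       # projected feature values
        codes.append(int(np.argmax(z)))
    return codes
```

Two items are then compared by a rank correlation (or Hamming-style match) over their code vectors instead of raw feature distance.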
Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206033
Xing Xu, Jingkuan Song, Huimin Lu, Yang Yang, Fumin Shen, Zi Huang
Abstract: Cross-modal retrieval, e.g., using an image query to search for related text and vice versa, has become a highlighted research topic that provides a flexible retrieval experience across multimodal data. Existing approaches usually consider the so-called non-extendable cross-modal retrieval task: they learn a common latent subspace from a source set containing labeled image-text pairs and then generate common representations for the instances of a target set to perform cross-modal matching. However, these methods may not generalize well when the target set contains unseen classes, since the non-extendable task assumes that the source and target sets share the same range of classes. In this paper, we consider the more practical extendable cross-modal retrieval task, in which the source and target sets have disjoint classes. We propose a novel framework, termed Modal-adversarial Semantic Learning Network (MASLN), to tackle the limitations of existing methods on this task. The proposed MASLN consists of two subnetworks: cross-modal reconstruction and modal-adversarial semantic learning. The former minimizes the cross-modal distribution discrepancy by mutually reconstructing each modality's data, guided by class embeddings as side information in the reconstruction procedure. The latter generates a semantic representation that is indiscriminative across modalities, while an adversarial learning mechanism tries to distinguish the modalities from the common representation. The two subnetworks are trained jointly to enhance cross-modal semantic consistency in the learned common subspace and knowledge transfer to instances in the target set. Comprehensive experiments on three widely used multimodal datasets show its effectiveness and robustness on both the non-extendable and the extendable cross-modal retrieval task.
Citations: 32
Session details: Demonstration Session
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3252934
K. Wu
Citations: 0
Face Retrieval Framework Relying on User's Visual Memory
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206038
Yugo Sato, Tsukasa Fukusato, S. Morishima
Abstract: This paper presents an interactive face retrieval framework for clarifying the image representation a user envisions. Our system is designed for situations in which the user wishes to find a person but has only a visual memory of that person. We address the critical challenge of retrieving images from the user's inputs: instead of providing target-specific information, the user selects one or more images that resemble their impression of the target person. Based on the user's selections, the system automatically updates a deep convolutional neural network. By repeating this process interactively (human-in-the-loop optimization), the system reduces the gap between human-judged and computer-computed similarities and estimates the target image representation. We ran user studies with 10 subjects on a public database and confirmed that the proposed framework clarifies the image representation envisioned by the user easily and quickly.
Citations: 4
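At its simplest, a human-in-the-loop retrieval update can be approximated by nudging a query embedding toward the user's selections each round. This Rocchio-style sketch is a hypothetical simplification of the paper's network fine-tuning, not its method; `alpha` is an assumed step size and the embeddings are stand-ins.

```python
import numpy as np

def update_query(query, selected, alpha=0.5):
    """Move the current query embedding toward the centroid of the
    embeddings of user-selected images.  alpha controls how strongly
    one round of feedback pulls the query (0 = ignore, 1 = jump to
    the centroid)."""
    centroid = np.mean(selected, axis=0)
    return (1 - alpha) * query + alpha * centroid
```

Repeating this per feedback round converges the query toward the region of embedding space the user's memory points at, which is the intuition behind the paper's iterative network updates.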
Objects, Relationships, and Context in Visual Data
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3210496
Hanwang Zhang, Qianru Sun
Abstract: For decades, we have been interested in detecting objects and classifying them into a fixed lexicon. With the maturity of these low-level vision solutions, we hunger for a higher-level representation of visual data, so as to extract visual knowledge rather than mere bags of visual entities, allowing machines to reason about human-level decision-making and even manipulate visual data at the pixel level. In this tutorial, we introduce a variety of machine learning techniques for modeling visual relationships (e.g., subject-predicate-object triplet detection) and contextual generative models (e.g., generating photo-realistic images with conditional generative adversarial networks). In particular, we start from fundamental theories of object detection, relationship detection, and generative adversarial networks, and move on to more advanced topics: referring-expression visual grounding, pose-guided person image generation, and context-based image inpainting.
Citations: 0
VisLoiter+
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206091
Maguell L. T. L. Sandifort, Jianquan Liu, Shoji Nishimura, Wolfgang Hürst
Abstract: It is very difficult to fully automate the detection of loitering behavior in video surveillance, so human monitoring is often still required. Alternatively, we can provide a list of potential loiterer candidates for a final yes/no judgment by a human operator. Our system, VisLoiter+, realizes this idea with a unique, user-friendly interface and an entropy model for improved loitering analysis. Rather than using frequency of appearance alone, we expand the loitering analysis with new methods that measure the amount of person movement across multiple camera views. The interface gives an overview of loiterer candidates to show their behavior at a glance, complemented by lightweight video playback for further detail about why a candidate was selected. We demonstrate that our system outperforms state-of-the-art solutions on real-life data sets.
Citations: 3
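One plausible reading of "an entropy model over appearances across camera views" (the paper's actual formulation may differ) is the Shannon entropy of a person's appearance distribution over cameras: someone seen often but concentrated in one view spreads little entropy, a pattern consistent with loitering.

```python
import math

def appearance_entropy(counts):
    """Shannon entropy (bits) of a person's appearance distribution
    over camera views, given per-camera appearance counts.  0 means
    all appearances fall in a single view; log2(k) means appearances
    are spread evenly over k views."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A candidate ranker could then combine low entropy with high appearance frequency to surface likely loiterers for the operator's yes/no judgment.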
Image Selection in Photo Albums
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206077
Dmitry Kuzovkin, T. Pouli, R. Cozot, O. Meur, J. Kervec, K. Bouatouch
Abstract: Selecting the best photos in a personal album is a task photographers often face, and it becomes laborious when the collection is large and contains many similar photos. Recent advances in image aesthetics and photo importance evaluation have led to different metrics for automatically assessing a given image. However, these metrics are intended for assessing an image independently, without the context implicitly present within a photo album. In this work, we perform a user study assessing how users select photos when given a complete photo album, a task that better reflects how users review their personal photos and collections. Using the data from our study, we evaluate how existing state-of-the-art photo assessment methods perform relative to user selection, focusing in particular on deep-learning-based approaches. Finally, we explore a recent framework for adapting independent image scores to collections and evaluate the scenarios in which such adaptation proves beneficial.
Citations: 10
Considering Documents in Lifelog Information Retrieval
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Pub Date: 2018-06-05. DOI: 10.1145/3206025.3206081
Rashmi Gupta
Abstract: Lifelogging is a research topic receiving increasing attention, and although lifelog research has progressed in recent years, the concept of what constitutes a document in lifelog retrieval has not yet been sufficiently explored. The generation of multimodal lifelog documents is therefore a fundamental concept that must be addressed. In this paper, I introduce my general perspective on generating documents in lifelogging and reflect on lessons from collecting multimodal lifelog data from a number of participants in a study on lifelog data organization. In addition, I present the main motivation behind document generation and discuss in detail the challenges faced while collecting data and generating documents. Finally, I propose a process for organizing documents in lifelog data retrieval, which I intend to follow in my PhD research.
Citations: 3