Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval: Latest Papers

Semi-Automatic Retrieval of Relevant Segments from Laparoscopic Surgery Videos
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-06-06 DOI: 10.1145/3078971.3079008
Stefan Petscharnig
Abstract: Over the last decades, progress in medical technology and imaging technology has enabled minimally invasive surgery. In addition, multimedia technologies allow for retrospective analyses of surgeries. The accumulated videos and images speed up documentation, ease medical case assessment across surgeons, support the training of young surgeons, and are used in medical research. Because a surgery can involve hours of routine work, surgeons only need to see short video segments of interest to assess a case. Surgeons do not have the time to manually extract video sequences of their surgeries from large multimedia databases, as they lack the resources for this time-consuming task. The thesis addresses how to use Convolutional Neural Networks (CNNs) to semantically classify video frames into concepts of surgical actions and anatomical structures. To this end, it investigates the capabilities of predefined CNN architectures and of transfer learning in the laparoscopic video domain. The results are expected to improve through domain-specific adaptation of the CNN input layers, i.e., by fusing the image with motion and relevance information. Finally, the thesis investigates to what extent the proposed extraction of relevant scenes covers surgeons' needs.
Citations: 0
Visual Descriptors in Methods for Video Hyperlinking
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-06-06 DOI: 10.1145/3078971.3079026
P. Galuscáková, Michal Batko, Jan Cech, Jiri Matas, David Novak, Pavel Pecina
Abstract: In this paper, we survey different state-of-the-art visual processing methods and use them in hyperlinking. Visual information, calculated using Feature Signatures, SIMILE descriptors and convolutional neural networks (CNNs), serves as a similarity measure between video frames and is used to find similar faces, objects and settings. Visual concepts in frames are also automatically recognized, and the textual output of the recognition is combined with search based on subtitles and transcripts. All presented experiments were performed in the MediaEval 2014 Search and Hyperlinking task and the TRECVid 2015 Video Hyperlinking task.
Citations: 2
Session details: Oral Session: Open Software
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-06-06 DOI: 10.1145/3254619
M. Lux
Citations: 0
3D Facial Video Retrieval and Management for Decision Support in Speech and Language Therapy
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-06-06 DOI: 10.1145/3078971.3078984
Ricardo Carrapiço, I. Guimarães, Margarida Grilo, S. Cavaco, João Magalhães
Abstract: 3D video is bringing great changes to many health-related areas. The realism of such information gives health professionals strong evidence-analysis tools that facilitate clinical decision processes. Speech and language therapy aims to help subjects correct several disorders. Assessing a patient, the speech and language therapist (SLT) performs several visual and audio analysis procedures that can interfere with the patient's production of speech. In this context, the main contribution of this paper is a 3D video system that improves health information management processes in speech and language therapy. The 3D video retrieval and management system supports multimodal health records and provides SLTs with tools that support their work in several ways: (i) it allows SLTs to easily maintain a database of patients' orofacial and speech exercises; (ii) it supports three-dimensional orofacial measurement and analysis in a non-intrusive way; and (iii) it lets SLTs search patient speech exercises by similar facial characteristics, using facial image analysis techniques. The second contribution is a dataset of 3D videos of patients performing orofacial speech exercises. The whole system was evaluated successfully in a user study involving 22 SLTs. The user study illustrated the importance of retrieval by similar orofacial speech exercise.
Citations: 1
Badminton Video Analysis based on Spatiotemporal and Stroke Features
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-06-06 DOI: 10.1145/3078971.3079032
W. Chu, S. Situmeang
Abstract: Most broadcast sports events nowadays present game statistics to viewers. These statistics can be used to design gameplay strategy, improve players' performance, or give easier access to the points of interest of a game. However, few studies have addressed broadcast badminton videos. In this paper, we integrate several visual analysis techniques to detect the court, detect players, classify strokes, and classify the player's strategy. Based on this visual analysis, we gain some insights into the common strategy of a given player. We evaluate the performance of stroke classification and strategy classification, and show game statistics based on the classification results.
Citations: 35
Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-05-15 DOI: 10.1145/3078971.3079038
V. Vukotic, C. Raymond, G. Gravier
Abstract: Continuous multimodal representations suitable for multimodal information retrieval are usually obtained with methods that rely heavily on multimodal autoencoders. In video hyperlinking, a task that aims at retrieving video segments, the state of the art is a variation of two interlocked networks working in opposing directions. These systems provide good multimodal embeddings and are also capable of translating from one representation space to the other. Because they operate on representation spaces, these networks cannot operate in the original spaces (text or image). This makes it difficult to visualize the crossmodal function, and the networks do not generalize well to unseen data. Recently, generative adversarial networks (GANs) have gained popularity and have been used for generating realistic synthetic data and for obtaining high-level, single-modal latent representation spaces. In this work, we evaluate the feasibility of using GANs to obtain multimodal representations. We show that GANs can be used for multimodal representation learning and that they provide multimodal representations that are superior to those obtained with multimodal autoencoders. Additionally, we show that crossmodal translations can be visualized, which can provide human-interpretable insights into learned GAN-based video hyperlinking models.
Citations: 10
Improving Small Object Proposals for Company Logo Detection
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-04-28 DOI: 10.1145/3078971.3078990
C. Eggert, D. Zecha, Stephan Brehm, R. Lienhart
Abstract: Many modern approaches for object detection are two-stage pipelines. The first stage identifies regions of interest, which are then classified in the second stage. Faster R-CNN is one such approach, and it combines both stages into a single pipeline. In this paper we apply Faster R-CNN to the task of company logo detection. Motivated by its weak performance on small object instances, we examine both the proposal stage and the classification stage in detail across a wide range of object sizes. We investigate how feature map resolution influences the performance of those stages. Based on theoretical considerations, we introduce an improved scheme for generating anchor proposals and propose a modification to Faster R-CNN that leverages higher-resolution feature maps for small objects. We evaluate our approach on the FlickrLogos dataset, improving the RPN performance from 0.52 to 0.71 (MABO) and the detection performance from 0.52 to 0.67 (mAP).
Citations: 66
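The RPN result above is reported as MABO (Mean Average Best Overlap). This is an illustrative sketch only. It assumes the common definition: for each class, average the best IoU that any proposal reaches on each of that class's ground-truth boxes, then average across classes. It is restricted to a single image, and the helper names `iou` and `mabo` are hypothetical. It is not the authors' evaluation code.

```python
from typing import Dict, List, Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mabo(gt_by_class: Dict[str, List[Box]], proposals: Sequence[Box]) -> float:
    """Mean Average Best Overlap for one image (hypothetical helper).

    ABO of a class: mean over its ground-truth boxes of the best IoU
    achieved by any proposal. MABO: mean of the per-class ABO values.
    """
    abos = []
    for gt_boxes in gt_by_class.values():
        if not gt_boxes:
            continue  # a class without ground truth does not contribute
        best = [max((iou(g, p) for p in proposals), default=0.0) for g in gt_boxes]
        abos.append(sum(best) / len(best))
    return sum(abos) / len(abos) if abos else 0.0


if __name__ == "__main__":
    gt = {"logo_a": [(0, 0, 10, 10)], "logo_b": [(20, 20, 30, 30)]}
    props = [(0, 0, 10, 5), (20, 20, 30, 30)]
    print(round(mabo(gt, props), 3))  # 0.75: logo_a is half covered, logo_b exactly
```

A small logo that no anchor overlaps well pulls its class's ABO down. This is why the metric is sensitive to the anchor scheme discussed in the abstract.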
Accelerated Nearest Neighbor Search with Quick ADC
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-04-24 DOI: 10.1145/3078971.3078992
Fabien André, Anne-Marie Kermarrec, Nicolas Le Scouarnec
Abstract: Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a foundation of many multimedia retrieval systems. Because it offers low response times, Product Quantization (PQ) is a popular solution. PQ compresses high-dimensional vectors into short codes using several sub-quantizers, which enables in-RAM storage of large databases. This allows fast answers to NN queries without accessing the SSD or HDD. The key feature of PQ is that it can compute distances between short codes and high-dimensional vectors using cache-resident lookup tables. The efficiency of this technique, named Asymmetric Distance Computation (ADC), remains limited because it performs many cache accesses. In this paper, we introduce Quick ADC, a novel technique that achieves a 3 to 6 times speedup over ADC by exploiting the Single Instruction Multiple Data (SIMD) units available in current CPUs. Efficiently exploiting SIMD requires algorithmic changes to the ADC procedure. Namely, Quick ADC relies on two key modifications of ADC: (i) the use of 4-bit sub-quantizers instead of the standard 8-bit sub-quantizers and (ii) the quantization of floating-point distances. This allows Quick ADC to exceed the performance of state-of-the-art systems; for example, it achieves a Recall@100 of 0.94 in 3.4 ms on 1 billion SIFT descriptors (128-bit codes).
Citations: 16
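The abstract above describes ADC: the distance between a query and a PQ code is obtained by summing entries from per-sub-quantizer lookup tables. The sketch below shows plain scalar ADC, not the SIMD-based Quick ADC, under a few assumptions. The codebooks are random rather than k-means-trained, and vectors are split into `m` equal sub-vectors. Each sub-quantizer has 16 centroids, i.e., 4-bit sub-codes as used by Quick ADC. All function names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]
Codebook = List[Vector]  # the k centroids of one sub-quantizer


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(x: Sequence[float], m: int) -> List[Sequence[float]]:
    """Split a vector into m contiguous sub-vectors of equal length."""
    d_sub = len(x) // m
    return [x[j * d_sub:(j + 1) * d_sub] for j in range(m)]


def encode(x: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """PQ encoding: index of the nearest centroid for every sub-vector."""
    return [
        min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
        for sub, cb in zip(split(x, len(codebooks)), codebooks)
    ]


def decode(code: List[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruct the approximation that a PQ code stands for."""
    return [v for k, cb in zip(code, codebooks) for v in cb[k]]


def distance_tables(q: Sequence[float], codebooks: List[Codebook]) -> List[List[float]]:
    """Per-query lookup tables: tables[j][k] = ||q_j - c_{j,k}||^2."""
    return [[sq_dist(sub, c) for c in cb]
            for sub, cb in zip(split(q, len(codebooks)), codebooks)]


def adc(code: List[int], tables: List[List[float]]) -> float:
    """Asymmetric distance: one table lookup and one addition per sub-quantizer."""
    return sum(tables[j][k] for j, k in enumerate(code))


if __name__ == "__main__":
    random.seed(0)
    d, m, k = 8, 4, 16  # 16 centroids per sub-quantizer -> 4-bit sub-codes
    codebooks = [[[random.random() for _ in range(d // m)] for _ in range(k)]
                 for _ in range(m)]
    database = [[random.random() for _ in range(d)] for _ in range(100)]
    codes = [encode(x, codebooks) for x in database]  # short codes kept in RAM
    query = [random.random() for _ in range(d)]
    tables = distance_tables(query, codebooks)  # built once per query
    nearest = min(range(len(codes)), key=lambda i: adc(codes[i], tables))
    print("approximate nearest neighbor id:", nearest)
```

The squared Euclidean distance decomposes over disjoint sub-vectors. As a result, `adc(code, tables)` equals the exact distance between the query and the decoded vector, and the per-code cost is only `m` lookups. The speedup reported in the paper comes from making these lookups SIMD-friendly, which this scalar sketch does not attempt.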
Panorama to Panorama Matching for Location Recognition
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-04-21 DOI: 10.1145/3078971.3079033
Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, T. Furon, Ondřej Chum
Abstract: Location recognition is commonly treated as visual instance retrieval on "street view" imagery. The dataset items and queries are panoramic views, i.e., groups of images taken at a single location. This work introduces a novel panorama-to-panorama matching process, which works either by aggregating the features of the individual images in a group or by explicitly constructing a larger panorama. In either case, multiple views are used as queries. We reach near-perfect location recognition on a standard benchmark with only four query views.
Citations: 25
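The abstract above names two options, and one of them is aggregating the features of the individual images in a group. This minimal sketch makes two assumptions. First, each image is already described by a global descriptor. Second, aggregation is sum pooling of L2-normalized descriptors followed by re-normalization, which is a common choice but not necessarily the paper's exact scheme. Under these assumptions, panorama-to-panorama matching reduces to cosine similarity between aggregated vectors. The function names and toy descriptors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def l2_normalize(v: Descriptor) -> List[float]:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def aggregate(views: Sequence[Descriptor]) -> List[float]:
    """Sum-pool the normalized descriptors of all views, then re-normalize."""
    acc = [0.0] * len(views[0])
    for view in views:
        for i, x in enumerate(l2_normalize(view)):
            acc[i] += x
    return l2_normalize(acc)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for unit-norm inputs."""
    return sum(x * y for x, y in zip(a, b))


def rank_locations(query_views: Sequence[Descriptor],
                   database: Dict[str, Sequence[Descriptor]]) -> List[str]:
    """Rank database panoramas (location -> views) by similarity to the query group."""
    q = aggregate(query_views)
    db = {name: aggregate(views) for name, views in database.items()}
    return sorted(db, key=lambda name: cosine(q, db[name]), reverse=True)


if __name__ == "__main__":
    database = {
        "location_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
        "location_b": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
    }
    query = [[0.95, 0.05, 0.0], [1.0, 0.0, 0.1]]  # several views used as one query
    print(rank_locations(query, database))
```

Aggregating first means each panorama is stored and compared as a single vector, whatever its number of views. The other option in the abstract, explicitly building a larger panorama, is not covered here.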
DRAW: Deep Networks for Recognizing Styles of Artists Who Illustrate Children's Books
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pub Date : 2017-04-10 DOI: 10.1145/3078971.3078982
Samet Hicsonmez, Nermin Samet, Fadime Sener, P. D. Sahin
Abstract: This paper is motivated by a young boy's ability to recognize an illustrator's style in a totally different context. The book "We Are All Born Free" [1] is composed of selected rights from the Universal Declaration of Human Rights, interpreted by different illustrators. In it, the boy was surprised to see a picture similar to the ones in the "Winnie the Witch" series drawn by Korky Paul (Figure 1). The style was also noticeable in other characters drawn by the same illustrator in different books. The child's ability to easily spot a style was shown to hold for other illustrators as well, such as Axel Scheffler and Debi Gliori. The boy's enthusiasm led us to explore the capability of machines to recognize the style of illustrators. We collected pages from children's books to construct a new illustrations dataset consisting of about 6500 pages from 24 artists. We used deep networks to categorize illustrators; with around 94% classification performance, our method outperformed the traditional methods by more than 10%. Going beyond categorization, we explored style transfer. The classification performance on the transferred images showed that our system is able to capture style. Furthermore, we discovered representative illustrations and discriminative stylistic elements.
Citations: 13