{"title":"Semi-Automatic Retrieval of Relevant Segments from Laparoscopic Surgery Videos","authors":"Stefan Petscharnig","doi":"10.1145/3078971.3079008","DOIUrl":"https://doi.org/10.1145/3078971.3079008","url":null,"abstract":"Over the last decades, progress in medical technology and imaging technology enabled the technique of minimally invasive surgery. In addition, multimedia technologies allow for retrospective analyses of surgeries. The accumulated videos and images allow for a speed-up in documentation, easier medical case assessment across surgeons, training young surgeons, as well as they find the usage in medical research. Considering a surgery lasting for hours of routine work, surgeons only need to see short video segments of interest to assess a case. Surgeons do not have the time to manually extract video sequences of their surgeries from their big multimedia databases as they do not have the resources for this time-consuming task. The thesis deals with the questions of how to semantically classify video frames using Convolutional Neural Networks into different semantic concepts of surgical actions and anatomical structures. In order to achieve this goal, the capabilities of predefined CNN architectures and transfer learning in the laparoscopic video domain are investigated. The results are expected to improve by domain-specific adaptation of the CNN input layers, i.e. by fusion of the image with motion and relevance information. Finally, the thesis investigates to what extent surgeons' needs are covered with the proposed extraction of relevant scenes.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"87 16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126299072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual Descriptors in Methods for Video Hyperlinking","authors":"P. Galuscáková, Michal Batko, Jan Cech, Jiri Matas, David Novak, Pavel Pecina","doi":"10.1145/3078971.3079026","DOIUrl":"https://doi.org/10.1145/3078971.3079026","url":null,"abstract":"In this paper, we survey different state-of-the-art visual processing methods and utilize them in hyperlinking. Visual information, calculated using Features Signatures, SIMILE descriptors and convolutional neural networks (CNN), is utilized as similarity between video frames and used to find similar faces, objects and setting. Visual concepts in frames are also automatically recognized and textual output of the recognition is combined with search based on subtitles and transcripts. All presented experiments were performed in the Search and Hyperlinking 2014 MediaEval task and Video Hyperlinking 2015 TRECVid task.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127049141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session: Open Software","authors":"M. Lux","doi":"10.1145/3254619","DOIUrl":"https://doi.org/10.1145/3254619","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128691763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Facial Video Retrieval and Management for Decision Support in Speech and Language Therapy","authors":"Ricardo Carrapiço, I. Guimarães, Margarida Grilo, S. Cavaco, João Magalhães","doi":"10.1145/3078971.3078984","DOIUrl":"https://doi.org/10.1145/3078971.3078984","url":null,"abstract":"3D video is introducing great changes in many health related areas. The realism of such information provides health professionals with strong evidence analysis tools to facilitate clinical decision processes. Speech and language therapy aims to help subjects in correcting several disorders. The assessment of the patient by the speech and language therapist (SLT), requires several visual and audio analysis procedures that can interfere with the patient's production of speech. In this context, the main contribution of this paper is a 3D video system to improve health information management processes in speech and language therapy. The 3D video retrieval and management system supports multimodal health records and provides the SLTs with tools to support their work in many ways: (i) it allows SLTs to easily maintain a database of patients' orofacial and speech exercises; (ii) supports three-dimensional orofacial measurement and analysis in a non-intrusive way; and (iii) search patient speech-exercises by similar facial characteristics, using facial image analysis techniques. The second contribution is a dataset with 3D videos of patients performing orofacial speech exercises. The whole system was evaluated successfully in a user study involving 22 SLTs. The user study illustrated the importance of the retrieval by similar orofacial speech exercise.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116744566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Badminton Video Analysis based on Spatiotemporal and Stroke Features","authors":"W. Chu, S. Situmeang","doi":"10.1145/3078971.3079032","DOIUrl":"https://doi.org/10.1145/3078971.3079032","url":null,"abstract":"Most of the broadcasted sports events nowadays present game statistics to the viewers which can be used to design the gameplay strategy, improve player's performance, or improve accessing the point of interest of a sport game. However, few studies have been proposed for broadcasted badminton videos. In this paper, we integrate several visual analysis techniques to detect the court, detect players, classify strokes, and classify the player's strategy. Based on visual analysis, we can get some insights about the common strategy of a certain player. We evaluate performance of stroke classification, strategy classification, and show game statistics based on classification results.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116797557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking","authors":"V. Vukotic, C. Raymond, G. Gravier","doi":"10.1145/3078971.3079038","DOIUrl":"https://doi.org/10.1145/3078971.3079038","url":null,"abstract":"Continuous multimodal representations suitable for multimodal information retrieval are usually obtained with methods that heavily rely on multimodal autoencoders. In video hyperlinking, a task that aims at retrieving video segments, the state of the art is a variation of two interlocked networks working in opposing directions. These systems provide good multimodal embeddings and are also capable of translating from one representation space to the other. Operating on representation spaces, these networks lack the ability to operate in the original spaces (text or image), which makes it difficult to visualize the crossmodal function, and do not generalize well to unseen data. Recently, generative adversarial networks have gained popularity and have been used for generating realistic synthetic data and for obtaining high-level, single-modal latent representation spaces. In this work, we evaluate the feasibility of using GANs to obtain multimodal representations. We show that GANs can be used for multimodal representation learning and that they provide multimodal representations that are superior to representations obtained with multimodal autoencoders. Additionally, we illustrate the ability of visualizing crossmodal translations that can provide human-interpretable insights on learned GAN-based video hyperlinking models.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"GE-23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126565081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Small Object Proposals for Company Logo Detection","authors":"C. Eggert, D. Zecha, Stephan Brehm, R. Lienhart","doi":"10.1145/3078971.3078990","DOIUrl":"https://doi.org/10.1145/3078971.3078990","url":null,"abstract":"Many modern approaches for object detection are two-staged pipelines. The first stage identifies regions of interest which are then classified in the second stage. Faster R-CNN is such an approach for object detection which combines both stages into a single pipeline. In this paper we apply Faster R-CNN to the task of company logo detection. Motivated by its weak performance on small object instances, we examine in detail both the proposal and the classification stage with respect to a wide range of object sizes. We investigate the influence of feature map resolution on the performance of those stages. Based on theoretical considerations, we introduce an improved scheme for generating anchor proposals and propose a modification to Faster R-CNN which leverages higher-resolution feature maps for small objects. We evaluate our approach on the FlickrLogos dataset improving the RPN performance from 0.52 to 0.71 (MABO) and the detection performance from 0.52 to $0.67$ (mAP).","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"121 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113960421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerated Nearest Neighbor Search with Quick ADC","authors":"Fabien André, Anne-Marie Kermarrec, Nicolas Le Scouarnec","doi":"10.1145/3078971.3078992","DOIUrl":"https://doi.org/10.1145/3078971.3078992","url":null,"abstract":"Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a foundation of many multimedia retrieval systems. Because it offers low responses times, Product Quantization (PQ) is a popular solution. PQ compresses high-dimensional vectors into short codes using several sub-quantizers, which enables in-RAM storage of large databases. This allows fast answers to NN queries, without accessing the SSD or HDD. The key feature of PQ is that it can compute distances between short codes and high-dimensional vectors using cache-resident lookup tables. The efficiency of this technique, named Asymmetric Distance Computation (ADC), remains limited because it performs many cache accesses. In this paper, we introduce Quick ADC, a novel technique that achieves a 3 to 6 times speedup over ADC by exploiting Single Instruction Multiple Data (SIMD) units available in current CPUs. Efficiently exploiting SIMD requires algorithmic changes to the ADC procedure. Namely, Quick ADC relies on two key modifications of ADC: (i) the use 4-bit sub-quantizers instead of the standard 8-bit sub-quantizers and (ii) the quantization of floating-point distances. This allows Quick ADC to exceed the performance of state-of-the-art systems, e.g., it achieves a Recall@100 of 0.94 in 3.4 ms on 1 billion SIFT descriptors (128-bit codes).","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125468753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Panorama to Panorama Matching for Location Recognition","authors":"Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, T. Furon, Ondřej Chum","doi":"10.1145/3078971.3079033","DOIUrl":"https://doi.org/10.1145/3078971.3079033","url":null,"abstract":"Location recognition is commonly treated as visual instance retrieval on \"street view\" imagery. The dataset items and queries are panoramic views, i.e. groups of images taken at a single location. This work introduces a novel panorama-to-panorama matching process, either by aggregating features of individual images in a group or by explicitly constructing a larger panorama. In either case, multiple views are used as queries. We reach near perfect location recognition on a standard benchmark with only four query views.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125992098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DRAW: Deep Networks for Recognizing Styles of Artists Who Illustrate Children's Books","authors":"Samet Hicsonmez, Nermin Samet, Fadime Sener, P. D. Sahin","doi":"10.1145/3078971.3078982","DOIUrl":"https://doi.org/10.1145/3078971.3078982","url":null,"abstract":"This paper is motivated from a young boy's capability to recognize an illustrator's style in a totally different context. In the book \"We are All Born Free\" [1], composed of selected rights from the Universal Declaration of Human Rights interpreted by different illustrators, the boy was surprised to see a picture similar to the ones in the \"Winnie the Witch\" series drawn by Korky Paul (Figure [1]). The style was noticeable in other characters of the same illustrator in different books as well. The capability of a child to easily spot the style was shown to be valid for other illustrators such as Axel Scheffler and Debi Gliori. The boy's enthusiasm let us to start the journey to explore the capabilities of machines to recognize the style of illustrators. We collected pages from children's books to construct a new illustrations dataset consisting of about 6500 pages from 24 artists. We exploited deep networks for categorizing illustrators and with around 94% classification performance our method over-performed the traditional methods by more than 10%. Going beyond categorization we explored transferring style. The classification performance on the transferred images has shown the ability of our system to capture the style. Furthermore, we discovered representative illustrations and discriminative stylistic elements.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132171319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}