Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval: Latest Publications

Session details: Oral Session 3: Multimedia Applications
Wolfgang Hürst
{"title":"Session details: Oral Session 3: Multimedia Applications","authors":"Wolfgang Hürst","doi":"10.1145/3252928","DOIUrl":"https://doi.org/10.1145/3252928","url":null,"abstract":"","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134322743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Optimally Grouped Deep Features Using Normalized Cost for Video Scene Detection
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206055
Daniel Rotman, Dror Porat, G. Ashour, Udi Barzelay
{"title":"Optimally Grouped Deep Features Using Normalized Cost for Video Scene Detection","authors":"Daniel Rotman, Dror Porat, G. Ashour, Udi Barzelay","doi":"10.1145/3206025.3206055","DOIUrl":"https://doi.org/10.1145/3206025.3206055","url":null,"abstract":"Video scene detection is the task of temporally dividing a video into its semantic sections. This is an important preliminary step for effective analysis of heterogeneous video content. We present a unique formulation of this task as a generic optimization problem with a novel normalized cost function, aimed at optimal grouping of consecutive shots into scenes. The mathematical properties of the proposed normalized cost function enable robust scene detection, also in challenging real-world scenarios. We present a novel dynamic programming formulation for efficiently optimizing the proposed cost function despite an inherent dependency between subproblems. We use deep neural network models for visual and audio analysis to encode the semantic elements in the video scene, enabling effective and more accurate video scene detection. The proposed method has two key advantages compared to other approaches: it inherently provides a temporally consistent division of the video into scenes, and is also parameter-free, eliminating the need for fine-tuning for different types of content. While our method can adaptively estimate the number of scenes from the video content, we also present a new non-greedy procedure for creating a hierarchical consensus-based division tree spanning multiple levels of granularity. We provide comprehensive experimental results showing the benefits of the normalized cost function, and demonstrating that the proposed method outperforms the current state of the art in video scene detection.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131411070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
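The abstract above describes grouping consecutive shots into scenes by optimizing a normalized cost with dynamic programming. Below is a minimal sketch of that kind of DP grouping; the cost used here (within-segment variance normalized by segment length) and the fixed scene count k are illustrative assumptions, not the paper's exact normalized cost function or its adaptive estimation of the number of scenes.

```python
# Hedged sketch: dynamic-programming grouping of consecutive shots into
# scenes. The cost below (within-segment variance normalized by segment
# length) is an illustrative stand-in, NOT the paper's normalized cost.
import numpy as np

def segment_cost(feats, i, j):
    """Cost of grouping shots i..j-1 into one scene (illustrative)."""
    seg = feats[i:j]
    return float(np.sum((seg - seg.mean(axis=0)) ** 2)) / len(seg)

def group_shots(feats, k):
    """Partition n consecutive shots into k scenes minimizing total cost."""
    n = len(feats)
    INF = float("inf")
    dp = [[INF] * (k + 1) for _ in range(n + 1)]  # dp[j][m]: best cost of first j shots in m scenes
    back = [[0] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = dp[i][m - 1] + segment_cost(feats, i, j)
                if c < dp[j][m]:
                    dp[j][m], back[j][m] = c, i
    # Recover scene boundaries by walking the backpointers.
    bounds, j = [], n
    for m in range(k, 0, -1):
        bounds.append(j)
        j = back[j][m]
    return sorted(bounds)  # indices where each scene ends

# Example: 10 shots with 4-dim deep features, grouped into 3 scenes.
rng = np.random.default_rng(0)
print(group_shots(rng.normal(size=(10, 4)), k=3))
```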
Session details: Special Session 1: Predicting User Perceptions of Multimedia Content
C. Demarty
{"title":"Session details: Special Session 1: Predicting User Perceptions of Multimedia Content","authors":"C. Demarty","doi":"10.1145/3252931","DOIUrl":"https://doi.org/10.1145/3252931","url":null,"abstract":"","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"63 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123336556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Multi-Oriented Scene Text Detector with Position-Sensitive Segmentation
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206043
Peirui Cheng, Weiqiang Wang
{"title":"A Multi-Oriented Scene Text Detector with Position-Sensitive Segmentation","authors":"Peirui Cheng, Weiqiang Wang","doi":"10.1145/3206025.3206043","DOIUrl":"https://doi.org/10.1145/3206025.3206043","url":null,"abstract":"Scene text detection has been studied for a long time and lots of approaches have achieved promising performances. Most approaches regard text as a specific object and utilize the popular frameworks of object detection to detect scene text. However, scene text is different from general objects in terms of orientations, sizes and aspect ratios. In this paper, we present an end-to-end multi-oriented scene text detection approach, which combines the object detection framework with the position-sensitive segmentation. For a given image, features are extracted through a fully convolutional network. Then they are input into text detection branch and position-sensitive segmentation branch simultaneously, where text detection branch is used for generating candidates and position-sensitive segmentation branch is used for generating segmentation maps. Finally the candidates generated by text detection branch are projected onto the position-sensitive segmentation maps for filtering. The proposed approach utilizes the merits of position-sensitive segmentation to improve the expressiveness of the proposed network. Additionally, the approach uses position-sensitive segmentation maps to further filter the candidates so as to highly improve the precision rate. Experiments on datasets ICDAR2015 and COCO-Text demonstrate that the proposed method outperforms previous state-of-the-art methods. For ICDAR2015 dataset, the proposed method achieves an F-score of 0.83 and a precision rate of 0.87.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116183745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
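The filtering step described above, in which detection candidates are projected onto position-sensitive segmentation maps, can be sketched as R-FCN-style scoring: split each candidate box into a k x k grid, score each cell against the map for that relative position, and threshold the average. The map layout, grid size, and threshold below are assumptions for illustration, not the paper's values.

```python
# Hedged sketch of candidate filtering with position-sensitive maps.
import numpy as np

def ps_score(ps_maps, box, k=3):
    """Average position-sensitive score of a box.

    ps_maps: (k*k, H, W) maps, one per relative cell position.
    box: (x0, y0, x1, y1) in pixel coordinates.
    """
    x0, y0, x1, y1 = box
    cell_w, cell_h = (x1 - x0) / k, (y1 - y0) / k
    scores = []
    for gy in range(k):
        for gx in range(k):
            cx0, cy0 = int(x0 + gx * cell_w), int(y0 + gy * cell_h)
            cx1, cy1 = int(x0 + (gx + 1) * cell_w), int(y0 + (gy + 1) * cell_h)
            # Score this cell only against the map for its relative position.
            cell = ps_maps[gy * k + gx, cy0:max(cy1, cy0 + 1), cx0:max(cx1, cx0 + 1)]
            scores.append(cell.mean())
    return float(np.mean(scores))

def filter_candidates(ps_maps, boxes, thresh=0.5, k=3):
    """Keep only candidates whose averaged position-sensitive score is high."""
    return [b for b in boxes if ps_score(ps_maps, b, k) >= thresh]

# Toy usage: 3x3 position-sensitive maps over a 64x64 image.
maps = np.random.default_rng(1).random((9, 64, 64))
print(filter_candidates(maps, [(4, 4, 40, 28), (10, 10, 30, 50)]))
```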
WTPlant (What's That Plant?): A Deep Learning System for Identifying Plants in Natural Images
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206089
Jonas Krause, Gavin Sugita, K. Baek, Lipyeow Lim
{"title":"WTPlant (What's That Plant?): A Deep Learning System for Identifying Plants in Natural Images","authors":"Jonas Krause, Gavin Sugita, K. Baek, Lipyeow Lim","doi":"10.1145/3206025.3206089","DOIUrl":"https://doi.org/10.1145/3206025.3206089","url":null,"abstract":"Despite the availability of dozens of plant identification mobile applications, identifying plants from a natural image remains a challenging problem - most of the existing applications do not address the complexity of natural images, the large number of plant species, and the multi-scale nature of natural images. In this technical demonstration, we present the WTPlant system for identifying plants in natural images. WTPlant is based on deep learning approaches. Specifically, it uses stacked Convolutional Neural Networks for image segmentation, a novel preprocessing stage for multi-scale analyses, and deep convolutional networks to extract the most discriminative features. WTPlant employs different classification architectures for plants and flowers, thus enabling plant identification throughout all the seasons. The user interface also shows, in an interactive way, the most representative areas in the image that are used to predict each plant species. The first version of WTPlant is trained to classify 100 different plant species present in the campus of the University of Hawai'i at Manoa. First experiments support the hypothesis that an initial segmentation process helps guide the extraction of representative samples and, consequently, enables Convolutional Neural Networks to better recognize objects of different scales in natural images. Future versions aim to extend the recognizable species to cover the land-based flora of the Hawaiian Islands.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115526618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
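As a rough illustration of the pipeline the abstract outlines (segmentation, multi-scale preprocessing, CNN classification), the sketch below wires those stages together. The `segment_fn` and `classify_fn` callables and the averaging fusion are hypothetical placeholders; WTPlant's actual networks and fusion rule are not given in the abstract.

```python
# Hedged sketch of a WTPlant-style pipeline: segment plant regions,
# extract multi-scale crops, classify each crop, fuse the predictions.
import numpy as np

def multiscale_crops(image, region, scales=(1.0, 1.5, 2.0)):
    """Crop a detected plant region at several scales around its center."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = region
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) / 2, (y1 - y0) / 2
    crops = []
    for s in scales:
        a0, b0 = max(0, int(cx - s * half_w)), max(0, int(cy - s * half_h))
        a1, b1 = min(w, int(cx + s * half_w)), min(h, int(cy + s * half_h))
        crops.append(image[b0:b1, a0:a1])
    return crops

def identify(image, segment_fn, classify_fn):
    """Average class probabilities over all detected regions and scales."""
    probs = [classify_fn(c)
             for region in segment_fn(image)          # plant bounding regions
             for c in multiscale_crops(image, region)]
    return np.mean(probs, axis=0)  # fuse by averaging (an assumption)

# Toy usage with stand-in callables (hypothetical, for illustration only).
img = np.zeros((100, 100, 3))
seg = lambda im: [(20, 20, 60, 60)]          # one detected region
clf = lambda crop: np.ones(5) / 5            # uniform over 5 species
print(identify(img, seg, clf))
```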
Session details: Oral Session 2: Multimedia Content Analysis
W. Chu
{"title":"Session details: Oral Session 2: Multimedia Content Analysis","authors":"W. Chu","doi":"10.1145/3252927","DOIUrl":"https://doi.org/10.1145/3252927","url":null,"abstract":"","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132346433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Class-aware Self-Attention for Audio Event Recognition
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206067
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
{"title":"Class-aware Self-Attention for Audio Event Recognition","authors":"Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann","doi":"10.1145/3206025.3206067","DOIUrl":"https://doi.org/10.1145/3206025.3206067","url":null,"abstract":"Audio event recognition (AER) has been an important research problem with a wide range of applications. However, it is very challenging to develop large scale audio event recognition models. On the one hand, usually there are only \"weak\" labeled audio training data available, which only contains labels of audio events without temporal boundaries. On the other hand, the distribution of audio events is generally long-tailed, with only a few positive samples for large amounts of audio events. These two issues make it hard to learn discriminative acoustic features to recognize audio events especially for long-tailed events. In this paper, we propose a novel class-aware self-attention mechanism with attention factor sharing to generate discriminative clip-level features for audio event recognition. Since a target audio event only occurs in part of an entire audio clip and its corresponding temporal interval varies, the proposed class-aware self-attention approach learns to highlight relevant temporal intervals and to suppress irrelevant noises at the same time. In order to learn attention patterns effectively for those long-tailed events, we combine both the domain knowledge and data driven strategies to share attention factors in the proposed attention mechanism, which transfers the common knowledge learned from other similar events to the rare events. The proposed attention mechanism is a pluggable component and can be trained end-to-end in the overall AER model. We evaluate our model on a large-scale audio event corpus \"Audio Set\" with both short-term and long-term acoustic features. The experimental results demonstrate the effectiveness of our model, which improves the overall audio event recognition performance with different acoustic features especially for events with low resources. Moreover, the experiments also show that our proposed model is able to learn new audio events with a few training examples effectively and efficiently without disturbing the previously learned audio events.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132701386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
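A minimal PyTorch sketch of class-aware attention pooling with shared attention factors follows: each class mixes a small shared factor bank into its own temporal attention weights, pools the frame features, and scores the pooled vector. The factorization, dimensions, and scoring head are assumptions based on the abstract, not the authors' implementation.

```python
# Hedged sketch: class-aware attention over time with shared factors.
import torch
import torch.nn as nn

class ClassAwareAttention(nn.Module):
    def __init__(self, feat_dim, num_classes, num_factors=8):
        super().__init__()
        self.factors = nn.Linear(feat_dim, num_factors, bias=False)      # shared factor bank
        self.mix = nn.Parameter(torch.randn(num_classes, num_factors))   # per-class mixing
        self.cls = nn.Linear(feat_dim, num_classes)                      # per-class scorer

    def forward(self, x):                      # x: (batch, time, feat_dim)
        f = self.factors(x)                    # (B, T, num_factors)
        logits = f @ self.mix.t()              # (B, T, C): per-class attention logits
        alpha = torch.softmax(logits, dim=1)   # attend over time, separately per class
        pooled = alpha.transpose(1, 2) @ x     # (B, C, feat_dim): clip-level features
        scores = (pooled * self.cls.weight).sum(-1) + self.cls.bias
        return scores                          # (B, C) event logits

# Toy usage: 4 clips, 50 frames, 128-dim features, 10 event classes.
model = ClassAwareAttention(128, 10)
print(model(torch.randn(4, 50, 128)).shape)  # torch.Size([4, 10])
```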
Compare Stereo Patches Using Atrous Convolutional Neural Networks
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3206075
Zhiwei Li, Lei Yu
{"title":"Compare Stereo Patches Using Atrous Convolutional Neural Networks","authors":"Zhiwei Li, Lei Yu","doi":"10.1145/3206025.3206075","DOIUrl":"https://doi.org/10.1145/3206025.3206075","url":null,"abstract":"In this work, we address the task of dense stereo matching with Convolutional Neural Networks (CNNs). Particularly, we focus on improving matching cost computation by better aggregating contextual information. Towards this goal, we advocate to use atrous convolution, a powerful tool for dense prediction task that allows us to control the resolution at which feature responses are computed within CNNs and to enlarge the receptive field of the network without losing image resolution and requiring learning extra parameters. Aiming to improve the performance of atrous convolution, we propose different frameworks for further boosting performance. We evaluate our models on KITTI 2015 benchmark, the result shows that we achieve on-par performance with fewer post-processing methods applied.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"174 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124255227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
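To make the idea concrete, here is a hedged PyTorch sketch of a Siamese patch descriptor that uses atrous (dilated) convolutions to grow the receptive field without downsampling, plus a simple similarity-based matching cost. Layer widths, dilation rates, and the cosine-similarity cost are illustrative choices, not the paper's architecture.

```python
# Hedged sketch: atrous feature extractor for comparing stereo patches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtrousFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=2, dilation=2), nn.ReLU(inplace=True),  # atrous
            nn.Conv2d(32, 64, 3, padding=4, dilation=4), nn.ReLU(inplace=True),  # atrous
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, patch):                 # (B, 3, H, W)
        return self.net(patch).flatten(1)     # (B, 64) patch descriptor

def matching_cost(extractor, left, right):
    """Negative cosine similarity as a matching cost between patch pairs."""
    return -F.cosine_similarity(extractor(left), extractor(right), dim=1)

# Toy usage on 9x9 patches.
ext = AtrousFeatures()
print(matching_cost(ext, torch.randn(2, 3, 9, 9), torch.randn(2, 3, 9, 9)))
```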
Session details: Industrial Talks
Go Irie, Tao Mei
{"title":"Session details: Industrial Talks","authors":"Go Irie Tao Mei","doi":"10.1145/3252924","DOIUrl":"https://doi.org/10.1145/3252924","url":null,"abstract":"","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116016255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multimedia Content Understanding by Learning from Very Few Examples: Recent Progress on Unsupervised, Semi-Supervised and Supervised Deep Learning Approaches
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval Pub Date: 2018-06-05 DOI: 10.1145/3206025.3210498
Guo-Jun Qi
{"title":"Multimedia Content Understanding by Learning from Very Few Examples: Recent Progress on Unsupervised, Semi-Supervised and Supervised Deep Learning Approaches","authors":"Guo-Jun Qi","doi":"10.1145/3206025.3210498","DOIUrl":"https://doi.org/10.1145/3206025.3210498","url":null,"abstract":"In this tutorial, the speaker will present serval parallel efforts on building deep learning models with very few supervision information, with or without unsupervised data available. In particular, we will discuss in details. (1) Generative Adverbial Nets (GANs) and their applications to unsupervised feature extractions, semi-supervised learning with few labeled examples and a large amount of unlabeled data. We will discuss the state-of-the-art results that have been achieved by the semi-supervised GANs. (2) Low-Shot Learning algorithms to train and test models on disjoint sets of tasks. We will discuss the ideas of how to efficiently adapt models to tasks with very few examples. In particular, we will discuss several paradigms of learning-to-learn approaches. (3) We will also discuss how to transfer models across modalities by leveraging abundant labels from one modality to train a model for other modalities with few labels. We will discuss in details the cross-modal label transfer approach.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132444203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
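For the semi-supervised GAN portion of the tutorial, the sketch below illustrates the commonly used formulation (in the spirit of Salimans et al., 2016) in which the discriminator doubles as a K-class classifier and the implicit "fake" class is handled with a logsumexp over the class logits. The network and losses are a minimal illustration of that idea, not the tutorial's exact material.

```python
# Hedged sketch: discriminator losses for semi-supervised GAN training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSLDiscriminator(nn.Module):
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                  nn.Linear(256, num_classes))  # K class logits

    def forward(self, x):
        return self.body(x)

def d_losses(disc, x_lab, y_lab, x_unl, x_fake):
    # Supervised: standard cross-entropy on the few labeled examples.
    sup = F.cross_entropy(disc(x_lab), y_lab)
    # Unsupervised: real data should carry high total class evidence,
    # generated data low; logsumexp gives the implicit real/fake logit,
    # since D(real) = Z / (Z + 1) = sigmoid(log Z) with Z = sum(exp(logits)).
    z_unl = torch.logsumexp(disc(x_unl), dim=1)
    z_fake = torch.logsumexp(disc(x_fake), dim=1)
    unsup = -F.logsigmoid(z_unl).mean() - F.logsigmoid(-z_fake).mean()
    return sup + unsup

# Toy usage: 20-dim features, 5 classes.
d = SSLDiscriminator(20, 5)
loss = d_losses(d, torch.randn(8, 20), torch.randint(0, 5, (8,)),
                torch.randn(16, 20), torch.randn(16, 20))
print(loss.item())
```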