Proceedings of the 26th ACM international conference on Multimedia: Latest Publications

Session details: Vision-1 (Machine Learning)
Pub Date: 2018-10-15 | DOI: 10.1145/3286920
Jingkuan Song
{"title":"Session details: Vision-1 (Machine Learning)","authors":"Jingkuan Song","doi":"10.1145/3286920","DOIUrl":"https://doi.org/10.1145/3286920","url":null,"abstract":"","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125008768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ThoughtViz
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240641
Praveen Tirupattur, Y. Rawat, C. Spampinato, M. Shah
{"title":"ThoughtViz","authors":"Praveen Tirupattur, Y. Rawat, C. Spampinato, M. Shah","doi":"10.1145/3240508.3240641","DOIUrl":"https://doi.org/10.1145/3240508.3240641","url":null,"abstract":"Studying human brain signals has always gathered great attention from the scientific community. In Brain Computer Interface (BCI) research, for example, changes of brain signals in relation to specific tasks (e.g., thinking something) are detected and used to control machines. While extracting spatio-temporal cues from brain signals for classifying state of human mind is an explored path, decoding and visualizing brain states is new and futuristic. Following this latter direction, in this paper, we propose an approach that is able not only to read the mind, but also to decode and visualize human thoughts. More specifically, we analyze brain activity, recorded by an ElectroEncephaloGram (EEG), of a subject while thinking about a digit, character or an object and synthesize visually the thought item. To accomplish this, we leverage the recent progress of adversarial learning by devising a conditional Generative Adversarial Network (GAN), which takes, as input, encoded EEG signals and generates corresponding images. In addition, since collecting large EEG signals in not trivial, our GAN model allows for learning distributions with limited training data. Performance analysis carried out on three different datasets -- brain signals of multiple subjects thinking digits, characters, and objects -- show that our approach is able to effectively generate images from thoughts of a person. They also demonstrate that EEG signals encode explicitly cues from thoughts which can be effectively used for generating semantically relevant visualizations.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122998208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Cumulative Nets for Edge Detection
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240688
Jingkuan Song, Zhilong Zhou, Lianli Gao, Xing Xu, Heng Tao Shen
{"title":"Cumulative Nets for Edge Detection","authors":"Jingkuan Song, Zhilong Zhou, Lianli Gao, Xing Xu, Heng Tao Shen","doi":"10.1145/3240508.3240688","DOIUrl":"https://doi.org/10.1145/3240508.3240688","url":null,"abstract":"Lots of recent progress have been made by using Convolutional Neural Networks (CNN) for edge detection. Due to the nature of hierarchical representations learned in CNN, it is intuitive to design side networks utilizing the richer convolutional features to improve the edge detection. However, different side networks are isolated, and the final results are usually weighted sum of the side outputs with uneven qualities. To tackle these issues, we propose a Cumulative Network (C-Net), which learns the side network cumulatively based on current visual features and low-level side outputs, to gradually remove detailed or sharp boundaries to enable high-resolution and accurate edge detection. Therefore, the lower-level edge information is cumulatively inherited while the superfluous details are progressively abandoned. In fact, recursively Learningwhere to remove superfluous details from the current edge map with the supervision of a higher-level visual feature is challenging. Furthermore, we employ atrous convolution (AC) and atrous convolution pyramid pooling (ASPP) to robustly detect object boundaries at multiple scales and aspect ratios. Also, cumulatively refining edges using high-level visual information and lower-lever edge maps is achieved by our designed cumulative residual attention (CRA) block. Experimental results show that our C-Net sets new records for edge detection on both two benchmark datasets: BSDS500 (i.e., .819 ODS, .835 OIS and .862 AP) and NYUDV2 (i.e., .762 ODS, .781 OIS, .797 AP). C-Net has great potential to be applied to other deep learning based applications, e.g., image classification and segmentation.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126306489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Monocular Camera Based Real-Time Dense Mapping Using Generative Adversarial Network
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240564
Xin Yang, Jinyu Chen, Zhiwei Wang, Qiaozhe Zhang, Wenyu Liu, Chunyuan Liao, K. Cheng
{"title":"Monocular Camera Based Real-Time Dense Mapping Using Generative Adversarial Network","authors":"Xin Yang, Jinyu Chen, Zhiwei Wang, Qiaozhe Zhang, Wenyu Liu, Chunyuan Liao, K. Cheng","doi":"10.1145/3240508.3240564","DOIUrl":"https://doi.org/10.1145/3240508.3240564","url":null,"abstract":"Monocular simultaneous localization and mapping (SLAM) is a key enabling technique for many computer vision and robotics applications. However, existing methods either can obtain only sparse or semi-dense maps in highly-textured image areas or fail to achieve a satisfactory reconstruction accuracy. In this paper, we present a new method based on a generative adversarial network,named DM-GAN, for real-time dense mapping based on a monocular camera. Specifcally, our depth generator network takes a semidense map obtained from motion stereo matching as a guidance to supervise dense depth prediction of a single RGB image. The depth generator is trained based on a combination of two loss functions, i.e. an adversarial loss for enforcing the generated depth maps to reside on the manifold of the true depth maps and a pixel-wise mean square error (MSE) for ensuring the correct absolute depth values. Extensive experiments on three public datasets demonstrate that our DM-GAN signifcantly outperforms the state-of-the-art methods in terms of greater reconstruction accuracy and higher depth completeness.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126319497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Self-boosted Gesture Interactive System with ST-Net
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240530
Zhengzhe Liu, Xiaojuan Qi, Lei Pang
{"title":"Self-boosted Gesture Interactive System with ST-Net","authors":"Zhengzhe Liu, Xiaojuan Qi, Lei Pang","doi":"10.1145/3240508.3240530","DOIUrl":"https://doi.org/10.1145/3240508.3240530","url":null,"abstract":"In this paper, we propose a self-boosted intelligent system for joint sign language recognition and automatic education. A novel Spatial-Temporal Net (ST-Net) is designed to exploit the temporal dynamics of localized hands for sign language recognition. Features from ST-Net can be deployed by our education system to detect failure modes of the learners. Moreover, the education system can help collect a vast amount of data for training ST-Net. Our sign language recognition and education system help improve each other step-by-step.On the one hand, benefited from accurate recognition system, the education system can detect the failure parts of the learner more precisely. On the other hand, with more training data gathered from the education system, the recognition system becomes more robust and accurate. Experiments on Hong Kong sign language dataset containing 227 commonly used words validate the effectiveness of our joint recognition and education system.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125710511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
An Efficient Deep Quantized Compressed Sensing Coding Framework of Natural Images
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240706
Wenxue Cui, F. Jiang, Xinwei Gao, Shengping Zhang, Debin Zhao
{"title":"An Efficient Deep Quantized Compressed Sensing Coding Framework of Natural Images","authors":"Wenxue Cui, F. Jiang, Xinwei Gao, Shengping Zhang, Debin Zhao","doi":"10.1145/3240508.3240706","DOIUrl":"https://doi.org/10.1145/3240508.3240706","url":null,"abstract":"Traditional image compressed sensing (CS) coding frameworks solve an inverse problem that is based on the measurement coding tools (prediction, quantization, entropy coding, etc.) and the optimization based image reconstruction method. These CS coding frameworks face the challenges of improving the coding efficiency at the encoder, while also suffering from high computational complexity at the decoder. In this paper, we move forward a step and propose a novel deep network based CS coding framework of natural images, which consists of three sub-networks: sampling sub-network, offset sub-network and reconstruction sub-network that responsible for sampling, quantization and reconstruction, respectively. By cooperatively utilizing these sub-networks, it can be trained in the form of an end-to-end metric with a proposed rate-distortion optimization loss function. The proposed framework not only improves the coding performance, but also reduces the computational cost of the image reconstruction dramatically. Experimental results on benchmark datasets demonstrate that the proposed method is capable of achieving superior rate-distortion performance against state-of-the-art methods.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"2012 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129686551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240675
Yanli Ji, Feixiang Xu, Yang Yang, Fumin Shen, Heng Tao Shen, Weishi Zheng
{"title":"A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition","authors":"Yanli Ji, Feixiang Xu, Yang Yang, Fumin Shen, Heng Tao Shen, Weishi Zheng","doi":"10.1145/3240508.3240675","DOIUrl":"https://doi.org/10.1145/3240508.3240675","url":null,"abstract":"Current researches mainly focus on single-view and multiview human action recognition, which can hardly satisfy the requirements of human-robot interaction (HRI) applications to recognize actions from arbitrary views. The lack of databases also sets up barriers. In this paper, we newly collect a large-scale RGB-D action database for arbitrary-view action analysis, including RGB videos, depth and skeleton sequences. The database includes action samples captured in 8 fixed viewpoints and varying-view sequences which covers the entire 360 view angles. In total, 118 persons are invited to act 40 action categories, and 25,600 video samples are collected. Our database involves more articipants, more viewpoints and a large number of samples. More importantly, it is the first database containing the entire 360? varying-view sequences. The database provides sufficient data for cross-view and arbitrary-view action analysis. Besides, we propose a View-guided Skeleton CNN (VS-CNN) to tackle the problem of arbitrary-view action recognition. Experiment results show that the VS-CNN achieves superior performance.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125080754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
Unprecedented Usage of Pre-trained CNNs on Beauty Product
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3266433
Jian Han Lim, Nurul Japar, Chun Chet Ng, Chee Seng Chan
{"title":"Unprecedented Usage of Pre-trained CNNs on Beauty Product","authors":"Jian Han Lim, Nurul Japar, Chun Chet Ng, Chee Seng Chan","doi":"10.1145/3240508.3266433","DOIUrl":"https://doi.org/10.1145/3240508.3266433","url":null,"abstract":"How does a pre-trained Convolution Neural Network (CNN) model perform on beauty and personal care items (i.e Perfect-500K) This is the question we attempt to answer in this paper by adopting several well known deep learning models pre-trained on ImageNet, and evaluate their performance using different distance metrics. In the Perfect Corp Challenge, we manage to secure fourth position by using only the pre-trained model.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130579867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Robustness and Discrimination Oriented Hashing Combining Texture and Invariant Vector Distance
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240690
Ziqing Huang, Shiguang Liu
{"title":"Robustness and Discrimination Oriented Hashing Combining Texture and Invariant Vector Distance","authors":"Ziqing Huang, Shiguang Liu","doi":"10.1145/3240508.3240690","DOIUrl":"https://doi.org/10.1145/3240508.3240690","url":null,"abstract":"Image hashing is a novel technology of multimedia processing with wide applications. Robustness and discrimination are two of the most important objectives of image hashing. Different from existing hashing methods without a good balance with respect to robustness and discrimination, which largely restrict the application in image retrieval and copy detection, i.e., seriously reducing the retrieval accuracy of similar images, we propose a new hashing method which can preserve two kinds of complementary features (global feature via texture and local feature via DCT coefficients) to achieve a good balance between robustness and discrimination. Specifically, the statistical characteristics in gray-level co-occurrence matrix (GLCM) are extracted to well reveal the texture changes of an image, which is of great benefit to improve the perceptual robustness. Then, the normalized image is divided into image blocks, and the dominant DCT coefficients in the first row/column are selected to form a feature matrix. The Euclidean distance between vectors of the feature matrix is invariant to commonly-used digital operations, which helps make hash more compact. Various experiments show that our approach achieves a better balance between robustness and discrimination than the state-of-the-art algorithms.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131290296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Direction-aware Neural Style Transfer
Pub Date: 2018-10-15 | DOI: 10.1145/3240508.3240629
Hao Wu, Zhengxing Sun, Weihang Yuan
{"title":"Direction-aware Neural Style Transfer","authors":"Hao Wu, Zhengxing Sun, Weihang Yuan","doi":"10.1145/3240508.3240629","DOIUrl":"https://doi.org/10.1145/3240508.3240629","url":null,"abstract":"Neural learning methods have been shown to be effective in style transfer. These methods, which are called NST, aim to synthesize a new image that retains the high-level structure of a content image while keeps the low-level features of a style image. However, these models using convolutional structures only extract local statistical features of style images and semantic features of content images. Since the absence of low-level features in the content image, these methods would synthesize images that look unnatural and full of traces of machines. In this paper, we find that direction, that is, the orientation of each painting stroke, can capture the soul of image style preferably and thus generates much more natural and vivid stylizations. According to this observation, we propose a Direction-aware Neural Style Transfer (DaNST) with two major innovations. First, a novel direction field loss is proposed to steer the direction of strokes in the synthesized image. And to build this loss function, we propose novel direction field loss networks to generate and compare the direction fields of content image and synthesized image. By incorporating the direction field loss in neural style transfer, we obtain a new optimization objective. Through minimizing this objective, we can produce synthesized images that better follow the direction field of the content image. Second, our method provides a simple interaction mechanism to control the generated direction fields, and further control the texture direction in synthesized images. Experiments show that our method outperforms state-of-the-art in most styles such as oil painting and mosaic.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116476001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15