{"title":"Food Photo Enhancer of One Sample Generative Adversarial Network","authors":"Shudan Wang, Liang Sun, Weiming Dong, Yong Zhang","doi":"10.1145/3338533.3366605","DOIUrl":"https://doi.org/10.1145/3338533.3366605","url":null,"abstract":"Image enhancement is an important branch in the field of image processing. A few existing methods leverage Generative Adversarial Networks (GANs) for this task. However, they have several defects when applied to a specific type of image, such as food photos. First, a large set of original-enhanced image pairs is required to train GANs that have millions of parameters, and such image pairs are expensive to acquire. Second, the color distribution of enhanced images generated by previous methods is not consistent with that of the originals, which is undesirable. To alleviate these issues, we propose a novel method for food photo enhancement that requires only original images, without any original-enhanced pairs. We investigate Faithful Color Semantic Rules in Enhanced Dataset Photo Enhancement (Faith-EDPE) and also carefully design a lightweight generator that preserves semantic relations among colors. We evaluate the proposed method on public benchmark databases and demonstrate its effectiveness through visual results and user studies.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134574624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Feature Interaction Embedding for Pair Matching Prediction","authors":"Luwei Zhang, Xueting Wang, T. Yamasaki","doi":"10.1145/3338533.3366597","DOIUrl":"https://doi.org/10.1145/3338533.3366597","url":null,"abstract":"Online dating services have become popular in modern society. Predicting whether two users of these services form a good match can efficiently increase their chances of finding a life partner. Deep learning based methods with automatic feature interaction functions, such as Factorization Machines (FM) and the cross network of the Deep & Cross Network (DCN), can model sparse categorical features and are effective in many recommendation tasks for web applications. To solve the partner recommendation task, we improve these FM-based deep models and DCN by enhancing the representation of feature interaction embeddings and proposing a novel interaction-layer design that avoids information loss. Through experiments on two real-world datasets from two online dating companies, we demonstrate the superior performance of our proposed designs.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127678914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Salient Time Slice Pruning and Boosting for Person-Scene Instance Search in TV Series","authors":"Z. Wang, Fan Yang, S. Satoh","doi":"10.1145/3338533.3366594","DOIUrl":"https://doi.org/10.1145/3338533.3366594","url":null,"abstract":"TV audiences often want to quickly browse scenes featuring certain actors in TV series. Since 2016, the TREC Video Retrieval Evaluation (TRECVID) Instance Search (INS) task has focused on identifying a target person in a target scene simultaneously. In this paper, we refer to this kind of task as P-S INS (Person-Scene Instance Search). To find P-S instances, most approaches search for the person and the scene separately and then directly combine the results by addition or multiplication. However, we find that the person and scene INS modules are not always effective at the same time, and they may suppress each other in some situations, so aggregating the results shot by shot is not a good choice. Fortunately, in TV series, video shots are arranged in chronological order. We therefore extend our focus from time points (single video shots) to time slices (multiple consecutive video shots) along the timeline. By detecting salient time slices, we prune the data; by evaluating the importance of salient time slices, we boost the aggregation results. Extensive experiments on the large-scale TRECVID INS dataset demonstrate the effectiveness of the proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116111026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and Accurately Measuring Crack Width via Cascade Principal Component Analysis","authors":"Lijuan Duan, Huiling Geng, Jun Zeng, Junbiao Pang, Qingming Huang","doi":"10.1145/3338533.3366578","DOIUrl":"https://doi.org/10.1145/3338533.3366578","url":null,"abstract":"Crack width is an important indicator for diagnosing the safety of constructions, e.g., asphalt roads and concrete bridges. In practice, measuring crack width is a challenging task: (1) the irregular and non-smooth boundary makes traditional methods inefficient; (2) pixel-wise measurement is needed to guarantee the accuracy of a system; and (3) assessing the damage of constructions from any pre-selected point is a mandatory requirement. To address these problems, we propose a cascade Principal Component Analysis (PCA) to efficiently measure crack width from images. Firstly, a binary crack image describing the crack is obtained via off-the-shelf crack detection algorithms. Secondly, given a pre-selected point, PCA is used to find the main axis of the crack. Thirdly, Robust Principal Component Analysis (RPCA) is proposed to compute the main axis of a crack with an irregular boundary. We evaluate the proposed method on a real-world dataset. The experimental results show that the proposed method achieves state-of-the-art performance in terms of efficiency and effectiveness.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116321583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Best Paper Session","authors":"Wen-Huang Cheng","doi":"10.1145/3379189","DOIUrl":"https://doi.org/10.1145/3379189","url":null,"abstract":"","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127687230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shifted Spatial-Spectral Convolution for Deep Neural Networks","authors":"Yuhao Xu, Hideki Nakayama","doi":"10.1145/3338533.3366575","DOIUrl":"https://doi.org/10.1145/3338533.3366575","url":null,"abstract":"Deep convolutional neural networks (CNNs) extract local features and learn spatial representations via convolutions in the spatial domain. Beyond spatial information, some works also capture spectral information in the frequency domain via domain-switching methods such as the discrete Fourier transform (DFT) and the discrete cosine transform (DCT). However, most works pay attention to only a single domain and are thus prone to ignoring important features from the other. In this work, we propose a novel network structure that combines spatial and spectral convolutions and extracts features in both the spatial and frequency domains. The input channels are divided into two groups for spatial and spectral representations, respectively, and then integrated for feature fusion. Meanwhile, we design a channel-shifting mechanism to ensure that both the spatial and spectral information of every channel is equally and adequately captured throughout the deep network. Experimental results demonstrate that, compared with state-of-the-art single-domain CNN models, our shifted spatial-spectral convolution based networks achieve better performance on image classification datasets including CIFAR10, CIFAR100, and SVHN, with considerably fewer parameters.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131335939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Residual Graph Convolutional Networks for Zero-Shot Learning","authors":"Jiwei Wei, Yang Yang, Jingjing Li, Lei Zhu, Lin Zuo, Heng Tao Shen","doi":"10.1145/3338533.3366552","DOIUrl":"https://doi.org/10.1145/3338533.3366552","url":null,"abstract":"Most existing Zero-Shot Learning (ZSL) approaches adopt the semantic space as a bridge to classify unseen categories. However, it is difficult to transfer knowledge from seen to unseen categories through the semantic space, since the correlations among categories are uncertain and ambiguous there. In this paper, we formulate zero-shot learning as a classifier weight regression problem. Specifically, we propose a novel Residual Graph Convolutional Network (ResGCN) that takes word embeddings and a knowledge graph as inputs and outputs a visual classifier for each category. ResGCN can effectively alleviate the problems of over-smoothing and over-fitting. At test time, an unseen image can be classified by ranking the inner products of its visual feature with the predicted visual classifiers. Moreover, we provide a new method to build a better knowledge graph. Our approach not only further enhances the correlations among categories but also makes it easy to add new categories to the knowledge graph. Experiments conducted on the large-scale ImageNet 2011 21K dataset demonstrate that our method significantly outperforms existing state-of-the-art approaches.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130653052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Human Analysis in Multimedia","authors":"Bingkun Bao","doi":"10.1145/3379193","DOIUrl":"https://doi.org/10.1145/3379193","url":null,"abstract":"","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134473366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-source User Attribute Inference based on Hierarchical Auto-encoder","authors":"Boyu Zhang, Xiangguo Ding, Xiaowen Huang, Yang Cao, J. Sang, Jian Yu","doi":"10.1145/3338533.3366599","DOIUrl":"https://doi.org/10.1145/3338533.3366599","url":null,"abstract":"With the rapid development of Online Social Networks (OSNs), it is crucial to construct users' portraits from their dynamic behaviors to address the increasing need for customized information services. Previous work on user attribute inference mainly concentrated on developing advanced features/models or exploiting external information and knowledge, but ignored the contradiction between dynamic behaviors and stable demographic attributes, which results in deviations in user understanding. To address this contradiction and accurately infer user attributes, we propose a Multi-source User Attribute Inference algorithm based on a Hierarchical Auto-encoder (MUAI-HAE). The basic idea is that the patterns shared among the same individual's behaviors on different OSNs are a good indicator of his/her stable demographic attributes. A hierarchical auto-encoder is introduced to realize this idea by discovering the underlying non-linear correlations between different OSNs. The unsupervised scheme for shared-pattern learning alleviates the requirement for cross-OSN user accounts and improves practicability. Off-the-shelf classification methods are then utilized to infer user attributes from the derived shared behavior patterns. Experiments on real-world datasets from three OSNs demonstrate the effectiveness of the proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133059206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Excluding the Misleading Relatedness Between Attributes in Multi-Task Attribute Recognition Network","authors":"Sirui Cai, Yuchun Fang","doi":"10.1145/3338533.3366555","DOIUrl":"https://doi.org/10.1145/3338533.3366555","url":null,"abstract":"In attribute recognition, attributes that are unrelated in the real world may have a high co-occurrence rate in a dataset due to dataset bias, which forms a misleading relatedness. A neural network, especially a multi-task neural network, trained on such a dataset would learn this relatedness and be misled when used in practice. In this paper, we propose a Share-and-Compete Multi-Task deep learning (SCMTL) model to handle this problem. This model uses adversarial training to enhance competition between unrelated attributes while keeping sharing between related attributes, making the task-specific layers of the multi-task model more specific and thus ruling out the misleading relatedness between unrelated attributes. Experiments performed on elaborately designed datasets show that the proposed model outperforms both the single-task neural network and the traditional multi-task neural network in the situation described above.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131321149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}