Latest Publications from ACM Multimedia Asia

Structural Knowledge Organization and Transfer for Class-Incremental Learning
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490598
Yu Liu, Xiaopeng Hong, Xiaoyu Tao, Songlin Dong, Jingang Shi, Yihong Gong
{"title":"Structural Knowledge Organization and Transfer for Class-Incremental Learning","authors":"Yu Liu, Xiaopeng Hong, Xiaoyu Tao, Songlin Dong, Jingang Shi, Yihong Gong","doi":"10.1145/3469877.3490598","DOIUrl":"https://doi.org/10.1145/3469877.3490598","url":null,"abstract":"Deep models are vulnerable to catastrophic forgetting when fine-tuned on new data. Popular distillation-based methods usually neglect the relations between data samples and may eventually forget essential structural knowledge. To solve these shortcomings, we propose a structural graph knowledge distillation based incremental learning framework to preserve both the positions of samples and their relations. Firstly, a memory knowledge graph (MKG) is generated to fully characterize the structural knowledge of historical tasks. Secondly, we develop a graph interpolation mechanism to enrich the domain of knowledge and alleviate the inter-class sample imbalance issue. Thirdly, we introduce structural graph knowledge distillation to transfer the knowledge of historical tasks. Comprehensive experiments on three datasets validate the proposed method.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114107435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
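The abstract above centers on distilling both sample positions and pairwise relations from a memory graph of exemplars. Below is a minimal PyTorch sketch of that idea, assuming the old (frozen) and new models expose exemplar embeddings; the loss terms, weights, and function names are illustrative and not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def structural_distillation_loss(feat_old, feat_new, alpha=1.0, beta=1.0):
    """Distill structural knowledge from a frozen old model to the new one.

    feat_old: (N, D) exemplar embeddings from the frozen previous-task model.
    feat_new: (N, D) embeddings of the same exemplars from the current model.
    The "position" term keeps each node of the memory graph in place; the
    "relation" term preserves pairwise cosine similarities (graph edges).
    """
    # Node positions: keep each exemplar embedding close to its old position.
    position_loss = F.mse_loss(feat_new, feat_old.detach())

    # Edge relations: match the pairwise similarity (adjacency) matrices.
    sim_old = F.normalize(feat_old, dim=1) @ F.normalize(feat_old, dim=1).T
    sim_new = F.normalize(feat_new, dim=1) @ F.normalize(feat_new, dim=1).T
    relation_loss = F.mse_loss(sim_new, sim_old.detach())

    return alpha * position_loss + beta * relation_loss


if __name__ == "__main__":
    old = torch.randn(32, 128)               # embeddings from the frozen model
    new = old + 0.1 * torch.randn(32, 128)   # embeddings from the adapting model
    print(structural_distillation_loss(old, new))
```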
MIRecipe: A Recipe Dataset for Stage-Aware Recognition of Changes in Appearance of Ingredients
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490596
Yixin Zhang, Yoko Yamakata, Keishi Tajima
{"title":"MIRecipe: A Recipe Dataset for Stage-Aware Recognition of Changes in Appearance of Ingredients","authors":"Yixin Zhang, Yoko Yamakata, Keishi Tajima","doi":"10.1145/3469877.3490596","DOIUrl":"https://doi.org/10.1145/3469877.3490596","url":null,"abstract":"In this paper, we introduce a new recipe dataset MIRecipe (Multimedia-Instructional Recipe). It has both text and image data for every cooking step, while the conventional recipe datasets only contain final dish images, and/or images only for some of the steps. It consists of 26,725 recipes, which include 239,973 steps in total. The recognition of ingredients in images associated with cooking steps poses a new challenge: Since ingredients are processed during cooking, the appearance of the same ingredient is very different in the beginning and finishing stages of the cooking. The general object recognition methods, which assume the constant appearance of objects, do not perform well for such objects. To solve the problem, we propose two stage-aware techniques: stage-wise model learning, which trains a separate model for each stage, and stage-aware curriculum learning, which starts with the training data from the beginning stage and proceeds to the later stages. Our experiment with our dataset shows that our method achieves higher accuracy than the model trained using all the data without considering the stages. Our dataset is available at our GitHub repository.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121379082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
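The stage-aware curriculum described above, training first on beginning-stage data and then progressively adding later stages, can be sketched as a simple training schedule. The dataset ordering, optimizer, and hyperparameters below are assumptions for illustration, not the paper's setup.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def stage_aware_curriculum(model, stage_datasets, epochs_per_stage=2, lr=1e-3):
    """Train starting from beginning-stage data, then add later stages.

    stage_datasets: list of datasets ordered from the earliest cooking stage
    to the latest; at curriculum step k the model sees stages 0..k.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for k in range(len(stage_datasets)):
        loader = DataLoader(ConcatDataset(stage_datasets[: k + 1]),
                            batch_size=32, shuffle=True)
        for _ in range(epochs_per_stage):
            for images, labels in loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
    return model


if __name__ == "__main__":
    # Toy stand-in data: three stages of 64x64 RGB crops with 10 ingredient labels.
    stages = [TensorDataset(torch.randn(100, 3, 64, 64),
                            torch.randint(0, 10, (100,))) for _ in range(3)]
    toy_model = torch.nn.Sequential(torch.nn.Flatten(),
                                    torch.nn.Linear(3 * 64 * 64, 10))
    stage_aware_curriculum(toy_model, stages, epochs_per_stage=1)
```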
Convolutional Neural Network-Based Pure Paint Pigment Identification Using Hyperspectral Images
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3495641
Ailin Chen, R. Jesus, M. Vilarigues
{"title":"Convolutional Neural Network-Based Pure Paint Pigment Identification Using Hyperspectral Images","authors":"Ailin Chen, R. Jesus, M. Vilarigues","doi":"10.1145/3469877.3495641","DOIUrl":"https://doi.org/10.1145/3469877.3495641","url":null,"abstract":"This research presents the results of the implementation of deep learning neural networks in the identification of pure pigments of heritage artwork, namely paintings. Our paper applies an innovative three-branch deep learning model to maximise the correct identification of pure pigments. The model proposed combines the feature maps obtained from hyperspectral images through multiple convolutional neural networks, and numerical, hyperspectral metric data with respect to a set of reference reflectances. The results obtained exhibit an accurate representation of the pure predicted pigments which are confirmed through the use of analytical techniques. The model presented outperformed the compared counterparts and is deemed to be an important direction, not only in terms of utilisation of hyperspectral data and concrete pigment data in heritage analysis, but also in the application of deep learning in other fields.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128015404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
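As a rough illustration of the three-branch design described above (CNN feature maps from hyperspectral patches combined with numerical metrics against reference reflectances), the sketch below fuses two convolutional branches with an MLP branch. All layer widths, kernel sizes, and the concatenation-based fusion are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class ThreeBranchPigmentNet(nn.Module):
    """Illustrative three-branch classifier: two CNN branches over a
    hyperspectral patch (treated as a multi-channel image) and an MLP branch
    over numerical metrics computed against reference reflectances.
    """
    def __init__(self, num_bands=64, num_metrics=16, num_pigments=12):
        super().__init__()
        def cnn_branch(kernel):
            return nn.Sequential(
                nn.Conv2d(num_bands, 32, kernel, padding=kernel // 2),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten())
        self.branch_a = cnn_branch(3)   # fine spatial context
        self.branch_b = cnn_branch(7)   # coarser spatial context
        self.metric_branch = nn.Sequential(
            nn.Linear(num_metrics, 32), nn.ReLU())
        self.classifier = nn.Linear(32 * 3, num_pigments)

    def forward(self, patch, metrics):
        fused = torch.cat([self.branch_a(patch),
                           self.branch_b(patch),
                           self.metric_branch(metrics)], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    net = ThreeBranchPigmentNet()
    logits = net(torch.randn(4, 64, 32, 32), torch.randn(4, 16))
    print(logits.shape)  # torch.Size([4, 12])
```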
Entity Relation Fusion for Real-Time One-Stage Referring Expression Comprehension
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490592
Hang Yu, Weixin Li, Jiankai Li, Ye Du
{"title":"Entity Relation Fusion for Real-Time One-Stage Referring Expression Comprehension","authors":"Hang Yu, Weixin Li, Jiankai Li, Ye Du","doi":"10.1145/3469877.3490592","DOIUrl":"https://doi.org/10.1145/3469877.3490592","url":null,"abstract":"Referring Expression Comprehension (REC) is the task of grounding object which is referred by the language expression. Previous one-stage REC methods usually use one single language feature vector to represent the whole query for grounding and no reasoning between different objects is performed despite the rich relation cues of objects contained in the language expression, which depresses their grounding accuracy. Additionally, these methods mostly use the feature pyramid networks for multi-scale visual object feature extraction but ground on different feature layers separately, neglecting the connections between objects with different scales. To address these problems, we propose a novel one-stage REC method, i.e. the Entity Relation Fusion Network (ERFN) to locate referred object by relation guided reasoning on different objects. In ERFN, instead of grounding objects at each layer separately, we propose a Language Guided Multi-Scale Fusion (LGMSF) model to utilize language to guide the fusion of representations of objects with different scales into one feature map.For modeling connections between different objects, we design a Relation Guided Feature Fusion (RGFF) model that extracts entities in the language expression to enhance the referred entity feature in the visual object feature map, and further extracts relations to guide object feature fusion based on the self-attention mechanism. Experimental results show that our method is competitive with the state-of-the-art one-stage and two-stage REC methods, and can also keep inferring in real time.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133439636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
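The Language Guided Multi-Scale Fusion idea above, fusing feature-pyramid levels into one map under guidance of the query embedding, can be sketched as language-conditioned scale weighting. Dimensions and the softmax weighting below are assumptions of this sketch, not ERFN's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageGuidedMultiScaleFusion(nn.Module):
    """Illustrative fusion of FPN-style features into one map, with per-scale
    weights predicted from the query embedding.
    """
    def __init__(self, num_scales=3, vis_dim=256, lang_dim=512):
        super().__init__()
        self.scale_weights = nn.Linear(lang_dim, num_scales)

    def forward(self, feats, lang):
        # feats: list of (B, C, Hi, Wi) maps; lang: (B, lang_dim) query embedding.
        target = feats[0].shape[-2:]
        resized = [F.interpolate(f, size=target, mode="bilinear",
                                 align_corners=False) for f in feats]
        stacked = torch.stack(resized, dim=1)               # (B, S, C, H, W)
        w = torch.softmax(self.scale_weights(lang), dim=1)  # (B, S)
        return (stacked * w[:, :, None, None, None]).sum(dim=1)


if __name__ == "__main__":
    fuse = LanguageGuidedMultiScaleFusion()
    maps = [torch.randn(2, 256, 32, 32), torch.randn(2, 256, 16, 16),
            torch.randn(2, 256, 8, 8)]
    out = fuse(maps, torch.randn(2, 512))
    print(out.shape)  # torch.Size([2, 256, 32, 32])
```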
Flat and Shallow: Understanding Fake Image Detection Models by Architecture Profiling
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490566
Jing-Fen Xu, Wei Zhang, Yalong Bai, Qibin Sun, Tao Mei
{"title":"Flat and Shallow: Understanding Fake Image Detection Models by Architecture Profiling","authors":"Jing-Fen Xu, Wei Zhang, Yalong Bai, Qibin Sun, Tao Mei","doi":"10.1145/3469877.3490566","DOIUrl":"https://doi.org/10.1145/3469877.3490566","url":null,"abstract":"Digital image manipulations have been heavily abused to spread misinformation. Despite the great efforts dedicated in research community, prior works are mostly performance-driven, i.e., optimizing performances using standard/heavy networks designed for semantic classification. A thorough understanding for fake images detection models is still missing. This paper studies the essential ingredients for a good fake image detection model, by profiling the best-performing architectures. Specifically, we conduct a thorough analysis on a massive number of detection models, and observe how the performances are affected by different patterns of network structure. Our key findings include: 1) with the same computational budget, flat network structures (e.g., large kernel sizes, wide connections) perform better than commonly used deep networks; 2) operations in shallow layers deserve more computational capacities to trade-off performance and computational cost. These findings sketch a general profile for essential models of fake image detection, which show clear differences with those for semantic classification. Furthermore, based on our analysis, we propose a new Depth-Separable Search Space (DSS) for fake image detection. Compared to state-of-the-art methods, our model achieves competitive performance while saving more than 50% parameters.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134628278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
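To make the "flat and shallow" profile concrete, the toy model below follows the reported pattern: few stages, wide channels, large kernels, and most capacity placed in the early layers. It is only an illustration of the finding, not the searched DSS architecture.

```python
import torch
import torch.nn as nn

class FlatShallowDetector(nn.Module):
    """Toy illustration of the 'flat and shallow' profile. Channel counts and
    kernel sizes are illustrative only.
    """
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            # Shallow layers get the widest channels and largest kernels.
            nn.Conv2d(3, 128, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool2d(2),
            # Only a couple of further stages, kept comparatively light.
            nn.Conv2d(128, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


if __name__ == "__main__":
    model = FlatShallowDetector()
    print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 2])
```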
Improving Hyperspectral Super-Resolution via Heterogeneous Knowledge Distillation
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490610
Ziqian Liu, Qing Ma, Junjun Jiang, Xianming Liu
{"title":"Improving Hyperspectral Super-Resolution via Heterogeneous Knowledge Distillation","authors":"Ziqian Liu, Qing Ma, Junjun Jiang, Xianming Liu","doi":"10.1145/3469877.3490610","DOIUrl":"https://doi.org/10.1145/3469877.3490610","url":null,"abstract":"Hyperspectral images (HSI) contains rich spectrum information but their spatial resolution is often limited by imaging system. Super-resolution (SR) reconstruction becomes a hot topic aiming to increase spatial resolution without extra hardware cost. The fusion-based hyperspectral image super-resolution (FHSR) methods use supplementary high-resolution multispectral images (HR-MSI) to recover spatial details, but well co-registered HR-MSI is hard to collect. Recently, single hyperspectral image super-resolution (SHSR) methods based on deep learning have made great progress. However, lack of HR-MSI input makes these SHSR methods difficult to exploit the spatial information. To take advantages of FHSR and SHSR methods, in this paper we propose a new pipeline treating HR-MSI as privilege information and try to improve our SHSR model with knowledge distillation. That is, our model uses paired MSI-HSI data to train and only needs LR-HSI as input during inference. Specifically, we combine SHSR and spectral super-resolution (SSR) and design a novel architecture, Distillation-Oriented Dual-branch Net (DODN), to make the SHSR model fully employ transferred knowledge from the SSR model. Since the main stream of SSR model are 2D CNNs and full 2D CNN causes spectral disorder in SHSR task, a new mixed 2D/3D block, called Distillation-Oriented Dual-branch Block (DODB) is proposed, where the 3D branch extracts spectral-spatial correlation while the 2D branch accepts information from the SSR model through knowledge distillation. The main idea is to distill the knowledge of spatial information from HR-MSI to the SHSR model without changing its network architecture. Extensive experiments on two benchmark datasets, CAVE and NTIRE2020, demonstrate that our proposed DODN outperforms the state-of-the-art SHSR methods, in terms of both quantitative and qualitative analysis.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132535736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
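A minimal sketch of a mixed 2D/3D block in the spirit of the DODB described above: a 3D branch models spectral-spatial correlation, while a 2D branch yields features that can be matched to a frozen SSR teacher with a feature-distillation term. Channel sizes, the residual fusion, and the MSE distillation loss are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchBlock(nn.Module):
    """Illustrative mixed 2D/3D block: a 3D branch for spectral-spatial
    correlation plus a 2D branch whose features serve as a distillation target.
    """
    def __init__(self, bands=31, width=32):
        super().__init__()
        self.branch3d = nn.Conv3d(1, 1, kernel_size=3, padding=1)
        self.branch2d = nn.Conv2d(bands, width, kernel_size=3, padding=1)
        self.proj_back = nn.Conv2d(width, bands, kernel_size=1)

    def forward(self, hsi):
        # hsi: (B, bands, H, W) low-resolution hyperspectral input.
        out3d = self.branch3d(hsi.unsqueeze(1)).squeeze(1)   # spectral-spatial
        feat2d = self.branch2d(hsi)                          # distillation target
        return hsi + out3d + self.proj_back(feat2d), feat2d


def distillation_loss(student_feat, teacher_feat):
    """Match the 2D-branch features to the teacher's (detached) features."""
    return F.mse_loss(student_feat, teacher_feat.detach())


if __name__ == "__main__":
    block = DualBranchBlock()
    x = torch.randn(2, 31, 16, 16)
    y, feat = block(x)
    print(y.shape, distillation_loss(feat, torch.randn_like(feat)).item())
```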
Motion = Video - Content: Towards Unsupervised Learning of Motion Representation from Videos
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490582
Hehe Fan, Mohan S. Kankanhalli
{"title":"Motion = Video - Content: Towards Unsupervised Learning of Motion Representation from Videos","authors":"Hehe Fan, Mohan S. Kankanhalli","doi":"10.1145/3469877.3490582","DOIUrl":"https://doi.org/10.1145/3469877.3490582","url":null,"abstract":"Motion, according to its definition in physics, is the change in position with respect to time, regardless of the specific moving object and background. In this paper, we aim to learn appearance-independent motion representation in an unsupervised manner. The main idea is to separate motion from videos while leaving objects and background as content. Specifically, we design an encoder-decoder model which consists of a content encoder, a motion encoder and a video generator. To train the model, we leverage a one-step cycle-consistency in reconstruction within the same video and a two-step cycle-consistency in generation across different videos as self-supervised signals, and use adversarial training to remove the content representation from the motion representation. We demonstrate that the proposed framework can be used for conditional video generation and fine-grained action recognition.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133700966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
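The decomposition described above, encoding a clip into a content code and a motion code and reconstructing the clip from the two, can be sketched with a minimal autoencoder and a one-step reconstruction loss. The fully connected encoders and the omission of the cross-video cycle and adversarial terms are simplifications for illustration.

```python
import torch
import torch.nn as nn

class MotionContentAutoencoder(nn.Module):
    """Minimal sketch of the decomposition idea: a content code from one frame,
    a motion code from the frame sequence, and a generator that reconstructs
    the clip from the two codes.
    """
    def __init__(self, channels=3, content_dim=128, motion_dim=64, frames=8, size=32):
        super().__init__()
        self.content_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * size * size, content_dim), nn.ReLU())
        self.motion_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(frames * channels * size * size, motion_dim), nn.ReLU())
        self.generator = nn.Linear(content_dim + motion_dim,
                                   frames * channels * size * size)

    def forward(self, clip):
        # clip: (B, T, C, H, W); content is taken from the first frame only.
        content = self.content_enc(clip[:, 0])
        motion = self.motion_enc(clip)
        recon = self.generator(torch.cat([content, motion], dim=1))
        return recon.view_as(clip)


if __name__ == "__main__":
    model = MotionContentAutoencoder()
    clip = torch.randn(4, 8, 3, 32, 32)
    loss = nn.functional.mse_loss(model(clip), clip)  # one-step reconstruction
    print(loss.item())
```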
PLM-IPE: A Pixel-Landmark Mutual Enhanced Framework for Implicit Preference Estimation
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490621
Federico Becattini, Xuemeng Song, C. Baecchi, S. Fang, C. Ferrari, Liqiang Nie, A. del Bimbo
{"title":"PLM-IPE: A Pixel-Landmark Mutual Enhanced Framework for Implicit Preference Estimation","authors":"Federico Becattini, Xuemeng Song, C. Baecchi, S. Fang, C. Ferrari, Liqiang Nie, A. del Bimbo","doi":"10.1145/3469877.3490621","DOIUrl":"https://doi.org/10.1145/3469877.3490621","url":null,"abstract":"In this paper, we are interested in understanding how customers perceive fashion recommendations, in particular when observing a proposed combination of garments to compose an outfit. Automatically understanding how a suggested item is perceived, without any kind of active engagement, is in fact an essential block to achieve interactive applications. We propose a pixel-landmark mutual enhanced framework for implicit preference estimation, named PLM-IPE, which is capable of inferring the user’s implicit preferences exploiting visual cues, without any active or conscious engagement. PLM-IPE consists of three key modules: pixel-based estimator, landmark-based estimator and mutual learning based optimization. The former two modules work on capturing the implicit reaction of the user from the pixel level and landmark level, respectively. The last module serves to transfer knowledge between the two parallel estimators. Towards evaluation, we collected a real-world dataset, named SentiGarment, which contains 3,345 facial reaction videos paired with suggested outfits and human labeled reaction scores. Extensive experiments show the superiority of our model over state-of-the-art approaches.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123722148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
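The mutual-learning optimization described above, where the pixel-based and landmark-based estimators teach each other, can be sketched as two regressors that each fit the labeled reaction score while being pulled toward the other's detached prediction. Treating the score as scalar regression and the peer-loss weight `mu` are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mutual_learning_step(pixel_net, landmark_net, frames, landmarks, score,
                         opt_pixel, opt_landmark, mu=0.5):
    """One optimisation step of two parallel estimators that teach each other.

    Each branch fits the labelled reaction score and is additionally pulled
    towards the other branch's (detached) prediction.
    """
    pred_p = pixel_net(frames)
    pred_l = landmark_net(landmarks)

    loss_p = F.mse_loss(pred_p, score) + mu * F.mse_loss(pred_p, pred_l.detach())
    opt_pixel.zero_grad(); loss_p.backward(); opt_pixel.step()

    pred_l = landmark_net(landmarks)  # recompute after the pixel-branch update
    loss_l = F.mse_loss(pred_l, score) + mu * F.mse_loss(pred_l, pixel_net(frames).detach())
    opt_landmark.zero_grad(); loss_l.backward(); opt_landmark.step()
    return loss_p.item(), loss_l.item()


if __name__ == "__main__":
    pixel_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
    landmark_net = nn.Sequential(nn.Flatten(), nn.Linear(68 * 2, 1))
    opts = (torch.optim.Adam(pixel_net.parameters(), lr=1e-3),
            torch.optim.Adam(landmark_net.parameters(), lr=1e-3))
    print(mutual_learning_step(pixel_net, landmark_net,
                               torch.randn(8, 3, 64, 64), torch.randn(8, 68, 2),
                               torch.randn(8, 1), *opts))
```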
Multi-branch Semantic Learning Network for Text-to-Image Synthesis
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3490567
Jiading Ling, Xingcai Wu, Zhenguo Yang, Xudong Mao, Qing Li, Wenyin Liu
{"title":"Multi-branch Semantic Learning Network for Text-to-Image Synthesis","authors":"Jiading Ling, Xingcai Wu, Zhenguo Yang, Xudong Mao, Qing Li, Wenyin Liu","doi":"10.1145/3469877.3490567","DOIUrl":"https://doi.org/10.1145/3469877.3490567","url":null,"abstract":"In this paper, we propose a multi-branch semantic learning network (MSLN) to generate image according to textual description by taking into account global and local textual semantics, which consists of two stages. The first stage generates a coarse-grained image based on the sentence features. In the second stage, a multi-branch fine-grained generation model is constructed to inject the sentence-level and word-level semantics into two coarse-grained images by global and local attention modules, which generate global and local fine-grained image textures, respectively. In particular, we devise a channel fusion module (CFM) to fuse the global and local fine-grained features in the multi-branch fine-grained stage and generate the output image. Extensive experiments conducted on the CUB-200 dataset and Oxford-102 dataset demonstrate the superior performance of the proposed method. (e.g., FID is reduced from 16.09 to 14.43 on CUB-200).","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"333 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124302043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
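A rough sketch of a channel fusion module in the spirit of the CFM above: concatenate the global and local fine-grained feature maps along channels, re-weight channels with a squeeze-and-excite style gate, and project back to the working width. The gating design is an assumption, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ChannelFusionModule(nn.Module):
    """Illustrative channel fusion of global- and local-attention feature maps."""
    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * channels, 2 * channels // reduction), nn.ReLU(),
            nn.Linear(2 * channels // reduction, 2 * channels), nn.Sigmoid())
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, global_feat, local_feat):
        x = torch.cat([global_feat, local_feat], dim=1)   # (B, 2C, H, W)
        w = self.gate(x)[:, :, None, None]                # per-channel weights
        return self.proj(x * w)                           # (B, C, H, W)


if __name__ == "__main__":
    cfm = ChannelFusionModule()
    out = cfm(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```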
Generation of Variable-Length Time Series from Text using Dynamic Time Warping-Based Method
ACM Multimedia Asia Pub Date: 2021-12-01 DOI: 10.1145/3469877.3495644
Ayaka Ideno, Yusuke Mukuta, Tatsuya Harada
{"title":"Generation of Variable-Length Time Series from Text using Dynamic Time Warping-Based Method","authors":"Ayaka Ideno, Yusuke Mukuta, Tatsuya Harada","doi":"10.1145/3469877.3495644","DOIUrl":"https://doi.org/10.1145/3469877.3495644","url":null,"abstract":"This study is aimed at finding a suitable method for generating time-series data such as video clips or avatar motions from text stating multiple events. This paper addresses the generation of variable-length time-series data considering the order and variable duration of events stated in the text. Although the use of the variant of Mean Squared Error (MSE) is a common means of training, only the gap between the element of ground-truth (GT) data and generated data at the same time are considered. Thus, variants of MSE are unsuitable for the task at hand because the loss may not be small for the generated and GT data with the same order of events if the time for each event does not overlap. To solve the problem, we propose a Dynamic Time Warping-Like method for Variable-Length data (DTWL-VL), which determines the corresponding elements of the GT and the generated data, allowing for the time difference between them, and makes them closer. We compared DTWL-VL, a variant of MSE, and an existing method for time-series data generation which considers the time difference between the corresponding part in the GT and generated data. Since the existing method is aimed at generating fixed-length data, we extend the method for generating variable-length time-series data. We conducted experiments using a dataset prepared for this study. Both DTWL-VL and the existing methods outperformed the MSE variant. Moreover, although the existing method outperformed DTWL-VL under certain settings, DTWL-VL required a smaller training period.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124849591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
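The DTWL-VL idea above, finding corresponding elements between generated and ground-truth sequences while allowing time differences and then pulling them together, can be sketched with a plain DTW alignment computed on detached costs, followed by a squared error over the aligned pairs. This is a simplified stand-in, not the paper's exact formulation.

```python
import torch

def dtwl_vl_loss(generated, target):
    """DTW-like loss for variable-length sequences (illustrative sketch).

    generated: (T1, D) tensor requiring grad; target: (T2, D) tensor.
    A monotonic alignment is found by dynamic programming on detached pairwise
    costs; the loss is the mean squared distance over the aligned pairs.
    """
    cost = torch.cdist(generated, target).pow(2)           # (T1, T2) local costs
    t1, t2 = cost.shape
    with torch.no_grad():                                   # DP table for the path only
        acc = torch.full((t1 + 1, t2 + 1), float("inf"))
        acc[0, 0] = 0.0
        for i in range(1, t1 + 1):
            for j in range(1, t2 + 1):
                acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j - 1],
                                                     acc[i - 1, j], acc[i, j - 1])
        # Backtrack the optimal warping path.
        path, i, j = [], t1, t2
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = min((acc[i - 1, j - 1], (i - 1, j - 1)),
                       (acc[i - 1, j], (i - 1, j)),
                       (acc[i, j - 1], (i, j - 1)))[1]
            i, j = step
    aligned = torch.stack([cost[a, b] for a, b in path])
    return aligned.mean()


if __name__ == "__main__":
    gen = torch.randn(12, 8, requires_grad=True)   # generated sequence, length 12
    gt = torch.randn(20, 8)                        # ground truth, length 20
    loss = dtwl_vl_loss(gen, gt)
    loss.backward()
    print(loss.item(), gen.grad.shape)
```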