Proceedings of the 2022 International Conference on Multimedia Retrieval: Latest Publications

Sequential Intention-aware Recommender based on User Interaction Graph
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531390
Jinpeng Chen, Yuan Cao, Fan Zhang, Pengfei Sun, Kaimin Wei
{"title":"Sequential Intention-aware Recommender based on User Interaction Graph","authors":"Jinpeng Chen, Yuan Cao, Fan Zhang, Pengfei Sun, Kaimin Wei","doi":"10.1145/3512527.3531390","DOIUrl":"https://doi.org/10.1145/3512527.3531390","url":null,"abstract":"The next-item recommendation problem has received more and more attention from researchers in recent years. Ignoring the implicit item semantic information, existing algorithms focus more on the user-item binary relationship and suffer from high data sparsity. Inspired by the fact that user's decision-making process is often influenced by both intention and preference, this paper presents a SequentiAl inTentiOn-aware Recommender based on a user Interaction graph (Satori). In Satori, we first use a novel user interaction graph to construct relationships between users, items, and categories. Then, we leverage a graph attention network to extract auxiliary features on the graph and generate the three embeddings. Next, we adopt self-attention mechanism to model user intention and preference respectively which are later combined to form a hybrid user representation. Finally, the hybrid user representation and previously obtained item representation are both sent to the prediction modul to calculate the predicted item score. Testing on real-world datasets, the results prove that our approach outperforms state-of-the-art methods.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131644133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Style-woven Attention Network for Zero-shot Ink Wash Painting Style Transfer
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531391
Haochen Sun, L. Wu, Xiang Li, Xiangxu Meng
{"title":"Style-woven Attention Network for Zero-shot Ink Wash Painting Style Transfer","authors":"Haochen Sun, L. Wu, Xiang Li, Xiangxu Meng","doi":"10.1145/3512527.3531391","DOIUrl":"https://doi.org/10.1145/3512527.3531391","url":null,"abstract":"Traditional Chinese painting is a unique form of artistic expression. Compared with western art painting, it pays more attention to the verve in visual effect, especially ink painting, which makes good use of lines and pays little attention to information such as texture. Some style transfer methods have recently begun to apply traditional Chinese painting style (such as ink wash style) to photorealistic. Ink stylization of different types of real-world photos in a dataset using these style transfer methods has some limitations. When the input images are animal types that have not been seen in the training set, the generated results retain some semantic features of the data in the training set, resulting in distortion. Therefore, in this paper, we attempt to separate the feature representations for styles and contents and propose a style-woven attention network to achieve zero-shot ink wash painting style transfer. Our model learns to disentangle the data representations in an unsupervised fashion and capture the semantic correlations of content and style. In addition, an ink style loss is added to improve the learning ability of the style encoder. In order to verify the ability of ink wash stylization, we augmented the publicly available dataset $ChipPhi$. Extensive experiments based on a wide validation set prove that our method achieves state-of-the-art results.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128165004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
DMPCANet: A Low Dimensional Aggregation Network for Visual Place Recognition
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531427
Yinghao Wang, Haonan Chen, Jiong Wang, Yingying Zhu
{"title":"DMPCANet: A Low Dimensional Aggregation Network for Visual Place Recognition","authors":"Yinghao Wang, Haonan Chen, Jiong Wang, Yingying Zhu","doi":"10.1145/3512527.3531427","DOIUrl":"https://doi.org/10.1145/3512527.3531427","url":null,"abstract":"Visual place recognition (VPR) aims to estimate the geographical location of a query image by finding its nearest reference images from a large geo-tagged database. Most of the existing methods adopt convolutional neural networks to extract feature maps from images. Nevertheless, such feature maps are high-dimensional tensors, and it is a challenge to effectively aggregate them into a compact vector representation for efficient retrieval. To tackle this challenge, we develop an end-to-end convolutional neural network architecture named DMPCANet. The network adopts the regional pooling module to generate feature tensors of the same size from images of different sizes. The core component of our network, the Differentiable Multilinear Principal Component Analysis (DMPCA) module, directly acts on tensor data and utilizes convolution operations to generate projection matrices for dimensionality reduction, thereby reducing the dimensionality to one sixteenth. This module can preserve crucial information while reducing data dimensions. Experiments on two widely used place recognition datasets demonstrate that our proposed DMPCANet can generate low-dimensional discriminative global descriptors and achieve the state-of-the-art results.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114174903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TransPCC: Towards Deep Point Cloud Compression via Transformers
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531423
Zujie Liang, Fan Liang
{"title":"TransPCC: Towards Deep Point Cloud Compression via Transformers","authors":"Zujie Liang, Fan Liang","doi":"10.1145/3512527.3531423","DOIUrl":"https://doi.org/10.1145/3512527.3531423","url":null,"abstract":"High-efficient point cloud compression (PCC) techniques are necessary for various 3D practical applications, such as autonomous driving, holographic transmission, virtual reality, etc. The sparsity and disorder nature make it challenging to design frameworks for point cloud compression. In this paper, we present a new model, called TransPCC that adopts a fully Transformer auto-encoder architecture for deep Point Cloud Compression. By taking the input point cloud as a set in continuous space with learnable position embeddings, we employ the self-attention layers and necessary point-wise operations for point cloud compression. The self-attention based architecture enables our model to better learn point-wise dependency information for point cloud compression. Experimental results show that our method outperforms state-of-the-art methods on large-scale point cloud dataset.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123415071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Cross-Modal Retrieval between Event-Dense Text and Image
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531374
Zhongwei Xie, Lin Li, Luo Zhong, Jianquan Liu, Ling Liu
{"title":"Cross-Modal Retrieval between Event-Dense Text and Image","authors":"Zhongwei Xie, Lin Li, Luo Zhong, Jianquan Liu, Ling Liu","doi":"10.1145/3512527.3531374","DOIUrl":"https://doi.org/10.1145/3512527.3531374","url":null,"abstract":"This paper presents a novel approach to the problem of event-dense text and image cross-modal retrieval where the text contains the descriptions of numerous events. It is known that modality alignment is crucial for retrieval performance. However, due to the lack of event sequence information in the image, it is challenging to perform the fine-grain alignment of the event-dense text with the image. Our proposed approach incorporates the event-oriented features to enhance the cross-modal alignment, and applies the event-dense text-image retrieval to the food domain for empirical validation. Specifically, we capture the significance of each event by Transformer, and combine it with the identified key event elements, to enhance the discriminative ability of the learned text embedding that summarizes all the events. Next, we produce the image embedding by combining the event tag jointly shared by the text and image with the visual embedding of the event-related image regions, which describes the eventual consequence of all the events and facilitates the event-based cross-modal alignment. Finally, we integrate text embedding and image embedding with the loss optimization empowered with the event tag by iteratively regulating the joint embedding learning for cross-modal retrieval. Extensive experiments demonstrate that our proposed event-oriented modality alignment approach significantly outperforms the state-of-the-art approach with a 23.3% improvement on top-1 Recall for image-to-recipe retrieval on Recipe1M 10k test set.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121117803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531386
Ru Peng, Yawen Zeng, J. Zhao
{"title":"HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment","authors":"Ru Peng, Yawen Zeng, J. Zhao","doi":"10.1145/3512527.3531386","DOIUrl":"https://doi.org/10.1145/3512527.3531386","url":null,"abstract":"Multi-modal machine translation (MMT) aims to augment the linguistic machine translation frameworks by incorporating aligned vision information. As the core research challenge for MMT, how to fuse the image information and further align it with the bilingual data remains critical. Existing works have either focused on a methodological alignment in the space of bilingual text or emphasized the combination of the one-sided text and given image. In this work, we entertain the possibility of a triplet alignment, among the source and target text together with the image instance. In particular, we propose Multi-aspect AlignmenT (MAT) model that augments the MMT tasks to three sub-tasks --- namely cross-language translation alignment, cross-modal captioning alignment and multi-modal hybrid alignment tasks. Core to this model consists of a hybrid vocabulary which compiles the visually depictable entity (nouns) occurrence on both sides of the text as well as the detected object labels appearing in the images. Through this sub-task, we postulate that MAT manages to further align the modalities by casting three instances into a shared domain, as compared against previously proposed methods. Extensive experiments and analyses demonstrate the superiority of our approaches, which achieve several state-of-the-art results on two benchmark datasets of the MMT task.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121332006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Mobile Emotion Recognition via Multiple Physiological Signals using Convolution-augmented Transformer
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531385
Kangning Yang, Benjamin Tag, Yue Gu, Chaofan Wang, Tilman Dingler, G. Wadley, Jorge Gonçalves
{"title":"Mobile Emotion Recognition via Multiple Physiological Signals using Convolution-augmented Transformer","authors":"Kangning Yang, Benjamin Tag, Yue Gu, Chaofan Wang, Tilman Dingler, G. Wadley, Jorge Gonçalves","doi":"10.1145/3512527.3531385","DOIUrl":"https://doi.org/10.1145/3512527.3531385","url":null,"abstract":"Recognising and monitoring emotional states play a crucial role in mental health and well-being management. Importantly, with the widespread adoption of smart mobile and wearable devices, it has become easier to collect long-term and granular emotion-related physiological data passively, continuously, and remotely. This creates new opportunities to help individuals manage their emotions and well-being in a less intrusive manner using off-the-shelf low-cost devices. Pervasive emotion recognition based on physiological signals is, however, still challenging due to the difficulty to efficiently extract high-order correlations between physiological signals and users' emotional states. In this paper, we propose a novel end-to-end emotion recognition system based on a convolution-augmented transformer architecture. Specifically, it can recognise users' emotions on the dimensions of arousal and valence by learning both the global and local fine-grained associations and dependencies within and across multimodal physiological data (including blood volume pulse, electrodermal activity, heart rate, and skin temperature). We extensively evaluated the performance of our model using the K-EmoCon dataset, which is acquired in naturalistic conversations using off-the-shelf devices and contains spontaneous emotion data. Our results demonstrate that our approach outperforms the baselines and achieves state-of-the-art or competitive performance. We also demonstrate the effectiveness and generalizability of our system on another affective dataset which used affect inducement and commercial physiological sensors.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121386336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531358
Sheng Zeng, Changhong Liu, J. Zhou, Yong Chen, Aiwen Jiang, Hanxi Li
{"title":"Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval","authors":"Sheng Zeng, Changhong Liu, J. Zhou, Yong Chen, Aiwen Jiang, Hanxi Li","doi":"10.1145/3512527.3531358","DOIUrl":"https://doi.org/10.1145/3512527.3531358","url":null,"abstract":"Cross-modal image-text retrieval is a fundamental task in information retrieval. The key to this task is to address both heterogeneity and cross-modal semantic correlation between data of different modalities. Fine-grained matching methods can nicely model local semantic correlations between image and text but face two challenges. First, images may contain redundant information while text sentences often contain words without semantic meaning. Such redundancy interferes with the local matching between textual words and image regions. Furthermore, the retrieval shall consider not only low-level semantic correspondence between image regions and textual words but also a higher semantic correlation between different intra-modal relationships. We propose a multi-layer graph convolutional network with object-level, object-relational-level, and higher-level learning sub-networks. Our method learns hierarchical semantic correspondences by both local and global alignment. We further introduce a self-attention mechanism after the word embedding to weaken insignificant words in the sentence and a cross-attention mechanism to guide the learning of image features. Extensive experiments on Flickr30K and MS-COCO datasets demonstrate the effectiveness and superiority of our proposed method.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121650967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
STAFNet: Swin Transformer Based Anchor-Free Network for Detection of Forward-looking Sonar Imagery
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531398
Xingyu Zhu, Yingshuo Liang, Jianlei Zhang, Zengqiang Chen
{"title":"STAFNet: Swin Transformer Based Anchor-Free Network for Detection of Forward-looking Sonar Imagery","authors":"Xingyu Zhu, Yingshuo Liang, Jianlei Zhang, Zengqiang Chen","doi":"10.1145/3512527.3531398","DOIUrl":"https://doi.org/10.1145/3512527.3531398","url":null,"abstract":"Forward-looking sonar (FLS) is widely applied in underwater operations, among which the search of underwater crash objects and victims is an incredibly challenging task. An efficient detection method based on deep learning can intelligently detect objects in FLS images, which makes it a reliable tool to replace manual recognition. To achieve this aim, we propose a novel Swin Transformer based anchor-free network (STAFNet), which contains a strong backbone Swin Transformer and a lite head with deformable convolution network (DCN). We employ a ROV equipped with a FLS to acquire dataset including victim, boat and plane model objects. A series of experiments are carried out on this dataset to train and verify the performance of STAFNet. Compared with other state-of-the-art methods, STAFNet significantly overcomes complex noise interference, and achieves the best balance between detection accuracy and inference speed.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115762766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Camouflaged Poisoning Attack on Graph Neural Networks
Proceedings of the 2022 International Conference on Multimedia Retrieval Pub Date: 2022-06-27 DOI: 10.1145/3512527.3531373
Chao Jiang, Yingzhe He, Richard Chapman, Hongyi Wu
{"title":"Camouflaged Poisoning Attack on Graph Neural Networks","authors":"Chao Jiang, Yingzhe He, Richard Chapman, Hongyi Wu","doi":"10.1145/3512527.3531373","DOIUrl":"https://doi.org/10.1145/3512527.3531373","url":null,"abstract":"Graph neural networks (GNNs) have enabled the automation of many web applications that entail node classification on graphs, such as scam detection in social media and event prediction in service networks. Nevertheless, recent studies revealed that the GNNs are vulnerable to adversarial attacks, where feeding GNNs with poisoned data at training time can lead them to yield catastrophically devastative test accuracy. This finding heats up the frontier of attacks and defenses against GNNs. However, the prior studies mainly posit that the adversaries can enjoy free access to manipulate the original graph, while obtaining such access could be too costly in practice. To fill this gap, we propose a novel attacking paradigm, named Generative Adversarial Fake Node Camouflaging (GAFNC), with its crux lying in crafting a set of fake nodes in a generative-adversarial regime. These nodes carry camouflaged malicious features and can poison the victim GNN by passing their malicious messages to the original graph via learned topological structures, such that they 1) maximize the devastation of classification accuracy (i.e., global attack) or 2) enforce the victim GNN to misclassify a targeted node set into prescribed classes (i.e., target attack). We benchmark our experiments on four real-world graph datasets, and the results substantiate the viability, effectiveness, and stealthiness of our proposed poisoning attack approach. Code is released in github.com/chao92/GAFNC.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126397330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6