ACM Transactions on Intelligent Systems and Technology: Latest Articles, Page 4

Quintuple-based Representation Learning for Bipartite Heterogeneous Networks

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-26 DOI: 10.1145/3653978

Cangqi Zhou, Hui Chen, Jing Zhang, Qianmu Li, Dianming Hu

Abstract: Recent years have seen rapid progress in network representation learning, which removes the need for burdensome feature engineering and facilitates downstream network-based tasks. In reality, networks often exhibit heterogeneity, meaning that there may exist multiple types of nodes and interactions. Heterogeneous networks pose new challenges for representation learning, as awareness of node and edge types is required. In this paper, we study a basic building block of general heterogeneous networks: heterogeneous networks with two types of nodes. Many problems can be solved by decomposing general heterogeneous networks into multiple bipartite ones. Recently, to overcome the demerits of non-metric measures used in the embedding space, metric learning-based approaches have been leveraged to tackle heterogeneous network representation learning. These approaches first generate triplets of samples, each containing an anchor node, a positive counterpart, and a negative one, and then try to pull positive samples closer and push negative ones away. However, when dealing with heterogeneous networks, even the simplest two-typed ones, triplets cannot simultaneously involve both positive and negative samples from different parts of networks. To address this incompatibility of triplet-based metric learning, we propose a novel quintuple-based method for learning node representations in bipartite heterogeneous networks. Specifically, we generate quintuples that contain positive and negative samples from two different parts of networks. We formulate two learning objectives that accommodate quintuple-based learning samples: a proximity-based loss that models the relations in quintuples by sigmoid probabilities, and an angular loss that more robustly maintains similarity structures. In addition, we parameterize feature learning by using one-dimensional convolution operators around nodes’ neighborhoods. Extensive experiments on two downstream tasks, with comparisons against eight methods, demonstrate the effectiveness of our approach.
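To make the quintuple idea concrete, the following is a minimal sketch of a sigmoid-based proximity loss over one quintuple. It is an illustration only: the abstract does not give the paper's exact formulation, so the function name `quintuple_proximity_loss`, the dot-product similarity, and the layout of the quintuple (an anchor, plus one positive and one negative sample from each of the two node sets, here called U and V) are all assumptions. The angular loss is not sketched.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a similarity score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    """Dot product of two equal-length vectors (assumed similarity measure)."""
    return sum(x * y for x, y in zip(a, b))


def quintuple_proximity_loss(anchor, pos_u, neg_u, pos_v, neg_v):
    """Negative log-likelihood over one quintuple (hypothetical layout).

    pos_u / neg_u: positive and negative samples from node set U.
    pos_v / neg_v: positive and negative samples from node set V.
    Positives should score high with the anchor, negatives low.
    """
    loss = 0.0
    for pos in (pos_u, pos_v):
        loss -= math.log(sigmoid(dot(anchor, pos)))
    for neg in (neg_u, neg_v):
        loss -= math.log(sigmoid(-dot(anchor, neg)))
    return loss
```

Unlike a triplet, which can hold only one positive and one negative sample, this layout lets a single training sample carry positive and negative evidence from both parts of the bipartite network at once.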

Citations: 0

Robust Structure-Aware Graph-based Semi-Supervised Learning: Batch and Recursive Processing

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-26 DOI: 10.1145/3653986

Xu Chen

Abstract: Graph-based semi-supervised learning plays an important role in large-scale image classification tasks. However, the problem becomes very challenging in the presence of noisy labels and outliers. Moreover, traditional robust semi-supervised learning solutions suffer from prohibitive computational burdens and thus cannot be applied to streaming data. Motivated by this, we present a novel unified framework for robust structure-aware semi-supervised learning, called Unified RSSL (URSSL), that supports both batch processing and recursive processing and is robust to both outliers and noisy labels. In particular, URSSL iteratively applies joint semi-supervised dimensionality reduction with robust estimators and network sparse regularization on the graph Laplacian matrix, to preserve the intrinsic graph structure and ensure robustness to the compound noise. First, to reduce the influence of outliers, a novel semi-supervised robust dimensionality reduction is applied, relying on robust estimators to suppress outliers. Meanwhile, to tackle noisy labels, the denoised graph similarity information is encoded into the network regularization. Moreover, by identifying the strong relevance of dimensionality reduction and network regularization in the context of robust semi-supervised learning (RSSL), a two-step alternating optimization is derived to compute optimal solutions with guaranteed convergence. We further adapt our framework to large-scale semi-supervised learning, making it particularly suitable for large-scale image classification, and demonstrate the model's robustness under different adversarial attacks. For recursive processing, we rely on reparameterization to transform the formulation and unlock the challenging problem of robust streaming-based semi-supervised learning. Finally, we extend our solution to the distributed setting to address distributed robust semi-supervised learning when images are captured by multiple cameras at different locations. Extensive experimental results on multiple benchmark datasets demonstrate the promising performance of this framework relative to state-of-the-art approaches for important applications in image classification and spam data analysis.

Citations: 0

Federated Momentum Contrastive Clustering

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-26 DOI: 10.1145/3653981

Runxuan Miao, Erdem Koyuncu

Abstract: Self-supervised representation learning and deep clustering are mutually beneficial for learning high-quality representations and clustering data simultaneously in centralized settings. However, it is not always feasible to gather large amounts of data at a central entity, considering data privacy requirements and computational resources. Federated Learning (FL) has been developed successfully to aggregate a global model while training on distributed local data, respecting the data privacy of edge devices. However, most FL research effort focuses on supervised learning algorithms. A fully unsupervised federated clustering scheme has not been considered in the existing literature. We present federated momentum contrastive clustering (FedMCC), a generic federated clustering framework that can not only cluster data automatically but also extract discriminative representations by training on local data distributed over multiple users. In FedMCC, we demonstrate a two-stage federated learning paradigm where the first stage aims to learn differentiable instance embeddings and the second stage accounts for clustering data automatically. The experimental results show that FedMCC not only achieves superior clustering performance but also outperforms several existing federated self-supervised methods on linear evaluation and semi-supervised learning tasks. Additionally, FedMCC can easily be adapted to ordinary centralized clustering through what we call momentum contrastive clustering (MCC). We show that MCC achieves state-of-the-art clustering accuracy on certain datasets such as STL-10 and ImageNet-10. We also present a method to reduce the memory footprint of our clustering schemes.
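The two generic building blocks that the name FedMCC suggests can be sketched as follows: a momentum (exponential moving average) update of a key encoder from a query encoder, as used in momentum-contrast methods, and a sample-size-weighted average of client parameters, as used in federated averaging. This is a hedged illustration of those standard techniques, not the FedMCC procedure itself, which the abstract does not specify. The function names, the flat-list parameter representation, and the default momentum `m=0.99` are assumptions.

```python
def momentum_update(key_params, query_params, m=0.99):
    """EMA update: key <- m * key + (1 - m) * query, element-wise.

    Parameters are plain lists of floats here (an assumption made to keep
    the sketch dependency-free).
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]
```

In a setup of this kind, each client would train locally, the server would average the returned parameters, and the slowly moving key encoder would supply stable targets for the contrastive objective.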

Citations: 0

Explainable finite mixture of mixtures of bounded asymmetric generalized Gaussian and Uniform distributions learning for energy demand management

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-26 DOI: 10.1145/3653980

Hussein Al-Bazzaz, Muhammad Azam, Manar Amayri, Nizar Bouguila

Abstract: We introduce a mixture of mixtures of bounded asymmetric generalized Gaussian and uniform distributions. Based on this framework, we propose model-based classification and model-based clustering algorithms. We develop an objective function for the minimum message length (MML) model selection criterion to discover the optimal number of clusters for the unsupervised approach of our proposed model. Given the considerable attention received by Explainable AI (XAI) in recent years, we introduce a method to interpret the predictions obtained from the proposed model in both learning settings by defining their boundaries in terms of the crucial features. Integrating explainability into our proposed algorithm increases the credibility of its predictions, since they can be explained to the user through simple If-Then statements using a small binary decision tree. In this paper, the proposed algorithm demonstrates its reliability and superiority over several state-of-the-art machine learning algorithms in the following real-world applications: fault detection and diagnosis (FDD) in chillers, occupancy estimation, and categorization of residential energy consumers.

Citations: 0

Mitigating the Impact of Inaccurate Feedback in Dynamic Learning-to-Rank: A Study of Overlooked Interesting Items

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-26 DOI: 10.1145/3653983

Chenhao Zhang, Weitong Chen, Wei Emma Zhang, Miao Xu

Abstract: Dynamic Learning-to-Rank (DLTR) is a method of updating a ranking policy in real time based on user feedback, which may not always be accurate. Although previous work has achieved fair and unbiased DLTR under inaccurate feedback, it faces a trade-off between fairness and user utility and also has limitations in the setting of feeding items. Existing DLTR works improve ranking utility by eliminating bias from inaccurate feedback on observed items, but the impact of another pervasive form of inaccurate feedback, overlooked or ignored interesting items, remains unclear. For example, users may browse the rankings too quickly to catch interesting items, or miss interesting items because the snippets are not optimized enough. This phenomenon raises two questions: i) Will overlooked interesting items affect the ranking results? ii) Is it possible to improve utility without sacrificing fairness if these effects are eliminated? These questions are particularly relevant for small and medium-sized retailers who are just starting out and may have limited data, leading to the use of inaccurate feedback to update their models. In this paper, we find that inaccurate feedback in the form of overlooked interesting items has a negative impact on DLTR performance in terms of utility. To address this, we treat the overlooked interesting items as noise and propose a novel DLTR method, Co-teaching Rank (CoTeR), that achieves good utility and fairness when inaccurate feedback is present in the form of overlooked interesting items. Our solution incorporates a co-teaching-based component with a customized loss function and data sampling strategy, as well as a mean pooling strategy to further accommodate newly added products without historical data. Through experiments, we demonstrate that CoTeR not only enhances utility but also preserves ranking fairness, and can smoothly handle newly introduced items.
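The co-teaching component mentioned above builds on a general idea for learning with noisy data: two models are trained side by side, and each model picks the samples it finds easiest (smallest loss) for its peer to train on, so that likely-noisy samples are filtered out. The sketch below shows only that generic selection step. CoTeR's customized loss function, sampling strategy, and mean pooling are not described in enough detail in the abstract to reproduce, and the names `small_loss_indices`, `co_teaching_exchange`, and `forget_rate` are assumptions.

```python
def small_loss_indices(losses, forget_rate):
    """Return indices of the samples with the smallest losses.

    A fraction `forget_rate` of the batch (the largest-loss samples,
    treated as probably noisy) is dropped; at least one sample is kept.
    """
    keep = max(1, int(round((1.0 - forget_rate) * len(losses))))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:keep])


def co_teaching_exchange(losses_a, losses_b, forget_rate):
    """Cross-select training samples between two peer models.

    Model A's small-loss samples are used to update model B, and vice
    versa. Returns (indices_to_train_a_on, indices_to_train_b_on).
    """
    train_b_on = small_loss_indices(losses_a, forget_rate)
    train_a_on = small_loss_indices(losses_b, forget_rate)
    return train_a_on, train_b_on
```

Exchanging selections, rather than letting each model filter its own batch, is what keeps one model's mistakes from reinforcing themselves.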

Citations: 0

Empowering Predictive Modeling by GAN-based Causal Information Learning

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-20 DOI: 10.1145/3652610

Jinwei Zeng, Guozhen Zhang, Jian Yuan, Yong Li, Depeng Jin

Abstract: Generally speaking, we can easily specify many causal relationships in the prediction tasks of ubiquitous computing, such as human activity prediction, mobility prediction, and health prediction. However, most existing methods in these fields fail to take advantage of this prior causal knowledge. They typically make predictions based only on correlations in the data, which hinders prediction performance in real-world scenarios because a distribution shift between training data and testing data generally exists. To fill this gap, we propose a GAN-based Causal Information Learning prediction framework (GCIL), which can effectively leverage causal information to improve the prediction performance of existing ubiquitous computing deep learning models. Specifically, faced with the unique challenge that the treatment variable, referring to the intervention that influences the target in a causal relationship, is generally continuous in ubiquitous computing, the framework employs a representation learning approach with a GAN-based deep learning model. By projecting all variables except the treatment into a latent space, it effectively minimizes confounding bias and leverages the learned latent representation for accurate predictions. In this way, it deals with the continuous treatment challenge, and at the same time it can be easily integrated with existing deep learning models to lift their prediction performance in practical scenarios with causal information. Extensive experiments on two large-scale real-world datasets demonstrate its superior performance over multiple state-of-the-art baselines. We also propose an analytical framework, together with extensive experiments, to show empirically that our framework achieves a larger performance gain under two conditions: when the distribution differences between the training data and the testing data are more significant, and when the treatment effects are larger. Overall, this work suggests that learning causal information is a promising way to improve the prediction performance of ubiquitous computing tasks. We open both our dataset and code and call for more research attention in this area.

Citations: 0

A Meta-learning Framework for Tuning Parameters of Protection Mechanisms in Trustworthy Federated Learning

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-18 DOI: 10.1145/3652612

Xiaojin Zhang, Yan Kang, Lixin Fan, Kai Chen, Qiang Yang

Abstract: Trustworthy Federated Learning (TFL) typically leverages protection mechanisms to guarantee privacy. However, protection mechanisms inevitably introduce utility loss or efficiency reduction while protecting data privacy. Therefore, protection mechanisms and their parameters should be carefully chosen to strike an optimal trade-off between privacy leakage, utility loss, and efficiency reduction. To this end, federated learning practitioners need tools to measure the three factors and optimize the trade-off between them, so as to choose the protection mechanism that is most appropriate to the application at hand. Motivated by this requirement, we propose a framework that (1) formulates TFL as a problem of finding a protection mechanism to optimize the trade-off between privacy leakage, utility loss, and efficiency reduction and (2) formally defines bounded measurements of the three factors. We then propose a meta-learning algorithm to approximate this optimization problem and find optimal protection parameters for representative protection mechanisms, including Randomization, Homomorphic Encryption, Secret Sharing, and Compression. We further design estimation algorithms to quantify the optimal protection parameters found in a practical horizontal federated learning setting, and provide a theoretical analysis of the estimation error.

Citations: 0

Perceiving Actions via Temporal Video Frame Pairs

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-17 DOI: 10.1145/3652611

Rongchang Li, Tianyang Xu, Xiao-Jun Wu, Zhongwei Shen, Josef Kittler

Abstract: Video action recognition aims to classify the action category in given videos. In general, semantically relevant video frame pairs reflect significant action patterns such as object appearance variation and abstract temporal concepts like speed and rhythm. However, existing action recognition approaches tend to extract spatiotemporal features holistically. Though effective, this still risks neglecting crucial action features that occur across frames with a long-term temporal span. Motivated by this, we propose to perceive actions via frame pairs directly and devise a novel Nest Structure with frame pairs as basic units. Specifically, we decompose a video sequence into all possible frame pairs and hierarchically organize them according to temporal frequency and order, thus transforming the original video sequence into a Nest Structure. By naturally decomposing actions, the proposed structure can flexibly adapt to diverse action variations such as speed or rhythm changes. Next, we devise a Temporal Pair Analysis (TPA) module to extract discriminative action patterns based on the proposed Nest Structure. The TPA module consists of a pair calculation part, which computes the pair features, and a pair fusion part, which hierarchically fuses the pair features for recognizing actions. The proposed TPA can be flexibly integrated into existing backbones, serving as a side branch to capture various action patterns from multi-level features. Extensive experiments show that the proposed TPA module achieves consistent improvements over several typical backbones, matching or surpassing CNN-based state-of-the-art results on several challenging action recognition benchmarks.
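The decomposition into frame pairs can be illustrated with a short sketch. It enumerates every pair of frame indices (i, j) with i < j and groups the pairs by their temporal gap j - i, keeping temporal order within each group. Reading "temporal frequency" as the gap between the two frames is an assumption made for this illustration, as is the function name `nest_frame_pairs`. The paper's actual Nest Structure may be organized differently.

```python
from itertools import combinations


def nest_frame_pairs(num_frames):
    """Group all frame-index pairs (i, j), i < j, by temporal gap j - i.

    Returns a dict mapping gap -> list of pairs in temporal order.
    Small gaps capture fast, local changes; large gaps capture
    long-range changes across the clip.
    """
    levels = {}
    for i, j in combinations(range(num_frames), 2):
        levels.setdefault(j - i, []).append((i, j))
    return levels
```

A clip of T frames yields T(T-1)/2 pairs in total, so a hierarchy of this kind is most practical on sparsely sampled frames.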

Citations: 0

Ensuring Fairness and Gradient Privacy in Personalized Heterogeneous Federated Learning

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-13 DOI: 10.1145/3652613

Cody Lewis, Vijay Varadharajan, Nasimul Noman, Uday Tupakula

Abstract: Effective machine learning-based analysis requires large amounts of data, while the privacy of those data must also be ensured. Out of the growing tension between these conflicting requirements, the paradigm of federated learning has emerged: a distributed machine learning setting in which clients provide only machine learning model updates to the server, rather than the actual data, for decision making. However, the distributed nature of federated learning raises specific challenges related to fairness in a heterogeneous setting. This motivates the focus of our paper: the heterogeneity of client devices with different computational capabilities and its impact on fairness in federated learning. Furthermore, our aim is to achieve fairness under heterogeneity while ensuring privacy. As far as we are aware, no existing work addresses all three aspects of fairness, device heterogeneity, and privacy simultaneously in federated learning. In this paper, we propose a novel federated learning algorithm with personalization in the context of heterogeneous devices, while maintaining compatibility with the gradient privacy preservation techniques of secure aggregation. We analyze the proposed federated learning algorithm under different environments with different datasets, and show that it achieves performance close to or greater than the state of the art in heterogeneous device personalized federated learning. We also provide theoretical proofs for the fairness and convergence properties of our proposed algorithm.
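Secure aggregation, with which the proposed algorithm is said to remain compatible, is commonly built on pairwise additive masking: each pair of clients shares a random mask that one adds and the other subtracts, so the server sees only masked updates while the masks cancel in the sum. The sketch below simulates that cancellation with integer updates modulo a fixed modulus. It is a toy illustration of the general technique, not the paper's algorithm. A single seeded random generator stands in for the pairwise shared keys that a real protocol would establish, and all names are hypothetical.

```python
import random


def masked_updates(updates, seed=0, modulus=2**16):
    """Mask integer client updates with pairwise-cancelling random masks.

    For each client pair (i, j) with i < j, client i adds a shared mask
    and client j subtracts it (all arithmetic modulo `modulus`).
    """
    rng = random.Random(seed)  # stand-in for pairwise shared secrets
    dim = len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            mask = [rng.randrange(modulus) for _ in range(dim)]
            for d in range(dim):
                masked[i][d] = (masked[i][d] + mask[d]) % modulus
                masked[j][d] = (masked[j][d] - mask[d]) % modulus
    return masked


def aggregate(masked, modulus=2**16):
    """Sum masked updates; the pairwise masks cancel, leaving the true sum."""
    dim = len(masked[0])
    return [sum(m[d] for m in masked) % modulus for d in range(dim)]
```

Because the server can recover only the sum, any personalization or fairness mechanism that needs per-client gradients at the server would be incompatible with this scheme, which is why compatibility is a design constraint worth stating.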

Citations: 0

Multimodal Dialogue Systems via Capturing Context-aware Dependencies and Ordinal Information of Semantic Elements

IF 5, Zone 4 Computer Science

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-03-12 DOI: 10.1145/3645099

Weidong He, Zhi Li, Hao Wang, Tong Xu, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan, Enhong Chen

Abstract: Multimodal conversation systems have recently garnered significant attention across various industries, such as travel and retail. While pioneering works in this field have shown promising performance, they often focus solely on context information at the utterance level, overlooking the context-aware dependencies of multimodal semantic elements like words and images. Furthermore, the ordinal information of images, which indicates the relevance between visual context and users’ demands, remains underutilized during the integration of visual content. Additionally, how to effectively utilize the attributes provided by users when searching for desired products remains largely unexplored. To address these challenges, we propose a Position-aware Multimodal diAlogue system with semanTic Elements, abbreviated as PMATE. Specifically, to obtain semantic representations at the element level, we first unfold the multimodal historical utterances and devise a position-aware multimodal element-level encoder. This component considers all images that may be relevant to the current turn and introduces a novel position-aware image selector to choose related images before fusing the information from the two modalities. Finally, we present a knowledge-aware two-stage decoder and an attribute-enhanced image searcher for the tasks of generating textual responses and selecting image responses, respectively. We extensively evaluate our model on two large-scale multimodal dialog datasets, and the experimental results demonstrate that our approach outperforms several baseline methods.

Citations: 0