{"title":"SPContrastNet: A Self-Paced Contrastive Learning Model for Few-Shot Text Classification","authors":"Junfan Chen, Richong Zhang, Xiaohan Jiang, Chunming Hu","doi":"10.1145/3652600","DOIUrl":"https://doi.org/10.1145/3652600","url":null,"abstract":"<p>Meta-learning has recently promoted few-shot text classification, which identifies target classes based on information transferred from source classes through a series of small tasks or episodes. Existing works constructing their meta-learner on Prototypical Networks need improvement in learning discriminative text representations between similar classes that may lead to conflicts in label prediction. The overfitting problems caused by a few training instances need to be adequately addressed. In addition, efficient episode sampling procedures that could enhance few-shot training should be utilized. To address the problems mentioned above, we first present a contrastive learning framework that simultaneously learns discriminative text representations via supervised contrastive learning while mitigating the overfitting problem via unsupervised contrastive regularization, and then we build an efficient self-paced episode sampling approach on top of it to include more difficult episodes as training progresses. Empirical results on 8 few-shot text classification datasets show that our model outperforms the current state-of-the-art models. The extensive experimental analysis demonstrates that our supervised contrastive representation learning and unsupervised contrastive regularization techniques improve the performance of few-shot text classification. 
The episode-sampling analysis reveals that our self-paced sampling strategy improves training efficiency.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"36 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributional Fairness-aware Recommendation","authors":"Hao Yang, Xian Wu, Zhaopeng Qiu, Yefeng Zheng, Xu Chen","doi":"10.1145/3652854","DOIUrl":"https://doi.org/10.1145/3652854","url":null,"abstract":"<p>Fairness has been gradually recognized as a significant problem in the recommendation domain. Previous models usually achieve fairness by reducing the average performance gap between different user groups. However, the average performance may not sufficiently represent all the characteristics of the performances in a user group. Thus, equivalent average performance may not mean the recommender model is fair; for example, the variance of the performances can differ. To alleviate this problem, in this paper, we define a novel type of fairness, where we require that the performance distributions across different user groups should be similar. We prove that with the same performance distribution, the numerical characteristics of the group performance, including the expectation, variance and any higher-order moment, are also the same. To achieve distributional fairness, we propose a generative and adversarial training framework. Specifically, we regard the recommender model as the generator to compute the performance for each user in different groups, and then we deploy a discriminator to judge which group the performance is drawn from. By iteratively optimizing the generator and the discriminator, we can theoretically prove that the optimal generator (the recommender model) can indeed lead to equivalent performance distributions. To smooth the adversarial training process, we propose a novel dual curriculum learning strategy for optimal scheduling of training samples. Additionally, we tailor our framework to better suit top-N recommendation tasks by incorporating softened ranking metrics as measures of performance discrepancies. 
We conduct extensive experiments based on real-world datasets to demonstrate the effectiveness of our model.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"143 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrete Federated Multi-behavior Recommendation for Privacy-Preserving Heterogeneous One-Class Collaborative Filtering","authors":"Enyue Yang, Weike Pan, Qiang Yang, Zhong Ming","doi":"10.1145/3652853","DOIUrl":"https://doi.org/10.1145/3652853","url":null,"abstract":"<p>Recently, federated recommendation has become a research hotspot mainly because of users’ awareness of privacy in data. As a recent and important recommendation problem, in heterogeneous one-class collaborative filtering (HOCCF), each user may be involved in two different types of implicit feedback, i.e., examinations and purchases. So far, privacy-preserving HOCCF has received relatively little attention. Existing federated recommendation works often overlook the fact that some privacy-sensitive behaviors such as purchases should be collected to meet basic business imperatives, for example in e-commerce. Hence, the user privacy constraints can and should be relaxed while deploying a recommendation system in real scenarios. In this paper, we study the federated multi-behavior recommendation problem under the assumption that purchase behaviors can be collected. Moreover, there are two additional challenges that need to be addressed when deploying federated recommendation. One is the low storage capacity of users’ devices for storing all the item vectors, and the other is the low computational power available to users for participating in federated learning. To release the potential of privacy-preserving HOCCF, we propose a novel framework, named discrete federated multi-behavior recommendation (DFMR), which allows the collection of the business-necessary behaviors (i.e., purchases) by the server. To reduce the storage overhead, we use discrete hashing techniques, which can compress the parameters down to 1.56% of the real-valued parameters. To further improve the computational efficiency, we design a memorization strategy in the cache updating module to accelerate the training process. 
Extensive experiments on four public datasets show the superiority of our DFMR in terms of both accuracy and efficiency.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"87 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DHyper: A Recurrent Dual Hypergraph Neural Network for Event Prediction in Temporal Knowledge Graphs","authors":"Xing Tang, Ling Chen, Hongyu Shi, Dandan Lyu","doi":"10.1145/3653015","DOIUrl":"https://doi.org/10.1145/3653015","url":null,"abstract":"<p>Event prediction is a vital and challenging task in temporal knowledge graphs (TKGs), which have played crucial roles in various applications. Recently, many graph neural network-based approaches have been proposed to model the graph structure information in TKGs. However, these approaches only construct graphs based on quadruplets and model the pairwise correlation between entities, and thus fail to capture the high-order correlations among entities. To this end, we propose DHyper, a recurrent <b>D</b>ual <b>Hyper</b>graph neural network for event prediction in TKGs, which simultaneously models the influences of both the high-order correlations among entities and among relations. Specifically, a dual hypergraph learning module is proposed to discover the high-order correlations among entities and among relations in a parameterized way. A dual hypergraph message passing network is introduced to perform the information aggregation and representation fusion on the entity hypergraph and the relation hypergraph. 
Extensive experiments on six real-world datasets demonstrate that DHyper achieves state-of-the-art performance, outperforming the best baseline by an average of 13.09%, 4.26%, 17.60%, and 18.03% in MRR, Hits@1, Hits@3, and Hits@10, respectively.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"21 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diversifying Sequential Recommendation with Retrospective and Prospective Transformers","authors":"Chaoyu Shi, Pengjie Ren, Dongjie Fu, Xin Xin, Shansong Yang, Fei Cai, Zhaochun Ren, Zhumin Chen","doi":"10.1145/3653016","DOIUrl":"https://doi.org/10.1145/3653016","url":null,"abstract":"<p>Previous studies on sequential recommendation (SR) have predominantly concentrated on optimizing recommendation accuracy. However, there remains a significant gap in enhancing recommendation diversity, particularly for short interaction sequences. The limited availability of interaction information in short sequences hampers the recommender’s ability to comprehensively model users’ intents, consequently affecting both the diversity and accuracy of recommendation. In light of the above challenge, we propose <i>reTrospective and pRospective Transformers for dIversified sEquential Recommendation</i> (TRIER). The TRIER addresses the issue of insufficient information in short interaction sequences by first retrospectively learning to predict users’ potential historical interactions, thereby introducing additional information and expanding short interaction sequences, and then capturing users’ potential intents from multiple augmented sequences. Finally, the TRIER learns to generate diverse recommendation lists by covering as many potential intents as possible. </p><p>To evaluate the effectiveness of TRIER, we conduct extensive experiments on three benchmark datasets. The experimental results demonstrate that TRIER significantly outperforms state-of-the-art methods, exhibiting diversity improvement of up to 11.36% in terms of intra-list distance (ILD@5) on the Steam dataset, 3.43% ILD@5 on the Yelp dataset and 3.77% in terms of category coverage (CC@5) on the Beauty dataset. As for accuracy, on the Yelp dataset, we observe notable improvement of 7.62% and 8.63% in HR@5 and NDCG@5, respectively. 
Moreover, we found that TRIER reveals more significant accuracy and diversity improvement for short interaction sequences.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"16 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140166710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-grained Document Modeling for Search Result Diversification","authors":"Zhirui Deng, Zhicheng Dou, Zhan Su, Ji-Rong Wen","doi":"10.1145/3652852","DOIUrl":"https://doi.org/10.1145/3652852","url":null,"abstract":"<p>Search result diversification plays a crucial role in improving users’ search experience by providing users with documents covering more subtopics. Previous studies have made great progress in leveraging inter-document interactions to measure the similarity among documents. However, different parts of the document may embody different subtopics and existing models ignore the subtle similarities and differences of content within each document. In this paper, we propose a hierarchical attention framework to combine intra-document interactions with inter-document interactions in a complementary manner in order to conduct multi-grained document modeling. Specifically, we separate the document into passages to model the document content from multi-grained perspectives. Then, we design stacked interaction blocks to conduct inter-document and intra-document interactions. Moreover, to measure the subtopic coverage of each document more accurately, we propose a passage-aware document-subtopic interaction to perform fine-grained document-subtopic interaction. 
Experimental results demonstrate that our model achieves state-of-the-art performance compared with existing methods.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"47 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140149512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cooking with Conversation: Enhancing User Engagement and Learning with a Knowledge-Enhancing Assistant","authors":"Alexander Frummet, Alessandro Speggiorin, David Elsweiler, Anton Leuski, Jeff Dalton","doi":"10.1145/3649500","DOIUrl":"https://doi.org/10.1145/3649500","url":null,"abstract":"<p>We present two empirical studies to investigate users’ expectations and behaviours when using digital assistants, such as Alexa and Google Home, in a kitchen context: First, a survey (N=200) queries participants on their expectations for the kinds of information that such systems should be able to provide. While consensus exists on expecting information about cooking steps and processes, younger participants who enjoy cooking express a higher likelihood of expecting details on food history or the science of cooking. In a follow-up Wizard-of-Oz study (N = 48), users were guided through the steps of a recipe either by an <i>active</i> wizard that alerted participants to information it could provide or a <i>passive</i> wizard who only answered questions that were provided by the user. The <i>active</i> policy led to almost double the number of conversational utterances and 1.5 times more knowledge-related user questions compared to the <i>passive</i> policy. Also, it resulted in 1.7 times more knowledge communicated than the <i>passive</i> policy. 
We discuss the findings in the context of related work and reveal implications for the design and use of such assistants for cooking and other purposes such as DIY and craft tasks, as well as the lessons we learned for evaluating such systems.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"8 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140149509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative Sequential Recommendations via Multi-View GNN-Transformers","authors":"Tianze Luo, Yong Liu, Sinno Jialin Pan","doi":"10.1145/3649436","DOIUrl":"https://doi.org/10.1145/3649436","url":null,"abstract":"<p>Sequential recommendation systems aim to exploit users’ sequential behavior patterns to capture their interaction intentions and improve recommendation accuracy. Existing sequential recommendation methods mainly focus on modeling the items’ chronological relationships in each individual user behavior sequence, which may not be effective in making accurate and robust recommendations. On one hand, the performance of existing sequential recommendation methods is usually sensitive to the length of a user’s behavior sequence (<i>i.e.</i>, the list of a user’s historically interacted items). On the other hand, besides the context information in each individual user behavior sequence, the collaborative information among different users’ behavior sequences is also crucial to make accurate recommendations. However, this kind of information is usually ignored by existing sequential recommendation methods. In this work, we propose a new sequential recommendation framework, which encodes the context information in each individual user behavior sequence as well as the collaborative information among the behavior sequences of different users, through building a local dependency graph for each item. We conduct extensive experiments to compare the proposed model with state-of-the-art sequential recommendation methods on five benchmark datasets. 
The experimental results demonstrate that the proposed model is able to achieve better recommendation performance than existing methods, by incorporating collaborative information.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"24 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140149508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding","authors":"Yunchang Zhu, Liang Pang, Kangxi Wu, Yanyan Lan, Huawei Shen, Xueqi Cheng","doi":"10.1145/3652599","DOIUrl":"https://doi.org/10.1145/3652599","url":null,"abstract":"<p>Current natural language understanding (NLU) models have been continuously scaling up, both in terms of model size and input context, introducing more hidden and input neurons. While this generally improves performance on average, the extra neurons do not yield a consistent improvement for all instances. This is because some hidden neurons are redundant, and the noise mixed in input neurons tends to distract the model. Previous work mainly focuses on extrinsically reducing low-utility neurons by additional post- or pre-processing, such as network pruning and context selection, to avoid this problem. Beyond that, can we make the model reduce redundant parameters and suppress input noise by intrinsically enhancing the utility of each neuron? If a model can efficiently utilize neurons, no matter which neurons are ablated (disabled), the ablated submodel should perform no better than the original full model. Based on such a comparison principle between models, we propose a cross-model comparative loss for a broad range of tasks. Comparative loss is essentially a ranking loss on top of the task-specific losses of the full and ablated models, with the expectation that the task-specific loss of the full model is minimal. 
We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks based on 5 widely used pretrained language models and find it particularly superior for models with few parameters or long input.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"116 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140149426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ELAKT: Enhancing Locality for Attentive Knowledge Tracing","authors":"Yanjun Pu, Fang Liu, Rongye Shi, Haitao Yuan, Ruibo Chen, Tianhao Peng, WenJun Wu","doi":"10.1145/3652601","DOIUrl":"https://doi.org/10.1145/3652601","url":null,"abstract":"<p>Knowledge tracing models based on deep learning can achieve impressive predictive performance by leveraging attention mechanisms. However, there still exist two challenges in attentive knowledge tracing: First, the mechanism of classical models of attentive knowledge tracing demonstrates relatively low attention when processing exercise sequences with shifting knowledge concepts, making it difficult to capture the comprehensive state of knowledge across sequences. Second, classical models do not consider stochastic behaviors, which negatively affects models of attentive knowledge tracing in terms of capturing anomalous knowledge states. This paper proposes a model of attentive knowledge tracing, called Enhancing Locality for Attentive Knowledge Tracing (ELAKT), that is a variant of the deep knowledge tracing model. The proposed model leverages the encoder module of the transformer to aggregate knowledge embedding generated by both exercises and responses over all timesteps. In addition, it uses causal convolutions to aggregate and smooth the states of local knowledge. The ELAKT model uses the states of comprehensive knowledge concepts to introduce a prediction correction module to forecast the future responses of students to deal with noise caused by stochastic behaviors. 
Experimental results demonstrate that the ELAKT model consistently outperforms state-of-the-art baseline knowledge tracing models.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"30 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140129402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}