Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining: Latest Papers, Page 7

k-Clustering with Fair Outliers

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3498485

Matteo Almanza, Alessandro Epasto, A. Panconesi, Giuseppe Re

Abstract: Clustering problems and clustering algorithms are often overly sensitive to the presence of outliers: even a handful of points can greatly affect the structure of the optimal solution and its cost. This is why many algorithms for robust clustering problems have been formulated in recent years. These algorithms discard some points as outliers, excluding them from the clustering. However, outlier selection can be unfair: some categories of input points may be disproportionately affected by the outlier removal algorithm. We study the problem of k-clustering with fair outlier removal and provide the first approximation algorithm for well-known clustering formulations, such as k-means and k-median. We analyze this algorithm and prove that it has strong theoretical guarantees. We complement this result with an empirical evaluation showing that, while standard methods for outlier removal have a disproportionate impact across categories of input points, our algorithm equalizes the impact while retaining strong experimental performance on multiple real-world datasets. We also show how the fairness of outlier removal can influence the performance of a downstream learning task. Finally, we provide a coreset construction, which makes our algorithm scalable to very large datasets.

Citations: 3
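
The disparity described in the abstract above can be illustrated with a small sketch. This is not the paper's algorithm: it is a naive baseline that drops the points farthest from their nearest center and then reports the fraction of each group that was removed. The function name `outlier_rate_by_group`, the toy points, and the group labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def outlier_rate_by_group(points, groups, centers, num_outliers):
    """Drop the num_outliers points farthest from their nearest center and
    return, per group, the fraction of that group's points that were dropped.

    Illustrative baseline only (an assumption), not the fair algorithm
    described in the paper.
    """
    # Distance from each point to its nearest center.
    dists = [min(math.dist(p, c) for c in centers) for p in points]
    # Indices sorted from farthest to nearest; the first ones are the outliers.
    order = sorted(range(len(points)), key=lambda i: dists[i], reverse=True)
    dropped = set(order[:num_outliers])
    totals = Counter(groups)
    removed = Counter(groups[i] for i in dropped)
    return {g: removed[g] / totals[g] for g in totals}


# Made-up toy data: group "b" lies mostly far from the single center.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (1.0, 1.0)]
groups = ["a", "a", "a", "b", "b", "b"]
rates = outlier_rate_by_group(points, groups, centers=[(0.0, 0.0)], num_outliers=2)
print(rates)
```

In this toy input both removed points belong to group "b", so the removal rate is zero for "a" and two thirds for "b". A fair formulation would constrain such per-group rates; how the paper does so is not specified in the abstract.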

Directed Network Embedding with Virtual Negative Edges

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3498470

Hyunsik Yoo, Yeon-Chang Lee, Kijung Shin, Sang-Wook Kim

Abstract: The directed network embedding problem is to represent the nodes in a given directed network as embeddings (i.e., low-dimensional vectors) that preserve the asymmetric relationships between nodes. While a number of approaches have been developed for this problem, we point out that existing approaches commonly face difficulties in accurately preserving asymmetric proximities between nodes in a sparse network containing a large number of low out- and in-degree nodes. In this paper, we focus on addressing this intrinsic difficulty caused by the lack of information. We first introduce the concept of virtual negative edges (VNEs), which represent latent negative relationships between nodes. Based on this concept, we propose a novel DIrected NE approach with VIrtual Negative Edges, named DIVINE. DIVINE carefully decides the number and locations of VNEs to be added to the input network. Once VNEs are added, DIVINE learns embeddings by exploiting both the signs and directions of edges. Our experiments on four real-world directed networks demonstrate that adding VNEs alleviates the lack of information about low-degree nodes, thereby enabling DIVINE to yield high-quality embeddings that accurately capture asymmetric proximities between nodes. Specifically, the embeddings obtained by DIVINE lead to up to 10.16% more accurate link prediction than those obtained by state-of-the-art competitors.

Citations: 9

Multi-Scale Variational Graph AutoEncoder for Link Prediction

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3498531

Zhihao Guo, Feng Wang, Kaixuan Yao, Jiye Liang, Zhiqiang Wang

Abstract: Link prediction has become a significant research problem in deep learning, and the graph-based autoencoder model is one of the most important methods to solve it. Existing graph-based autoencoder models learn only a single set of distributions, which cannot accurately represent the mixed distribution in real graph data. Meanwhile, existing learning models are greatly restricted when the graph data has insufficient attribute information and inaccurate topology information. In this paper, we propose a novel graph embedding framework, termed multi-scale variational graph autoencoder (MSVGAE), which learns multiple sets of low-dimensional vectors of different dimensions through the graph encoder to represent the mixed probability distribution of the original graph data, and performs multiple sampling in each dimension. Furthermore, a self-supervised learning strategy (i.e., graph feature reconstruction auxiliary learning) is introduced to make full use of the graph attribute information to help the graph structure learning. Experimental studies on real-world graphs demonstrate that the proposed model achieves state-of-the-art performance compared with other baseline methods in link prediction tasks. In addition, the robustness analysis shows that the proposed MSVGAE method has clear advantages when processing graph data with insufficient attribute information and inaccurate topology information.

Citations: 7

Search and Discovery in Personal Email Collections

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3501393

Michael Bendersky, Xuanhui Wang, Marc Najork, Donald Metzler

Citations: 1

Structure Meets Sequences: Predicting Network of Co-evolving Sequences

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3498411

Yaojing Wang, Yuan Yao, F. Xu, Yada Zhu, Hanghang Tong

Abstract: Co-evolving sequences are ubiquitous in a variety of applications, where different sequences are often inherently inter-connected with each other. We refer to such sequences, together with their inherent connections modeled as a structured network, as a network of co-evolving sequences (NoCES). Typical NoCES applications include road traffic monitoring, company revenue prediction, motion capture, etc. To date, it remains a daunting challenge to accurately model NoCES due to the coupling between network structure and sequences. In this paper, we propose to model NoCES with the aim of simultaneously capturing both the dynamics and the interplay between network structure and sequences. Specifically, we propose a joint learning framework to alternately update the network representations and sequence representations as the sequences evolve over time. A unique feature of our framework is that it can deal with the case when there are co-evolving sequences on both network nodes and edges. Experimental evaluations on four real datasets demonstrate that the proposed approach (1) outperforms the existing competitors in terms of prediction accuracy, and (2) scales linearly with respect to the sequence length and the network size.

Citations: 1

Improving Personalized Search with Dual-Feedback Network

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3498447

Chenlong Deng, Yujia Zhou, Zhicheng Dou

Abstract: Personalized search improves the quality of search results by modeling historical user behavior. In recent years, many methods based on deep learning have greatly improved the performance of personalized search. However, most of the existing methods focus only on modeling positive user behavior signals, which leads to incomplete user interest modeling. At the same time, the user's search behavior hides much explicit or implicit feedback information. For example, clicking and staying for a certain period represents implicit positive feedback, and skipping behavior represents implicit negative feedback. Intuitively, this information can be utilized to construct a more complete and accurate user profile. In this paper, we propose a dual-feedback modeling framework, which integrates multi-granular user feedback information to model the user's current search intention. Specifically, we propose a feedback extraction network to refine the dual-feedback representation in multiple stages. To enhance the user's real-time search quality, we design an additional dual-feedback feature gating module to capture the user's real-time feedback in the current session. We conducted a large number of experiments on two real-world datasets, and the experimental results show that our method can effectively improve the performance of personalized search.

Citations: 2
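
The implicit feedback signals mentioned in the abstract above (click plus dwell as positive, skipping as negative) can be sketched as a simple labeling rule. This is a minimal sketch under stated assumptions, not the paper's feedback extraction network: the 30-second dwell threshold, the definition of a skip as an unclicked result ranked above the last clicked one, the record layout, and the name `label_feedback` are all hypothetical.

```python
def label_feedback(results, dwell_threshold=30.0):
    """Assign an implicit feedback label to every result in one ranked list.

    results: list of dicts with keys "doc", "clicked" (bool), "dwell" (seconds),
             ordered by rank (best first).
    Returns a dict mapping doc -> "positive", "negative", or "unknown".

    Assumptions (not from the paper): dwell_threshold defaults to 30 seconds,
    and a "skip" is an unclicked result shown above the last clicked result.
    """
    clicked_ranks = [i for i, r in enumerate(results) if r["clicked"]]
    last_click = max(clicked_ranks) if clicked_ranks else -1
    labels = {}
    for rank, r in enumerate(results):
        if r["clicked"] and r["dwell"] >= dwell_threshold:
            labels[r["doc"]] = "positive"   # clicked and stayed long enough
        elif not r["clicked"] and rank < last_click:
            labels[r["doc"]] = "negative"   # seen but skipped over
        else:
            labels[r["doc"]] = "unknown"    # short click or never examined
    return labels


# Made-up session log for illustration.
session = [
    {"doc": "d1", "clicked": False, "dwell": 0.0},
    {"doc": "d2", "clicked": True, "dwell": 45.0},
    {"doc": "d3", "clicked": True, "dwell": 3.0},
    {"doc": "d4", "clicked": False, "dwell": 0.0},
]
labels = label_feedback(session)
print(labels)
```

Results below the last click are left as "unknown" here because the user may never have examined them; treating them as negative would be a stronger assumption.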

ConsistSum: Unsupervised Opinion Summarization with the Consistency of Aspect, Sentiment and Semantic

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3498463

Wenjun Ke, Jinhua Gao, Huawei Shen, Xueqi Cheng

Abstract: Unsupervised opinion summarization techniques are designed to condense the review data and summarize informative and salient opinions in the absence of golden references. Existing dominant methods generally follow a two-stage framework: first creating synthetic "review-summary" paired datasets and then feeding them into the generative summary model for supervised training. However, these methods mainly focus on semantic similarity in synthetic dataset creation, ignoring the consistency of aspects and sentiments in synthetic pairs. Such inconsistency also introduces a gap between the training and inference of the summarization model. To alleviate this problem, we propose ConsistSum, an unsupervised opinion summarization method devoted to capturing the consistency of aspects and sentiment between reviews and summaries. Specifically, ConsistSum first extracts the preliminary "review-summary" pairs from the raw corpus by evaluating the distance between aspect distributions and between sentiment distributions. Then, we refine the preliminary summary with constrained Metropolis-Hastings sampling to produce highly consistent synthetic datasets. In the summarization phase, we adopt the generative model T5 as the summarization model. T5 is fine-tuned for the opinion summarization task by incorporating the loss of predicting aspect and opinion distribution. Experimental results on two benchmark datasets, i.e., Yelp and Amazon, demonstrate the superior performance of ConsistSum over the state-of-the-art baselines.

Citations: 8

Personalized Long-distance Fuel-efficient Route Recommendation Through Historical Trajectories Mining

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3498512

Zhan Wang, Zhaohui Peng, Senzhang Wang, Qiao Song

Abstract: Finding fuel-efficient routes for drivers is increasingly valuable for saving energy, protecting the environment and reducing expenses. Previous studies generally adopt simple fuel consumption calculation or prediction methods to recommend fuel-efficient routes within a city, which has two major limitations. First, the effect of drivers' driving behavior preferences (e.g., acceleration, frequency of clutch use, etc.) on fuel consumption is not fully studied and utilized. Second, existing methods mainly focus on short-distance route recommendation. Due to the difference in road network structure and route composition, it is not effective to directly apply route recommendation methods designed for short-distance travel within a city to the scenario of long-distance travel among cities. In this paper, we propose a novel model, PLd-FeRR, for Personalized Long-distance Fuel-efficient Route Recommendation. Specifically, we first identify the features reflecting the user's driving behavior preference based on the user's historical driving trajectory, and then extract the potential factors that can affect long-distance fuel consumption. As the transformer can effectively capture the temporal features of long sequence data, the extracted personalized driving preference features and long-distance fuel consumption features are input into a transformer-based fuel consumption prediction model. Next, the prediction model is combined with a genetic algorithm to further improve the performance of recommending fuel-efficient routes. Extensive evaluations are conducted on a large real-world dataset, and the results show the effectiveness of our proposal.

Citations: 5

Obtaining Robust Models from Imbalanced Data

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3502217

Wentao Wang

Abstract: The vulnerability of deep neural network (DNN) models has been verified by the existence of adversarial examples. By exploiting slight changes to input examples, the generated adversarial examples can easily cause well-trained DNN models to make wrong predictions. Many defense methods have been proposed to improve the robustness of DNN models against adversarial examples. Among them, adversarial training has been empirically proven to be one of the most effective methods. Almost all existing studies of adversarial training focus on balanced datasets, where each class has an equal number of training examples. However, as datasets collected in real-world applications cannot guarantee that all contained classes are uniformly distributed, it is much more challenging to obtain robust models in those real applications where the available training datasets are imbalanced. As an initial effort to study this problem, we first investigate the different behaviors of adversarially trained models and naturally trained models using imbalanced training datasets and then explore possible solutions to facilitate adversarial training under imbalanced settings.

Citations: 0
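
The idea that a slight input change can flip a model's prediction, as stated in the abstract above, can be shown on a toy model. The sketch below uses the Fast Gradient Sign Method, which the abstract does not name, applied to a two-feature logistic regression instead of a DNN. Both choices are assumptions made to keep the example short and dependency-free, and the weights and input are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Predicted class (0 or 1) of a linear logistic model."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def fgsm(w, b, x, y, eps):
    """Return x shifted by eps in the direction that increases the log loss.

    For logistic loss, the gradient with respect to x is (p - y) * w,
    where p is the predicted probability of class 1.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input, chosen so the clean input is classified as 1.
w, b = [1.0, -1.0], 0.0
x, y, eps = [0.2, 0.1], 1, 0.2
x_adv = fgsm(w, b, x, y, eps)
print(predict(w, b, x), predict(w, b, x_adv))
```

Each feature moves by at most `eps`, yet the predicted class changes from 1 to 0. Adversarial training, as discussed in the abstract, would add such perturbed inputs to the training loop; that step is not shown here.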

A Practical Guide to Robust Multimodal Machine Learning and Its Application in Education

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining Pub Date : 2022-02-11 DOI: 10.1145/3488560.3510010

Zitao Liu

Abstract: Recently we have seen a rapid rise in the amount of education data available through the digitization of education. This huge amount of education data usually appears as a mixture of images, videos, speech, texts, etc. It is crucial to consider data from different modalities to build successful applications in AI in education (AIED). This talk targets AI researchers and practitioners who are interested in applying state-of-the-art multimodal machine learning techniques to tackle some of the core AIED tasks. These include tasks such as automatic short answer grading, student assessment, class quality assurance, knowledge tracing, etc. In this talk, I will share some recent developments in successfully applying multimodal learning approaches in AIED, with a focus on classroom multimodal data. Beyond introducing the recent advances in computer vision, speech, and natural language processing in education, I will discuss how to combine data from different modalities and build AI-driven educational applications on top of these data. Participants will learn about recent trends and emerging challenges in this topic, representative tools and learning resources for obtaining ready-to-use models, and how related models and techniques benefit real-world AIED applications.

Citations: 1