Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining: Latest Articles

k-Clustering with Fair Outliers
Matteo Almanza, Alessandro Epasto, A. Panconesi, Giuseppe Re
{"title":"k-Clustering with Fair Outliers","authors":"Matteo Almanza, Alessandro Epasto, A. Panconesi, Giuseppe Re","doi":"10.1145/3488560.3498485","DOIUrl":"https://doi.org/10.1145/3488560.3498485","url":null,"abstract":"Clustering problems and clustering algorithms are often overly sensitive to the presence of outliers: even a handful of points can greatly affect the structure of the optimal solution and its cost. This is why many algorithms for robust clustering problems have been formulated in recent years. These algorithms discard some points as outliers, excluding them from the clustering. However, outlier selection can be unfair: some categories of input points may be disproportionately affected by the outlier removal algorithm. We study the problem of k-clustering with fair outlier removal and provide the first approximation algorithm for well-known clustering formulations, such as k-means and k-median. We analyze this algorithm and prove that it has strong theoretical guarantees. We complement this result with an empirical evaluation showing that, while standard methods for outlier removal have a disproportionate impact across categories of input points, our algorithm equalizes the impact while retaining strong experimental performance on multiple real-world datasets. We also show how the fairness of outlier removal can influence the performance of a downstream learning task. Finally, we provide a coreset construction, which makes our algorithm scalable to very large datasets.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114733599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
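As a rough illustration of the fairness issue this abstract describes (and not the authors' approximation algorithm), the sketch below applies the standard heuristic of discarding the z points farthest from their nearest center, then reports what fraction of each category was discarded. The function names, the toy data, and the fixed center are all assumptions made for illustration.

```python
import math
from collections import Counter


def nearest_center_distance(point, centers):
    """Euclidean distance from a point to its closest center."""
    return min(math.dist(point, c) for c in centers)


def discard_farthest(points, centers, z):
    """Return the indices of the z points farthest from their nearest center."""
    ranked = sorted(range(len(points)),
                    key=lambda i: nearest_center_distance(points[i], centers),
                    reverse=True)
    return set(ranked[:z])


def outlier_rate_by_group(groups, outliers):
    """Fraction of each group's points that were discarded as outliers."""
    totals = Counter(groups)
    removed = Counter(groups[i] for i in outliers)
    return {g: removed[g] / totals[g] for g in totals}


# Toy data (hypothetical): group "b" lies farther from the center than group "a".
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1),
          (3.0, 3.0), (3.5, 3.0)]
groups = ["a", "a", "a", "a", "b", "b"]
centers = [(0.0, 0.0)]

outliers = discard_farthest(points, centers, z=2)
rates = outlier_rate_by_group(groups, outliers)
print(rates)
```

In this toy case every discarded point comes from group "b", which is the kind of disproportionate impact a fair outlier-removal method would aim to equalize.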
Directed Network Embedding with Virtual Negative Edges
Hyunsik Yoo, Yeon-Chang Lee, Kijung Shin, Sang-Wook Kim
{"title":"Directed Network Embedding with Virtual Negative Edges","authors":"Hyunsik Yoo, Yeon-Chang Lee, Kijung Shin, Sang-Wook Kim","doi":"10.1145/3488560.3498470","DOIUrl":"https://doi.org/10.1145/3488560.3498470","url":null,"abstract":"The directed network embedding problem is to represent the nodes in a given directed network as embeddings (i.e., low-dimensional vectors) that preserve the asymmetric relationships between nodes. While a number of approaches have been developed for this problem, we point out that existing approaches commonly face difficulties in accurately preserving asymmetric proximities between nodes in a sparse network containing a large number of low out- and in-degree nodes. In this paper, we focus on addressing this intrinsic difficulty caused by the lack of information. We first introduce the concept of virtual negative edges (VNEs), which represent latent negative relationships between nodes. Based on this concept, we propose a novel DIrected NE approach with VIrtual Negative Edges, named DIVINE. DIVINE carefully decides the number and locations of VNEs to be added to the input network. Once VNEs are added, DIVINE learns embeddings by exploiting both the signs and directions of edges. Our experiments on four real-world directed networks demonstrate that adding VNEs alleviates the lack of information about low-degree nodes, thereby enabling DIVINE to yield high-quality embeddings that accurately capture asymmetric proximities between nodes. Specifically, the embeddings obtained by DIVINE lead to up to 10.16% more accurate link prediction, compared to those obtained by state-of-the-art competitors.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123941620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
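One common way for embeddings to preserve edge direction is to give each node separate source and target vectors, so that the score for u -> v can differ from the score for v -> u. The sketch below shows only this general idea and is not DIVINE's model; the vectors and the `score` helper are hypothetical.

```python
def score(source_vecs, target_vecs, u, v):
    """Asymmetric proximity for the directed pair u -> v (a dot product)."""
    return sum(s * t for s, t in zip(source_vecs[u], target_vecs[v]))


# Hypothetical 2-dimensional embeddings for two nodes.
source_vecs = {"u": (1.0, 0.0), "v": (0.0, 0.5)}
target_vecs = {"u": (0.0, 0.2), "v": (0.9, 0.0)}

forward = score(source_vecs, target_vecs, "u", "v")   # proximity of u -> v
backward = score(source_vecs, target_vecs, "v", "u")  # proximity of v -> u
print(forward, backward)
```

Because the two directions use different vector pairs, a strong u -> v link does not force a strong v -> u link, which a single shared vector per node with a symmetric dot product cannot express.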
Multi-Scale Variational Graph AutoEncoder for Link Prediction
Zhihao Guo, Feng Wang, Kaixuan Yao, Jiye Liang, Zhiqiang Wang
{"title":"Multi-Scale Variational Graph AutoEncoder for Link Prediction","authors":"Zhihao Guo, Feng Wang, Kaixuan Yao, Jiye Liang, Zhiqiang Wang","doi":"10.1145/3488560.3498531","DOIUrl":"https://doi.org/10.1145/3488560.3498531","url":null,"abstract":"Link prediction has become a significant research problem in deep learning, and the graph-based autoencoder model is one of the most important methods to solve it. The existing graph-based autoencoder models only learn a single set of distributions, which cannot accurately represent the mixed distribution in real graph data. Meanwhile, existing learning models are greatly restricted when the graph data has insufficient attribute information and inaccurate topology information. In this paper, we propose a novel graph embedding framework, termed multi-scale variational graph autoencoder (MSVGAE), which learns multiple sets of low-dimensional vectors of different dimensions through the graph encoder to represent the mixed probability distribution of the original graph data, and performs multiple sampling in each dimension. Furthermore, a self-supervised learning strategy (i.e., graph feature reconstruction auxiliary learning) is introduced to fully use the graph attribute information to help the graph structure learning. Experimental studies on real-world graphs demonstrate that the proposed model achieves state-of-the-art performance compared with other baseline methods in link prediction tasks. Besides, the robustness analysis shows that the proposed MSVGAE method has clear advantages when processing graph data with insufficient attribute information and inaccurate topology information.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116647935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Search and Discovery in Personal Email Collections
Michael Bendersky, Xuanhui Wang, Marc Najork, Donald Metzler
{"title":"Search and Discovery in Personal Email Collections","authors":"Michael Bendersky, Xuanhui Wang, Marc Najork, Donald Metzler","doi":"10.1145/3488560.3501393","DOIUrl":"https://doi.org/10.1145/3488560.3501393","url":null,"abstract":"Email has been an essential communication medium for many years. As a result, the information accumulated in our mailboxes has become valuable for all of our personal and professional activities. For years, researchers have developed interfaces, models, and algorithms to facilitate email search, discovery, and organization. This tutorial brings together these diverse research directions and provides both historical background and a high-level overview of the recent advances in the field. In particular, we lay out all of the components needed in the design of email search engines, including user interfaces, indexing, document and query understanding, retrieval, ranking, evaluation, and data privacy. The tutorial also goes beyond search, presenting recent work on intelligent task assistance in email and a number of interesting future directions.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116891845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Structure Meets Sequences: Predicting Network of Co-evolving Sequences
Yaojing Wang, Yuan Yao, F. Xu, Yada Zhu, Hanghang Tong
{"title":"Structure Meets Sequences: Predicting Network of Co-evolving Sequences","authors":"Yaojing Wang, Yuan Yao, F. Xu, Yada Zhu, Hanghang Tong","doi":"10.1145/3488560.3498411","DOIUrl":"https://doi.org/10.1145/3488560.3498411","url":null,"abstract":"Co-evolving sequences are ubiquitous in a variety of applications, where different sequences are often inherently inter-connected with each other. We refer to such sequences, together with their inherent connections modeled as a structured network, as network of co-evolving sequences (NoCES). Typical NoCES applications include road traffic monitoring, company revenue prediction, motion capture, etc. To date, it remains a daunting challenge to accurately model NoCES due to the coupling between network structure and sequences. In this paper, we propose to model NoCES with the aim of simultaneously capturing both the dynamics and the interplay between network structure and sequences. Specifically, we propose a joint learning framework to alternately update the network representations and sequence representations as the sequences evolve over time. A unique feature of our framework is that it can handle the case where there are co-evolving sequences on both network nodes and edges. Experimental evaluations on four real datasets demonstrate that the proposed approach (1) outperforms the existing competitors in terms of prediction accuracy, and (2) scales linearly w.r.t. the sequence length and the network size.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"210 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122632904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Improving Personalized Search with Dual-Feedback Network
Chenlong Deng, Yujia Zhou, Zhicheng Dou
{"title":"Improving Personalized Search with Dual-Feedback Network","authors":"Chenlong Deng, Yujia Zhou, Zhicheng Dou","doi":"10.1145/3488560.3498447","DOIUrl":"https://doi.org/10.1145/3488560.3498447","url":null,"abstract":"Personalized search improves the quality of search results by modeling historical user behavior. In recent years, many methods based on deep learning have greatly improved the performance of personalized search. However, most of the existing methods only focus on modeling positive user behavior signals, which leads to incomplete user interest modeling. At the same time, the user's search behavior contains much explicit and implicit feedback information. For example, clicking and staying for a certain period represents implicit positive feedback, and skipping behavior represents implicit negative feedback. Intuitively, this information can be utilized to construct a more complete and accurate user profile. In this paper, we propose a dual-feedback modeling framework, which integrates multi-granular user feedback information to model the user's current search intention. Specifically, we propose a feedback extraction network to refine the dual-feedback representation in multiple stages. To enhance the user's real-time search quality, we design an additional dual-feedback feature gating module to capture the user's real-time feedback in the current session. We conducted a large number of experiments on two real-world datasets, and the experimental results show that our method can effectively improve the performance of personalized search.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129041107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
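The implicit signals named in this abstract (a click followed by a sufficiently long stay as positive feedback, a skip as negative feedback) can be sketched as a simple labeling rule. This is only an illustrative heuristic, not the paper's dual-feedback network: the `Impression` type, the 30-second dwell threshold, and the "ranked above the last click" definition of a skip are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Impression:
    doc_id: str
    clicked: bool
    dwell_seconds: float = 0.0


def label_feedback(impressions: List[Impression],
                   min_dwell: float = 30.0) -> dict:
    """Label each ranked result as 'positive', 'negative', or None.

    positive: clicked and viewed for at least min_dwell seconds.
    negative: not clicked, but ranked above the last clicked result
              (the user presumably saw it and skipped it).
    None:     no usable signal.
    """
    clicked_ranks = [i for i, imp in enumerate(impressions) if imp.clicked]
    last_click = clicked_ranks[-1] if clicked_ranks else -1
    labels = {}
    for rank, imp in enumerate(impressions):
        if imp.clicked and imp.dwell_seconds >= min_dwell:
            labels[imp.doc_id] = "positive"
        elif not imp.clicked and rank < last_click:
            labels[imp.doc_id] = "negative"
        else:
            labels[imp.doc_id] = None
    return labels


# Hypothetical session: results listed in ranked order.
session = [
    Impression("d1", clicked=False),
    Impression("d2", clicked=True, dwell_seconds=45.0),
    Impression("d3", clicked=True, dwell_seconds=5.0),
    Impression("d4", clicked=False),
]
labels = label_feedback(session)
print(labels)
```

Short clicks and results below the last click are left unlabeled here because neither clearly signals satisfaction or rejection; a learned model could treat them differently.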
ConsistSum: Unsupervised Opinion Summarization with the Consistency of Aspect, Sentiment and Semantic
Wenjun Ke, Jinhua Gao, Huawei Shen, Xueqi Cheng
{"title":"ConsistSum: Unsupervised Opinion Summarization with the Consistency of Aspect, Sentiment and Semantic","authors":"Wenjun Ke, Jinhua Gao, Huawei Shen, Xueqi Cheng","doi":"10.1145/3488560.3498463","DOIUrl":"https://doi.org/10.1145/3488560.3498463","url":null,"abstract":"Unsupervised opinion summarization techniques are designed to condense the review data and summarize informative and salient opinions in the absence of golden references. Existing dominant methods generally follow a two-stage framework: first creating the synthetic \"review-summary\" paired datasets and then feeding them into the generative summary model for supervised training. However, these methods mainly focus on semantic similarity in synthetic dataset creation, ignoring the consistency of aspects and sentiments in synthetic pairs. Such inconsistency also creates a gap between the training and inference of the summarization model. To alleviate this problem, we propose ConsistSum, an unsupervised opinion summarization method devoted to capturing the consistency of aspects and sentiment between reviews and summaries. Specifically, ConsistSum first extracts the preliminary \"review-summary\" pairs from the raw corpus by evaluating the distance of aspect distribution and sentiment distribution. Then, we refine the preliminary summary with the constrained Metropolis-Hastings sampling to produce highly consistent synthetic datasets. In the summarization phase, we adopt the generative model T5 as the summarization model. T5 is fine-tuned for the opinion summarization task by incorporating the loss of predicting aspect and opinion distribution. Experimental results on two benchmark datasets, i.e., Yelp and Amazon, demonstrate the superior performance of ConsistSum over the state-of-the-art baselines.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124637336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
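The abstract does not say which distance ConsistSum uses to compare aspect or sentiment distributions. Purely as an assumed stand-in, the sketch below uses the Jensen-Shannon divergence to show how a review's aspect distribution might be compared against candidate summaries; the three-aspect distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log): symmetric, bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical aspect distributions over (food, service, price).
review_aspects = [0.6, 0.3, 0.1]
candidate_a = [0.6, 0.3, 0.1]   # same aspect mix as the review
candidate_b = [0.0, 0.1, 0.9]   # mostly about price

d_a = js_divergence(review_aspects, candidate_a)
d_b = js_divergence(review_aspects, candidate_b)
print(d_a, d_b)
```

Under this assumed metric, a candidate with a smaller divergence covers the review's aspects more consistently, so candidate_a would be preferred over candidate_b as a synthetic summary.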
Personalized Long-distance Fuel-efficient Route Recommendation Through Historical Trajectories Mining
Zhan Wang, Zhaohui Peng, Senzhang Wang, Qiao Song
{"title":"Personalized Long-distance Fuel-efficient Route Recommendation Through Historical Trajectories Mining","authors":"Zhan Wang, Zhaohui Peng, Senzhang Wang, Qiao Song","doi":"10.1145/3488560.3498512","DOIUrl":"https://doi.org/10.1145/3488560.3498512","url":null,"abstract":"Finding fuel-efficient routes for drivers is increasingly valuable in terms of saving energy, protecting the environment and saving expenses. Previous studies mostly adopt simple fuel consumption calculation or prediction methods to recommend the fuel-efficient routes within a city, which have two major limitations. First, the effect of drivers' driving behavior preferences (e.g., acceleration, frequency of clutch use, etc.) on fuel consumption is not fully studied and utilized. Second, existing methods mainly focus on short-distance route recommendation. Due to the difference in the road network structure and route composition, it is not effective to directly apply the route recommendation methods designed for short-distance travel within a city to long-distance travel among cities. In this paper, we propose a novel model PLd-FeRR for the Personalized Long-distance Fuel-efficient Route Recommendation. Specifically, we first identify the features reflecting the user's driving behavior preference based on the user's historical driving trajectory, and then extract the potential factors that can affect long-distance fuel consumption. As transformers can effectively capture the temporal features for long sequence data, the extracted personalized driving preference features and long-distance fuel consumption features are input into a transformer-based fuel consumption prediction model. Next, the prediction model is combined with a genetic algorithm to further improve the performance of recommending fuel-efficient routes. Extensive evaluations are conducted on a large real-world dataset, and the results show the effectiveness of our proposal.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130841408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Obtaining Robust Models from Imbalanced Data
Wentao Wang
{"title":"Obtaining Robust Models from Imbalanced Data","authors":"Wentao Wang","doi":"10.1145/3488560.3502217","DOIUrl":"https://doi.org/10.1145/3488560.3502217","url":null,"abstract":"The vulnerability of deep neural network (DNN) models has been verified by the existence of adversarial examples. By exploiting slight changes to input examples, the generated adversarial examples can easily cause well-trained DNN models to make wrong predictions. Many defense methods have been proposed to improve the robustness of DNN models against adversarial examples. Among them, adversarial training has been empirically proven to be one of the most effective methods. Almost all existing studies about adversarial training are focused on balanced datasets, where each class has an equal number of training examples. However, as datasets collected in real-world applications cannot guarantee all contained classes are uniformly distributed, it is much more challenging to obtain robust models in those real applications where the available training datasets are imbalanced. As the initial effort to study this problem, we first investigate the different behaviors between adversarially trained models and naturally trained models using imbalanced training datasets and then explore possible solutions to facilitate adversarial training under imbalanced settings.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"255 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133609727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Practical Guide to Robust Multimodal Machine Learning and Its Application in Education
Zitao Liu
{"title":"A Practical Guide to Robust Multimodal Machine Learning and Its Application in Education","authors":"Zitao Liu","doi":"10.1145/3488560.3510010","DOIUrl":"https://doi.org/10.1145/3488560.3510010","url":null,"abstract":"Recently we have seen a rapid rise in the amount of education data available through the digitization of education. This huge amount of education data usually comes as a mixture of images, videos, speech, texts, etc. It is crucial to consider data from different modalities to build successful applications in AI in education (AIED). This talk targets AI researchers and practitioners who are interested in applying state-of-the-art multimodal machine learning techniques to tackle some of the hard-core AIED tasks. These include tasks such as automatic short answer grading, student assessment, class quality assurance, knowledge tracing, etc. In this talk, I will share some recent developments of successfully applying multimodal learning approaches in AIED, with a focus on classroom multimodal data. Beyond introducing the recent advances in computer vision, speech, and natural language processing in education, I will discuss how to combine data from different modalities and build AI-driven educational applications on top of these data. Participants will learn about recent trends and emerging challenges in this topic, representative tools and learning resources to obtain ready-to-use models, and how related models and techniques benefit real-world AIED applications.","PeriodicalId":348686,"journal":{"name":"Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134296687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1