Proceedings of the Tenth ACM International Conference on Web Search and Data Mining: Latest Publications

Motifs in Temporal Networks
Ashwin Paranjape, Austin R. Benson, J. Leskovec
DOI: 10.1145/3018661.3018731 (published 2016-12-29)
Abstract: Networks are a fundamental tool for modeling complex systems in a variety of domains including social and communication networks as well as biology and neuroscience. The counts of small subgraph patterns in networks, called network motifs, are crucial to understanding the structure and function of these systems. However, the role of network motifs for temporal networks, which contain many timestamped links between nodes, is not well understood. Here we develop a notion of a temporal network motif as an elementary unit of temporal networks and provide a general methodology for counting such motifs. We define temporal network motifs as induced subgraphs on sequences of edges, design several fast algorithms for counting temporal network motifs, and prove their runtime complexity. We also show that our fast algorithms achieve 1.3x to 56.5x speedups compared to a baseline method. We use our algorithms to count temporal network motifs in a variety of real-world datasets. Results show that networks from different domains have significantly different motif frequencies, whereas networks from the same domain tend to have similar motif frequencies. We also find that measuring motif counts at various time scales reveals different behavior.
Citations: 560
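The object being counted is easy to state even though the paper's contribution is the fast counting algorithms: a δ-temporal motif is an ordered sequence of timestamped edges that match a given pattern and all fall within a window of length δ. The brute-force sketch below is my own toy illustration of the definition for one simple 2-node, 3-edge motif, not the paper's algorithm, which avoids this O(m^3) enumeration.

```python
from itertools import combinations

def count_2node_3edge_motif(edges, delta):
    """Count instances of the temporal motif (u->v, v->u, u->v):
    three edges in timestamp order, all within a window of length delta.

    edges: list of (src, dst, timestamp) tuples.
    Brute-force O(m^3) enumeration -- illustrative only.
    """
    edges = sorted(edges, key=lambda e: e[2])  # order by timestamp
    count = 0
    for i, j, k in combinations(range(len(edges)), 3):
        (u1, v1, t1), (u2, v2, t2), (u3, v3, t3) = edges[i], edges[j], edges[k]
        if t3 - t1 > delta:
            continue
        # pattern: u->v, then v->u, then u->v again
        if (u1, v1) == (u3, v3) and (u2, v2) == (v1, u1):
            count += 1
    return count

# toy example: a "ping-pong" exchange between nodes a and b, plus a late edge
edges = [("a", "b", 1), ("b", "a", 2), ("a", "b", 3), ("a", "b", 40)]
print(count_2node_3edge_motif(edges, delta=10))  # -> 1
```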
Predicting Completeness in Knowledge Bases
Luis Galárraga, Simon Razniewski, Antoine Amarilli, Fabian M. Suchanek
DOI: 10.1145/3018661.3018739 (published 2016-12-17)
Abstract: Knowledge bases such as Wikidata, DBpedia, or YAGO contain millions of entities and facts. In some knowledge bases, the correctness of these facts has been evaluated. However, much less is known about their completeness, i.e., the proportion of real facts that the knowledge bases cover. In this work, we investigate different signals to identify the areas where a knowledge base is complete. We show that we can combine these signals in a rule mining approach, which allows us to predict where facts may be missing. We also show that completeness predictions can help other applications such as fact prediction.
Citations: 104
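As a concrete illustration of the kind of completeness signal the abstract alludes to (a rough sketch, not the paper's rule-mining system; the relation name and expected cardinality below are hypothetical): a subject can be flagged as complete for a relation once the knowledge base already stores as many objects as we expect for that relation.

```python
from collections import defaultdict

def completeness_by_cardinality(facts, relation, expected_count):
    """Toy completeness signal: flag a subject as 'complete' for a relation
    if the KB already stores at least `expected_count` objects for it
    (e.g., every person is expected to have 2 recorded parents).

    facts: iterable of (subject, relation, object) triples.
    Returns {subject: True/False}.
    """
    objects = defaultdict(set)
    for s, r, o in facts:
        if r == relation:
            objects[s].add(o)
    return {s: len(objs) >= expected_count for s, objs in objects.items()}

facts = [
    ("alice", "hasParent", "bob"),
    ("alice", "hasParent", "carol"),
    ("dave", "hasParent", "erin"),
]
print(completeness_by_cardinality(facts, "hasParent", expected_count=2))
# -> {'alice': True, 'dave': False}
```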
Uncovering the Dynamics of Crowdlearning and the Value of Knowledge
U. Upadhyay, I. Valera, M. Gomez-Rodriguez
DOI: 10.1145/3018661.3018685 (published 2016-12-14)
Abstract: Learning from the crowd has become increasingly popular in the Web and social media. There is a wide variety of crowdlearning sites in which, on the one hand, users learn from the knowledge that other users contribute to the site, and, on the other hand, knowledge is reviewed and curated by the same users using assessment measures such as upvotes or likes. In this paper, we present a probabilistic modeling framework of crowdlearning, which uncovers the evolution of a user's expertise over time by leveraging other users' assessments of her contributions. The model allows for both off-site and on-site learning and captures forgetting of knowledge. We then develop a scalable estimation method to fit the model parameters from millions of recorded learning and contributing events. We show the effectiveness of our model by tracing the activity of ~25 thousand users in Stack Overflow over a 4.5-year period. We find that answers with high knowledge value are rare. Newbies and experts tend to acquire less knowledge than users in the middle range. Prolific learners tend to also be proficient contributors that post answers with high knowledge value.
Citations: 12
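A minimal sketch of the modeling idea described in the abstract: a user's latent expertise grows with learning events and decays between them (forgetting). This is a deterministic toy with made-up parameters, not the paper's probabilistic point-process model fitted from assessment signals such as upvotes.

```python
import math

def expertise_trajectory(events, decay_rate=0.01):
    """Toy model of a learner's expertise over time: each learning event adds
    a knowledge gain, and accumulated knowledge decays exponentially between
    events (forgetting). Simplified stand-in for the paper's probabilistic
    model.

    events: list of (timestamp, gain) pairs, sorted by timestamp.
    Returns the expertise value right after each event.
    """
    expertise, last_t, out = 0.0, None, []
    for t, gain in events:
        if last_t is not None:
            expertise *= math.exp(-decay_rate * (t - last_t))  # forgetting
        expertise += gain                                       # learning
        out.append((t, expertise))
        last_t = t
    return out

print(expertise_trajectory([(0, 1.0), (10, 1.0), (100, 1.0)]))
```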
Fun Facts: Automatic Trivia Fact Extraction from Wikipedia
David Tsurel, D. Pelleg, Ido Guy, Dafna Shahaf
DOI: 10.1145/3018661.3018709 (published 2016-12-12)
Abstract: A significant portion of web search queries directly refers to named entities. Search engines explore various ways to improve the user experience for such queries. We suggest augmenting search results with trivia facts about the searched entity. Trivia is widely played throughout the world, and was shown to increase users' engagement and retention. Most random facts are not suitable for the trivia section. There is skill (and art) to curating good trivia. In this paper, we formalize a notion of trivia-worthiness and propose an algorithm that automatically mines trivia facts from Wikipedia. We take advantage of Wikipedia's category structure, and rank an entity's categories by their trivia-quality. Our algorithm is capable of finding interesting facts, such as Obama's Grammy or Elvis' stint as a tank gunner. In user studies, our algorithm captures the intuitive notion of "good trivia" 45% higher than prior work. Search-page tests show a 22% decrease in bounce rates and a 12% increase in dwell time, proving our facts hold users' attention.
Citations: 24
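A toy sketch of the category-ranking idea in the abstract: a category of an entity makes good trivia when few comparable entities share it. The surprisal-style score and the example entities below are illustrative assumptions, not the paper's trivia-worthiness measure.

```python
import math

def rank_trivia_categories(entity_categories, similar_entities_categories):
    """Toy trivia ranking: score each category of an entity by how rarely it
    appears among entities similar to it -- rare categories are more
    surprising and hence better trivia candidates. Simplified stand-in for
    the paper's category-based trivia-worthiness measure.
    """
    n = len(similar_entities_categories)
    scores = {}
    for cat in entity_categories:
        freq = sum(cat in cats for cats in similar_entities_categories)
        scores[cat] = -math.log((freq + 1) / (n + 1))  # rarer => higher score
    return sorted(scores.items(), key=lambda kv: -kv[1])

obama = {"Presidents of the United States", "Grammy Award winners"}
similar = [{"Presidents of the United States"} for _ in range(40)]
print(rank_trivia_categories(obama, similar))
# "Grammy Award winners" ranks above the unsurprising presidential category
```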
Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification
Ting Chen, Yizhou Sun
DOI: 10.1145/3018661.3018735 (published 2016-12-08)
Abstract: In this paper, we study the problem of author identification under the double-blind review setting, which is to identify potential authors given information of an anonymized paper. Different from existing approaches that rely heavily on feature engineering, we propose to use a network embedding approach to address the problem, which can automatically represent nodes as lower-dimensional feature vectors. However, there are two major limitations in recent studies on network embedding: (1) they are usually general-purpose embedding methods, which are independent of the specific tasks; and (2) most of these approaches can only deal with homogeneous networks, where the heterogeneity of the network is ignored. Hence, the challenges faced here are twofold: (1) how to embed the network under the guidance of the author identification task, and (2) how to select the best type of information given the heterogeneity of the network. To address these challenges, we propose a task-guided and path-augmented heterogeneous network embedding model. In our model, nodes are first embedded as vectors in a latent feature space. Embeddings are then shared and jointly trained according to task-specific and network-general objectives. We extend existing unsupervised network embedding to incorporate meta paths in heterogeneous networks, and select paths according to the specific task. The guidance from the author identification task for network embedding is provided both explicitly in joint training and implicitly during meta path selection. Our experiments demonstrate that by using path-augmented network embedding with task guidance, our model obtains significantly better accuracy at identifying the true authors compared to existing methods.
Citations: 214
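A rough sketch of what a task-guided joint objective can look like: a task-specific ranking term for author identification plus a network-general term over meta-path neighbors, mixed by a weight alpha. The function names, the loss form, and the weighting are assumptions for illustration; the paper's actual objective and training procedure differ.

```python
import numpy as np

def joint_loss(paper_vec, true_author_vec, neg_author_vec,
               node_vec, metapath_neighbor_vec, alpha=0.5):
    """Sketch of a joint embedding objective: a task-specific term (the true
    author should score higher than a sampled negative author for the paper)
    plus a network-general term (a node should be close to a neighbor reached
    via a chosen meta path), weighted by alpha. Hypothetical simplification.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    task = -np.log(sigmoid(paper_vec @ true_author_vec)) \
           - np.log(sigmoid(-paper_vec @ neg_author_vec))
    network = -np.log(sigmoid(node_vec @ metapath_neighbor_vec))
    return alpha * task + (1 - alpha) * network

# toy usage with random 8-dimensional embeddings
rng = np.random.default_rng(0)
vecs = [rng.normal(size=8) for _ in range(5)]
print(joint_loss(*vecs))
```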
RedQueen: An Online Algorithm for Smart Broadcasting in Social Networks
Ali Zarezade, U. Upadhyay, H. Rabiee, M. Gomez-Rodriguez
DOI: 10.1145/3018661.3018684 (published 2016-10-18)
Abstract: Users in social networks whose posts stay at the top of their followers' feeds the longest time are more likely to be noticed. Can we design an online algorithm to help them decide when to post to stay at the top? In this paper, we address this question as a novel optimal control problem for jump stochastic differential equations. For a wide variety of feed dynamics, we show that the optimal broadcasting intensity for any user is surprisingly simple: it is given by the position of her most recent post on each of her followers' feeds. As a consequence, we are able to develop a simple and highly efficient online algorithm, RedQueen, to sample the optimal times for the user to post. Experiments on both synthetic and real data gathered from Twitter show that our algorithm is able to consistently make a user's posts more visible over time, is robust to volume changes on her followers' feeds, and significantly outperforms the state of the art.
Citations: 49
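The abstract's key finding, that the optimal posting intensity is given by the position of the user's most recent post on her followers' feeds, can be mimicked in a crude discrete-time simulation: post with a probability that grows as the latest post gets pushed down the feed. This is a toy approximation of the idea, not the paper's stochastic optimal control derivation or its RedQueen sampler; the parameter q is an assumed scaling constant.

```python
import random

def simulate_smart_posting(horizon, follower_feed_rate, q=0.2, seed=0):
    """Toy discrete-time version of the rank-driven posting idea: the user's
    posting probability at each step is proportional to the current position
    (rank) of her latest post on a single follower's feed -- post more once
    pushed down, post little while still on top.

    follower_feed_rate: probability that the follower receives another post
    (from someone else) at each step, pushing the user down by one position.
    Returns the list of time steps at which the user posts.
    """
    random.seed(seed)
    rank, posts = 1, []
    for t in range(horizon):
        if random.random() < follower_feed_rate:
            rank += 1                      # someone else's post arrives
        if random.random() < min(1.0, q * (rank - 1)):
            posts.append(t)
            rank = 1                       # the user's new post is on top again
    return posts

print(simulate_smart_posting(horizon=100, follower_feed_rate=0.5))
```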
DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification
Rohit Babbar, B. Scholkopf
DOI: 10.1145/3018661.3018741 (published 2016-09-08)
Abstract: Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. Datasets in extreme classification fit a power-law distribution, i.e., a large fraction of labels have very few positive instances in the data distribution. Most state-of-the-art approaches for extreme multi-label classification attempt to capture correlation among labels by embedding the label matrix into a low-dimensional linear subspace. However, in the presence of power-law distributed, extremely large and diverse label spaces, structural assumptions such as low rank can be easily violated. In this work, we present DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control of the model size. Unlike most state-of-the-art methods, DiSMEC does not make any low-rank assumptions on the label matrix. Using a double layer of parallelization, DiSMEC can learn classifiers for datasets consisting of hundreds of thousands of labels within a few hours. The explicit capacity control mechanism filters out spurious parameters, which keeps the model compact without losing prediction accuracy. We conduct extensive empirical evaluation on publicly available real-world datasets with up to 670,000 labels. We compare DiSMEC with recent state-of-the-art approaches, including SLEEC, a leading approach for learning sparse local embeddings, and FastXML, a tree-based approach optimizing a ranking-based loss function. On some of the datasets, DiSMEC significantly boosts prediction accuracy: 10% better than SLEEC and 15% better than FastXML, in absolute terms.
Citations: 219
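A small sketch of the two ingredients the abstract highlights: independent one-versus-rest linear classifiers (the per-label loop is what DiSMEC distributes across cores and machines) and explicit capacity control by pruning near-zero weights. The scikit-learn solver, hyperparameters, and pruning threshold are stand-ins, not the paper's setup.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_rest_sparse(X, Y, prune_threshold=0.01):
    """Train an independent linear classifier per label, then prune
    near-zero weights to keep the overall model compact.

    X: (n_samples, n_features) array, Y: (n_samples, n_labels) 0/1 array.
    Returns an (n_labels, n_features) mostly-sparse weight matrix.
    """
    n_labels = Y.shape[1]
    W = np.zeros((n_labels, X.shape[1]))
    for l in range(n_labels):            # in DiSMEC this loop is distributed
        clf = LinearSVC(C=1.0, max_iter=5000)
        clf.fit(X, Y[:, l])
        w = clf.coef_.ravel()
        w[np.abs(w) < prune_threshold] = 0.0   # explicit capacity control
        W[l] = w
    return W

X = np.random.default_rng(0).normal(size=(200, 50))
Y = (X[:, :3] > 0).astype(int)           # 3 toy labels tied to 3 features
W = train_one_vs_rest_sparse(X, Y)
print("non-zero weights:", int((W != 0).sum()), "of", W.size)
```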
Unbiased Learning-to-Rank with Biased Feedback
T. Joachims, Adith Swaminathan, Tobias Schnabel
DOI: 10.1145/3018661.3018699 (published 2016-08-16)
Abstract: Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user-centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how many clicks a result receives, so that directly using click data as a training signal in Learning-to-Rank (LTR) methods yields sub-optimal results. To overcome this bias problem, we present a counterfactual inference framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data. Using this framework, we derive a Propensity-Weighted Ranking SVM for discriminative learning from implicit feedback, where click models take the role of the propensity estimator. In contrast to most conventional approaches to de-biasing the data using click models, this allows training of ranking functions even in settings where queries do not repeat. Beyond the theoretical support, we show empirically that the proposed learning method is highly effective in dealing with biases, that it is robust to noise and propensity model misspecification, and that it scales efficiently. We also demonstrate the real-world applicability of our approach on an operational search engine, where it substantially improves retrieval performance.
Citations: 447
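The counterfactual idea can be shown in a few lines: weight each clicked result by the inverse of the propensity that its displayed position was examined, so the empirical loss estimates the full-information loss despite position bias. The softmax surrogate below is an assumed simplification; the paper derives a Propensity-Weighted Ranking SVM with click models as the propensity estimator.

```python
import numpy as np

def ips_weighted_loss(scores, clicks, propensities):
    """Inverse-propensity-scored training signal (sketch): each clicked
    document contributes to the loss weighted by 1/p, where p is the
    probability that its displayed position was examined. Under a
    position-bias click model this makes the empirical risk an unbiased
    estimate of the loss we would get from full relevance labels.

    scores: model scores for the displayed results,
    clicks: 0/1 click indicators, propensities: examination probabilities.
    """
    scores = np.asarray(scores, dtype=float)
    log_softmax = scores - np.log(np.exp(scores).sum())
    loss = 0.0
    for i, (c, p) in enumerate(zip(clicks, propensities)):
        if c:
            loss += -(1.0 / p) * log_softmax[i]
    return loss

# the clicked result at position 3 is up-weighted because it was unlikely
# to be examined (low propensity)
print(ips_weighted_loss([2.0, 1.0, 0.5], clicks=[0, 0, 1],
                        propensities=[1.0, 0.5, 0.2]))
```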
How Smart Does Your Profile Image Look?: Estimating Intelligence from Social Network Profile Images
Xingjie Wei, D. Stillwell
DOI: 10.1145/3018661.3018663 (published 2016-06-29)
Abstract: Profile images on social networks are users' opportunity to present themselves and to affect how others judge them. We examine what Facebook images say about users' perceived and measured intelligence. 1,122 Facebook users completed a matrices intelligence test and shared their current Facebook profile image. Strangers also rated the images for perceived intelligence. We use automatically extracted image features to predict both measured and perceived intelligence. Intelligence estimation from images is a difficult task even for humans, but experimental results show that human accuracy can be equalled using computing methods. We report the image features that predict both measured and perceived intelligence, and highlight misleading features such as "smiling" and "wearing glasses" that are correlated with perceived but not measured intelligence. Our results give insights into inaccurate stereotyping from profile images and also have implications for privacy, especially since in most social networks profile images are public by default.
Citations: 24
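One analysis in the abstract, identifying features such as "smiling" that correlate with perceived but not measured intelligence, can be illustrated with a simple correlation check on synthetic data. The thresholds and the single synthetic feature are assumptions for illustration only, not the study's feature set or statistics.

```python
import numpy as np

def misleading_features(features, measured_iq, perceived_iq, names):
    """Flag a feature as 'misleading' if it correlates with how intelligent
    strangers perceive a profile image to be but not with the owner's
    measured test score. Thresholds are illustrative only.
    """
    flagged = []
    for j, name in enumerate(names):
        r_measured = np.corrcoef(features[:, j], measured_iq)[0, 1]
        r_perceived = np.corrcoef(features[:, j], perceived_iq)[0, 1]
        if abs(r_perceived) > 0.2 and abs(r_measured) < 0.1:
            flagged.append((name, round(r_perceived, 2), round(r_measured, 2)))
    return flagged

rng = np.random.default_rng(1)
n = 500
smiling = rng.integers(0, 2, n).astype(float)
measured = rng.normal(size=n)               # unrelated to smiling in this toy
perceived = smiling + rng.normal(size=n)    # raters reward smiling in this toy
X = np.column_stack([smiling])
print(misleading_features(X, measured, perceived, ["smiling"]))
```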
Comparative Document Analysis for Large Text Corpora
Xiang Ren, Yuanhua Lv, Kuansan Wang, Jiawei Han
DOI: 10.1145/3018661.3018690 (published 2015-10-25)
Abstract: This paper presents a novel research problem, Comparative Document Analysis (CDA), that is, joint discovery of commonalities and differences between two individual documents (or two sets of documents) in a large text corpus. Given any pair of documents from a (background) document collection, CDA aims to automatically identify sets of quality phrases to summarize the commonalities of both documents and highlight the distinctions of each with respect to the other informatively and concisely. Our solution uses a general graph-based framework to derive novel measures on phrase semantic commonality and pairwise distinction, where the background corpus is used for computing phrase-document semantic relevance. We use the measures to guide the selection of sets of phrases by solving two joint optimization problems. A scalable iterative algorithm is developed to integrate the maximization of the phrase commonality or distinction measure with the learning of phrase-document semantic relevance. Experiments on large text corpora from two different domains (scientific papers and news) demonstrate the effectiveness and robustness of the proposed framework on comparing documents. Analysis on a 10GB+ text corpus demonstrates the scalability of our method, whose computation time grows linearly as the corpus size increases. Our case study on comparing news articles published at different dates shows the power of the proposed method on comparing sets of documents.
Citations: 20
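A toy rendering of the two quantities CDA scores: phrase commonality (relevant to both documents) and distinction (relevant to one but not the other). Relevance here is plain term frequency over unigram "phrases"; the paper instead computes graph-based semantic relevance against a background corpus and selects phrase sets via joint optimization.

```python
from collections import Counter

def compare_documents(doc_a, doc_b, phrases):
    """Score each candidate phrase for commonality (relevant to both
    documents) and for distinction of doc_a against doc_b (relevant to
    doc_a but not doc_b), using raw term frequency as a stand-in for
    semantic relevance.
    """
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    rel_a = {p: tf_a[p] for p in phrases}
    rel_b = {p: tf_b[p] for p in phrases}
    common = {p: min(rel_a[p], rel_b[p]) for p in phrases}
    distinct_a = {p: rel_a[p] - rel_b[p] for p in phrases}
    return common, distinct_a

doc_a = "graph embedding graph neural network training"
doc_b = "graph kernels and graph classification benchmarks"
print(compare_documents(doc_a, doc_b, ["graph", "embedding", "kernels"]))
# "graph" is common; "embedding" distinguishes doc_a, "kernels" doc_b
```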