Proceedings of the Tenth ACM International Conference on Web Search and Data Mining: Latest Publications

Motifs in Temporal Networks
Ashwin Paranjape, Austin R. Benson, J. Leskovec
DOI: 10.1145/3018661.3018731 (published 2016-12-29)
Abstract: Networks are a fundamental tool for modeling complex systems in a variety of domains including social and communication networks as well as biology and neuroscience. The counts of small subgraph patterns in networks, called network motifs, are crucial to understanding the structure and function of these systems. However, the role of network motifs for temporal networks, which contain many timestamped links between nodes, is not well understood. Here we develop a notion of a temporal network motif as an elementary unit of temporal networks and provide a general methodology for counting such motifs. We define temporal network motifs as induced subgraphs on sequences of edges, design several fast algorithms for counting temporal network motifs, and prove their runtime complexity. We also show that our fast algorithms achieve 1.3x to 56.5x speedups compared to a baseline method. We use our algorithms to count temporal network motifs in a variety of real-world datasets. Results show that networks from different domains have significantly different motif frequencies, whereas networks from the same domain tend to have similar motif frequencies. We also find that measuring motif counts at various time scales reveals different behavior.
Citations: 560
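The object being counted is easy to state even though the paper's contribution is the fast counting algorithms: a δ-temporal motif is an ordered sequence of timestamped edges that match a given pattern and all fall within a window of length δ. The brute-force sketch below is my own toy illustration of the definition for one simple 2-node, 3-edge motif, not the paper's algorithm, which avoids this O(m^3) enumeration.

```python
from itertools import combinations

def count_2node_3edge_motif(edges, delta):
    """Count instances of the temporal motif (u->v, v->u, u->v):
    three edges in timestamp order, all within a window of length delta.

    edges: list of (src, dst, timestamp) tuples.
    Brute-force O(m^3) enumeration -- illustrative only.
    """
    edges = sorted(edges, key=lambda e: e[2])  # order by timestamp
    count = 0
    for i, j, k in combinations(range(len(edges)), 3):
        (u1, v1, t1), (u2, v2, t2), (u3, v3, t3) = edges[i], edges[j], edges[k]
        if t3 - t1 > delta:
            continue
        # pattern: u->v, then v->u, then u->v again
        if (u1, v1) == (u3, v3) and (u2, v2) == (v1, u1):
            count += 1
    return count

# toy example: a "ping-pong" exchange between nodes a and b, plus a late edge
edges = [("a", "b", 1), ("b", "a", 2), ("a", "b", 3), ("a", "b", 40)]
print(count_2node_3edge_motif(edges, delta=10))  # -> 1
```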
Predicting Completeness in Knowledge Bases
Luis Galárraga, Simon Razniewski, Antoine Amarilli, Fabian M. Suchanek
DOI: 10.1145/3018661.3018739 (published 2016-12-17)
Abstract: Knowledge bases such as Wikidata, DBpedia, or YAGO contain millions of entities and facts. In some knowledge bases, the correctness of these facts has been evaluated. However, much less is known about their completeness, i.e., the proportion of real facts that the knowledge bases cover. In this work, we investigate different signals to identify the areas where a knowledge base is complete. We show that we can combine these signals in a rule mining approach, which allows us to predict where facts may be missing. We also show that completeness predictions can help other applications such as fact prediction.
Citations: 104
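As a concrete illustration of the kind of completeness signal the abstract alludes to (a rough sketch, not the paper's rule-mining system; the relation name and expected cardinality below are hypothetical): a subject can be flagged as complete for a relation once the knowledge base already stores as many objects as we expect for that relation.

```python
from collections import defaultdict

def completeness_by_cardinality(facts, relation, expected_count):
    """Toy completeness signal: flag a subject as 'complete' for a relation
    if the KB already stores at least `expected_count` objects for it
    (e.g., every person is expected to have 2 recorded parents).

    facts: iterable of (subject, relation, object) triples.
    Returns {subject: True/False}.
    """
    objects = defaultdict(set)
    for s, r, o in facts:
        if r == relation:
            objects[s].add(o)
    return {s: len(objs) >= expected_count for s, objs in objects.items()}

facts = [
    ("alice", "hasParent", "bob"),
    ("alice", "hasParent", "carol"),
    ("dave", "hasParent", "erin"),
]
print(completeness_by_cardinality(facts, "hasParent", expected_count=2))
# -> {'alice': True, 'dave': False}
```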
Uncovering the Dynamics of Crowdlearning and the Value of Knowledge
U. Upadhyay, I. Valera, M. Gomez-Rodriguez
DOI: 10.1145/3018661.3018685 (published 2016-12-14)
Abstract: Learning from the crowd has become increasingly popular in the Web and social media. There is a wide variety of crowdlearning sites in which, on the one hand, users learn from the knowledge that other users contribute to the site, and, on the other hand, knowledge is reviewed and curated by the same users using assessment measures such as upvotes or likes. In this paper, we present a probabilistic modeling framework of crowdlearning, which uncovers the evolution of a user's expertise over time by leveraging other users' assessments of her contributions. The model allows for both off-site and on-site learning and captures forgetting of knowledge. We then develop a scalable estimation method to fit the model parameters from millions of recorded learning and contributing events. We show the effectiveness of our model by tracing the activity of ~25 thousand users in Stack Overflow over a 4.5-year period. We find that answers with high knowledge value are rare. Newbies and experts tend to acquire less knowledge than users in the middle range. Prolific learners tend to also be proficient contributors that post answers with high knowledge value.
Citations: 12
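A minimal sketch of the modeling idea described in the abstract: a user's latent expertise grows with learning events and decays between them (forgetting). This is a deterministic toy with made-up parameters, not the paper's probabilistic point-process model fitted from assessment signals such as upvotes.

```python
import math

def expertise_trajectory(events, decay_rate=0.01):
    """Toy model of a learner's expertise over time: each learning event adds
    a knowledge gain, and accumulated knowledge decays exponentially between
    events (forgetting). Simplified stand-in for the paper's probabilistic
    model.

    events: list of (timestamp, gain) pairs, sorted by timestamp.
    Returns the expertise value right after each event.
    """
    expertise, last_t, out = 0.0, None, []
    for t, gain in events:
        if last_t is not None:
            expertise *= math.exp(-decay_rate * (t - last_t))  # forgetting
        expertise += gain                                       # learning
        out.append((t, expertise))
        last_t = t
    return out

print(expertise_trajectory([(0, 1.0), (10, 1.0), (100, 1.0)]))
```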
Fun Facts: Automatic Trivia Fact Extraction from Wikipedia
David Tsurel, D. Pelleg, Ido Guy, Dafna Shahaf
DOI: 10.1145/3018661.3018709 (published 2016-12-12)
Abstract: A significant portion of web search queries directly refers to named entities. Search engines explore various ways to improve the user experience for such queries. We suggest augmenting search results with trivia facts about the searched entity. Trivia is widely played throughout the world, and was shown to increase users' engagement and retention. Most random facts are not suitable for the trivia section. There is skill (and art) to curating good trivia. In this paper, we formalize a notion of trivia-worthiness and propose an algorithm that automatically mines trivia facts from Wikipedia. We take advantage of Wikipedia's category structure, and rank an entity's categories by their trivia-quality. Our algorithm is capable of finding interesting facts, such as Obama's Grammy or Elvis' stint as a tank gunner. In user studies, our algorithm captures the intuitive notion of "good trivia" 45% higher than prior work. Search-page tests show a 22% decrease in bounce rates and a 12% increase in dwell time, proving our facts hold users' attention.
Citations: 24
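A toy sketch of the category-ranking idea in the abstract: a category of an entity makes good trivia when few comparable entities share it. The surprisal-style score and the example entities below are illustrative assumptions, not the paper's trivia-worthiness measure.

```python
import math

def rank_trivia_categories(entity_categories, similar_entities_categories):
    """Toy trivia ranking: score each category of an entity by how rarely it
    appears among entities similar to it -- rare categories are more
    surprising and hence better trivia candidates. Simplified stand-in for
    the paper's category-based trivia-worthiness measure.
    """
    n = len(similar_entities_categories)
    scores = {}
    for cat in entity_categories:
        freq = sum(cat in cats for cats in similar_entities_categories)
        scores[cat] = -math.log((freq + 1) / (n + 1))  # rarer => higher score
    return sorted(scores.items(), key=lambda kv: -kv[1])

obama = {"Presidents of the United States", "Grammy Award winners"}
similar = [{"Presidents of the United States"} for _ in range(40)]
print(rank_trivia_categories(obama, similar))
# "Grammy Award winners" ranks above the unsurprising presidential category
```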
Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification
Ting Chen, Yizhou Sun
DOI: 10.1145/3018661.3018735 (published 2016-12-08)
Abstract: In this paper, we study the problem of author identification under the double-blind review setting, which is to identify potential authors given information of an anonymized paper. Different from existing approaches that rely heavily on feature engineering, we propose to use a network embedding approach to address the problem, which can automatically represent nodes as lower-dimensional feature vectors. However, there are two major limitations in recent studies on network embedding: (1) they are usually general-purpose embedding methods, which are independent of the specific tasks; and (2) most of these approaches can only deal with homogeneous networks, where the heterogeneity of the network is ignored. Hence, the challenges faced here are twofold: (1) how to embed the network under the guidance of the author identification task, and (2) how to select the best type of information given the heterogeneity of the network. To address these challenges, we propose a task-guided and path-augmented heterogeneous network embedding model. In our model, nodes are first embedded as vectors in a latent feature space. Embeddings are then shared and jointly trained according to task-specific and network-general objectives. We extend existing unsupervised network embedding to incorporate meta paths in heterogeneous networks, and select paths according to the specific task. The guidance from the author identification task for network embedding is provided both explicitly in joint training and implicitly during meta path selection. Our experiments demonstrate that by using path-augmented network embedding with task guidance, our model obtains significantly better accuracy at identifying the true authors compared to existing methods.
Citations: 214
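A rough sketch of what a task-guided joint objective can look like: a task-specific ranking term for author identification plus a network-general term over meta-path neighbors, mixed by a weight alpha. The function names, the loss form, and the weighting are assumptions for illustration; the paper's actual objective and training procedure differ.

```python
import numpy as np

def joint_loss(paper_vec, true_author_vec, neg_author_vec,
               node_vec, metapath_neighbor_vec, alpha=0.5):
    """Sketch of a joint embedding objective: a task-specific term (the true
    author should score higher than a sampled negative author for the paper)
    plus a network-general term (a node should be close to a neighbor reached
    via a chosen meta path), weighted by alpha. Hypothetical simplification.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    task = -np.log(sigmoid(paper_vec @ true_author_vec)) \
           - np.log(sigmoid(-paper_vec @ neg_author_vec))
    network = -np.log(sigmoid(node_vec @ metapath_neighbor_vec))
    return alpha * task + (1 - alpha) * network

# toy usage with random 8-dimensional embeddings
rng = np.random.default_rng(0)
vecs = [rng.normal(size=8) for _ in range(5)]
print(joint_loss(*vecs))
```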
RedQueen: An Online Algorithm for Smart Broadcasting in Social Networks
Ali Zarezade, U. Upadhyay, H. Rabiee, M. Gomez-Rodriguez
DOI: 10.1145/3018661.3018684 (published 2016-10-18)
Abstract: Users in social networks whose posts stay at the top of their followers' feeds the longest time are more likely to be noticed. Can we design an online algorithm to help them decide when to post to stay at the top? In this paper, we address this question as a novel optimal control problem for jump stochastic differential equations. For a wide variety of feed dynamics, we show that the optimal broadcasting intensity for any user is surprisingly simple: it is given by the position of her most recent post on each of her followers' feeds. As a consequence, we are able to develop a simple and highly efficient online algorithm, RedQueen, to sample the optimal times for the user to post. Experiments on both synthetic and real data gathered from Twitter show that our algorithm is able to consistently make a user's posts more visible over time, is robust to volume changes on her followers' feeds, and significantly outperforms the state of the art.
Citations: 49
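The abstract's key finding, that the optimal posting intensity is given by the position of the user's most recent post on her followers' feeds, can be mimicked in a crude discrete-time simulation: post with a probability that grows as the latest post gets pushed down the feed. This is a toy approximation of the idea, not the paper's stochastic optimal control derivation or its RedQueen sampler; the parameter q is an assumed scaling constant.

```python
import random

def simulate_smart_posting(horizon, follower_feed_rate, q=0.2, seed=0):
    """Toy discrete-time version of the rank-driven posting idea: the user's
    posting probability at each step is proportional to the current position
    (rank) of her latest post on a single follower's feed -- post more once
    pushed down, post little while still on top.

    follower_feed_rate: probability that the follower receives another post
    (from someone else) at each step, pushing the user down by one position.
    Returns the list of time steps at which the user posts.
    """
    random.seed(seed)
    rank, posts = 1, []
    for t in range(horizon):
        if random.random() < follower_feed_rate:
            rank += 1                      # someone else's post arrives
        if random.random() < min(1.0, q * (rank - 1)):
            posts.append(t)
            rank = 1                       # the user's new post is on top again
    return posts

print(simulate_smart_posting(horizon=100, follower_feed_rate=0.5))
```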
DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification
Rohit Babbar, B. Scholkopf
DOI: 10.1145/3018661.3018741 (published 2016-09-08)
Abstract: Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. Datasets in extreme classification fit a power-law distribution, i.e., a large fraction of labels have very few positive instances in the data distribution. Most state-of-the-art approaches for extreme multi-label classification attempt to capture correlation among labels by embedding the label matrix into a low-dimensional linear subspace. However, in the presence of power-law distributed, extremely large and diverse label spaces, structural assumptions such as low rank can be easily violated. In this work, we present DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control of the model size. Unlike most state-of-the-art methods, DiSMEC does not make any low-rank assumptions on the label matrix. Using a double layer of parallelization, DiSMEC can learn classifiers for datasets consisting of hundreds of thousands of labels within a few hours. The explicit capacity control mechanism filters out spurious parameters, which keeps the model compact without losing prediction accuracy. We conduct extensive empirical evaluation on publicly available real-world datasets with up to 670,000 labels. We compare DiSMEC with recent state-of-the-art approaches, including SLEEC, a leading approach for learning sparse local embeddings, and FastXML, a tree-based approach optimizing a ranking-based loss function. On some of the datasets, DiSMEC significantly boosts prediction accuracy: 10% better than SLEEC and 15% better than FastXML, in absolute terms.
Citations: 219
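A small sketch of the two ingredients the abstract highlights: independent one-versus-rest linear classifiers (the per-label loop is what DiSMEC distributes across cores and machines) and explicit capacity control by pruning near-zero weights. The scikit-learn solver, hyperparameters, and pruning threshold are stand-ins, not the paper's setup.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_rest_sparse(X, Y, prune_threshold=0.01):
    """Train an independent linear classifier per label, then prune
    near-zero weights to keep the overall model compact.

    X: (n_samples, n_features) array, Y: (n_samples, n_labels) 0/1 array.
    Returns an (n_labels, n_features) mostly-sparse weight matrix.
    """
    n_labels = Y.shape[1]
    W = np.zeros((n_labels, X.shape[1]))
    for l in range(n_labels):            # in DiSMEC this loop is distributed
        clf = LinearSVC(C=1.0, max_iter=5000)
        clf.fit(X, Y[:, l])
        w = clf.coef_.ravel()
        w[np.abs(w) < prune_threshold] = 0.0   # explicit capacity control
        W[l] = w
    return W

X = np.random.default_rng(0).normal(size=(200, 50))
Y = (X[:, :3] > 0).astype(int)           # 3 toy labels tied to 3 features
W = train_one_vs_rest_sparse(X, Y)
print("non-zero weights:", int((W != 0).sum()), "of", W.size)
```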
Unbiased Learning-to-Rank with Biased Feedback
T. Joachims, Adith Swaminathan, Tobias Schnabel
DOI: 10.1145/3018661.3018699 (published 2016-08-16)
Abstract: Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user-centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how many clicks a result receives, so that directly using click data as a training signal in Learning-to-Rank (LTR) methods yields sub-optimal results. To overcome this bias problem, we present a counterfactual inference framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data. Using this framework, we derive a Propensity-Weighted Ranking SVM for discriminative learning from implicit feedback, where click models take the role of the propensity estimator. In contrast to most conventional approaches to de-biasing the data using click models, this allows training of ranking functions even in settings where queries do not repeat. Beyond the theoretical support, we show empirically that the proposed learning method is highly effective in dealing with biases, that it is robust to noise and propensity model misspecification, and that it scales efficiently. We also demonstrate the real-world applicability of our approach on an operational search engine, where it substantially improves retrieval performance.
Citations: 447
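The counterfactual idea can be shown in a few lines: weight each clicked result by the inverse of the propensity that its displayed position was examined, so the empirical loss estimates the full-information loss despite position bias. The softmax surrogate below is an assumed simplification; the paper derives a Propensity-Weighted Ranking SVM with click models as the propensity estimator.

```python
import numpy as np

def ips_weighted_loss(scores, clicks, propensities):
    """Inverse-propensity-scored training signal (sketch): each clicked
    document contributes to the loss weighted by 1/p, where p is the
    probability that its displayed position was examined. Under a
    position-bias click model this makes the empirical risk an unbiased
    estimate of the loss we would get from full relevance labels.

    scores: model scores for the displayed results,
    clicks: 0/1 click indicators, propensities: examination probabilities.
    """
    scores = np.asarray(scores, dtype=float)
    log_softmax = scores - np.log(np.exp(scores).sum())
    loss = 0.0
    for i, (c, p) in enumerate(zip(clicks, propensities)):
        if c:
            loss += -(1.0 / p) * log_softmax[i]
    return loss

# the clicked result at position 3 is up-weighted because it was unlikely
# to be examined (low propensity)
print(ips_weighted_loss([2.0, 1.0, 0.5], clicks=[0, 0, 1],
                        propensities=[1.0, 0.5, 0.2]))
```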
How Smart Does Your Profile Image Look?: Estimating Intelligence from Social Network Profile Images
Xingjie Wei, D. Stillwell
DOI: 10.1145/3018661.3018663 (published 2016-06-29)
Abstract: Profile images on social networks are users' opportunity to present themselves and to affect how others judge them. We examine what Facebook images say about users' perceived and measured intelligence. 1,122 Facebook users completed a matrices intelligence test and shared their current Facebook profile image. Strangers also rated the images for perceived intelligence. We use automatically extracted image features to predict both measured and perceived intelligence. Intelligence estimation from images is a difficult task even for humans, but experimental results show that human accuracy can be equalled using computing methods. We report the image features that predict both measured and perceived intelligence, and highlight misleading features such as "smiling" and "wearing glasses" that are correlated with perceived but not measured intelligence. Our results give insights into inaccurate stereotyping from profile images and also have implications for privacy, especially since in most social networks profile images are public by default.
Citations: 24
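One analysis in the abstract, identifying features such as "smiling" that correlate with perceived but not measured intelligence, can be illustrated with a simple correlation check on synthetic data. The thresholds and the single synthetic feature are assumptions for illustration only, not the study's feature set or statistics.

```python
import numpy as np

def misleading_features(features, measured_iq, perceived_iq, names):
    """Flag a feature as 'misleading' if it correlates with how intelligent
    strangers perceive a profile image to be but not with the owner's
    measured test score. Thresholds are illustrative only.
    """
    flagged = []
    for j, name in enumerate(names):
        r_measured = np.corrcoef(features[:, j], measured_iq)[0, 1]
        r_perceived = np.corrcoef(features[:, j], perceived_iq)[0, 1]
        if abs(r_perceived) > 0.2 and abs(r_measured) < 0.1:
            flagged.append((name, round(r_perceived, 2), round(r_measured, 2)))
    return flagged

rng = np.random.default_rng(1)
n = 500
smiling = rng.integers(0, 2, n).astype(float)
measured = rng.normal(size=n)               # unrelated to smiling in this toy
perceived = smiling + rng.normal(size=n)    # raters reward smiling in this toy
X = np.column_stack([smiling])
print(misleading_features(X, measured, perceived, ["smiling"]))
```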
Comparative Document Analysis for Large Text Corpora
Xiang Ren, Yuanhua Lv, Kuansan Wang, Jiawei Han
DOI: 10.1145/3018661.3018690 (published 2015-10-25)
Abstract: This paper presents a novel research problem, Comparative Document Analysis (CDA), that is, joint discovery of commonalities and differences between two individual documents (or two sets of documents) in a large text corpus. Given any pair of documents from a (background) document collection, CDA aims to automatically identify sets of quality phrases to summarize the commonalities of both documents and highlight the distinctions of each with respect to the other informatively and concisely. Our solution uses a general graph-based framework to derive novel measures on phrase semantic commonality and pairwise distinction, where the background corpus is used for computing phrase-document semantic relevance. We use the measures to guide the selection of sets of phrases by solving two joint optimization problems. A scalable iterative algorithm is developed to integrate the maximization of the phrase commonality or distinction measure with the learning of phrase-document semantic relevance. Experiments on large text corpora from two different domains (scientific papers and news) demonstrate the effectiveness and robustness of the proposed framework on comparing documents. Analysis on a 10GB+ text corpus demonstrates the scalability of our method, whose computation time grows linearly as the corpus size increases. Our case study on comparing news articles published at different dates shows the power of the proposed method on comparing sets of documents.
Citations: 20
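A toy rendering of the two quantities CDA scores: phrase commonality (relevant to both documents) and distinction (relevant to one but not the other). Relevance here is plain term frequency over unigram "phrases"; the paper instead computes graph-based semantic relevance against a background corpus and selects phrase sets via joint optimization.

```python
from collections import Counter

def compare_documents(doc_a, doc_b, phrases):
    """Score each candidate phrase for commonality (relevant to both
    documents) and for distinction of doc_a against doc_b (relevant to
    doc_a but not doc_b), using raw term frequency as a stand-in for
    semantic relevance.
    """
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    rel_a = {p: tf_a[p] for p in phrases}
    rel_b = {p: tf_b[p] for p in phrases}
    common = {p: min(rel_a[p], rel_b[p]) for p in phrases}
    distinct_a = {p: rel_a[p] - rel_b[p] for p in phrases}
    return common, distinct_a

doc_a = "graph embedding graph neural network training"
doc_b = "graph kernels and graph classification benchmarks"
print(compare_documents(doc_a, doc_b, ["graph", "embedding", "kernels"]))
# "graph" is common; "embedding" distinguishes doc_a, "kernels" doc_b
```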