Proceedings of the Tenth ACM International Conference on Web Search and Data Mining: Latest Articles (Page 3)

Link Prediction with Cardinality Constraint

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018734

Jiawei Zhang, Jianhui Chen, Junxing Zhu, Yi Chang, Philip S. Yu

Abstract: Inferring the links among entities in networks is an important research problem for various disciplines. Depending on the specific application setting, the links to be inferred are usually subject to different cardinality constraints, such as one-to-one, one-to-many and many-to-many. However, most existing work on link prediction fails to consider such constraints. In this paper, we study the link prediction problem with general cardinality constraints, formally defined as the CLP (Cardinality Constrained Link Prediction) problem. By minimizing the projection loss of links from feature vectors to labels, the CLP problem is formulated as an optimization problem involving multiple variables, where the cardinality constraints are modeled as mathematical constraints on node degrees. The objective function is shown to be not jointly convex, and the optimal solution subject to the cardinality constraints can be very time-consuming to obtain. To solve the optimization problem, we introduce ITERCLIPS (Iterative Constrained Link Prediction & Selection), an iterative variable-updating link prediction framework that alternates between link updating and link selection steps. To overcome the high time cost, a greedy link selection step picks links greedily while preserving the link cardinality constraints. To ensure the effectiveness of ITERCLIPS on large-scale networks, a distributed implementation is further presented as a scalable solution to the CLP problem. Extensive experiments on three real-world network datasets with different types of cardinality constraints demonstrate the effectiveness and advantages of ITERCLIPS in solving the CLP problem.
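
The greedy link selection idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' ITERCLIPS implementation: it only assumes that every candidate link already has a predicted confidence score and that the cardinality constraint is an upper bound on node degree on each side. The function name `greedy_select`, the score dictionary, and the caps `max_out` and `max_in` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def greedy_select(scores, max_out, max_in):
    """Greedily pick links in descending score order while keeping each
    source node's degree <= max_out and each target node's degree <= max_in.

    scores: dict mapping (source, target) -> predicted confidence (assumed input).
    Returns the list of selected (source, target) pairs.
    """
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    selected = []
    # Highest score first; ties are broken by the link itself for determinism.
    for (u, v), _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if out_deg[u] < max_out and in_deg[v] < max_in:
            selected.append((u, v))
            out_deg[u] += 1
            in_deg[v] += 1
    return selected


scores = {("a", "x"): 0.9, ("a", "y"): 0.8, ("b", "x"): 0.7, ("b", "y"): 0.2}
one_to_one = greedy_select(scores, max_out=1, max_in=1)   # each node in at most one link
one_to_many = greedy_select(scores, max_out=2, max_in=1)  # sources may link twice
```

In the one-to-one case the greedy choice keeps ("a", "x") and ("b", "y") for a total score of 1.1, although ("a", "y") plus ("b", "x") would total 1.5. The sketch therefore also shows the trade-off the abstract alludes to: greedy selection is fast and always feasible, but it is not guaranteed to be optimal.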

Citations: 19

S-HOT: Scalable High-Order Tucker Decomposition

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018721

Jinoh Oh, Kijung Shin, E. Papalexakis, C. Faloutsos, Hwanjo Yu

Abstract: Multi-aspect data appear frequently in many web-related applications. For example, product reviews are quadruplets of (user, product, keyword, timestamp). How can we analyze such web-scale multi-aspect data? Can we analyze them on an off-the-shelf workstation with a limited amount of memory? Tucker decomposition has been widely used for discovering patterns in relationships among entities in multi-aspect data, which are naturally expressed as high-order tensors. However, existing algorithms for Tucker decomposition have limited scalability and, in particular, fail to decompose high-order tensors because they explicitly materialize intermediate data, whose size grows rapidly as the order increases (≥ 4). We call this problem M-Bottleneck ("Materialization Bottleneck"). To avoid M-Bottleneck, we propose S-HOT, a scalable high-order Tucker decomposition method that employs on-the-fly computation to minimize the materialized intermediate data. Moreover, S-HOT is designed to handle disk-resident tensors that are too large to fit in memory, without loading them all into memory at once. We provide a theoretical analysis of the amount of memory space and the number of data scans required by S-HOT. In our experiments, S-HOT showed better scalability than baseline methods not only with the order but also with the dimensionality and the rank. In particular, S-HOT decomposed tensors 1000× larger than baseline methods in terms of dimensionality. S-HOT also successfully analyzed real-world tensors that are both large-scale and high-order on an off-the-shelf workstation with a limited amount of memory, while baseline methods failed. The source code of S-HOT is publicly available at http://dm.postech.ac.kr/shot to encourage reproducibility.
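
To make the phrase "on-the-fly computation" concrete, here is a small sketch of the general idea for a 3-way sparse tensor. It is not the S-HOT algorithm, whose details the abstract does not give. It assumes the intermediate tensor is Y = X ×₂ Bᵀ ×₃ Cᵀ and shows how the product of its mode-1 unfolding with a vector can be accumulated in one streaming pass over the nonzeros, so that Y is never stored. The function name `ttm_chain_matvec` and all argument names are hypothetical.

```python
def ttm_chain_matvec(nonzeros, B, C, w, dim_i):
    """Compute z = Y_(1) @ w, where Y = X x_2 B^T x_3 C^T, without storing Y.

    nonzeros: iterable of (i, j, k, value) entries of a sparse 3-way tensor X;
              it may be a generator that reads entries from disk.
    B: J x R2 factor matrix, C: K x R3 factor matrix (lists of rows).
    w: R2 x R3 weights (the vector w reshaped to match Y's last two modes).
    dim_i: size of the first mode of X.
    """
    z = [0.0] * dim_i
    for i, j, k, val in nonzeros:  # a single streaming scan of the data
        acc = 0.0
        for r2, b in enumerate(B[j]):
            for r3, c in enumerate(C[k]):
                # b * c is one entry of Y's contribution from this nonzero;
                # it is consumed immediately instead of being materialized.
                acc += b * c * w[r2][r3]
        z[i] += val * acc
    return z


nonzeros = [(0, 0, 0, 2.0), (1, 1, 0, 3.0)]
B = [[1.0, 0.0], [0.0, 1.0]]
C = [[1.0, 2.0]]
w = [[1.0, 1.0], [1.0, 1.0]]
z = ttm_chain_matvec(nonzeros, B, C, w, dim_i=2)
```

The working memory here is one vector of length `dim_i` plus the factor matrices, independent of the number of entries Y would have. The price is that each such product requires another scan of the nonzeros, which is the kind of memory-versus-scans trade-off the abstract says the paper analyzes.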

Citations: 45

Random Semantic Tensor Ensemble for Scalable Knowledge Graph Link Prediction

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018695

Yi Tay, Anh Tuan Luu, S. Hui, Falk Brauer

Abstract: Link prediction on knowledge graphs is useful in numerous application areas such as semantic search, question answering, entity disambiguation, enterprise decision support and recommender systems. While many of these applications require a reasonably quick response and may operate on data that is constantly changing, existing methods often lack the speed and adaptability to cope with these requirements. This is aggravated by the fact that knowledge graphs are often extremely large and may easily contain millions of entities, rendering many of these methods impractical. In this paper, we address the weaknesses of current methods by proposing Random Semantic Tensor Ensemble (RSTE), a scalable ensemble-enabled framework based on tensor factorization. Our proposed approach samples a knowledge graph tensor in its graph representation and performs link prediction via ensembles of tensor factorization. Our experiments on both publicly available datasets and real-world enterprise/sales knowledge bases show that our approach is not only highly scalable, parallelizable and memory efficient, but also significantly increases prediction accuracy across all datasets.

Citations: 19

Modeling Document Networks with Tree-Averaged Copula Regularization

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018666

Yuan He, Cheng Wang, Changjun Jiang

Abstract: A document network is an intriguing kind of dataset that provides both topical (text) and topological (link) information. Most previous work assumes that documents closely linked with each other share common topics. However, the associations among documents are usually complex and are not limited to homophily (i.e., the tendency to link to similar others). In fact, heterophily (i.e., the tendency to link to different others) is another pervasive phenomenon in social networks. In this paper, we introduce a new tool, the copula, to model the documents and links separately, so that different copula functions can be applied to capture different correlation patterns. In statistics, a copula is a powerful framework for explicitly modeling the dependence of random variables by separating the marginals from their correlations. Though widely used in economics, copulas have not received enough attention from researchers in the machine learning field. Furthermore, to capture the potential associations among unconnected documents, we apply the tree-averaged copula instead of a single copula function. This improvement gives our model greater expressive power and makes it algebraically more elegant. We derive efficient EM algorithms to estimate the model parameters and evaluate the performance of our model on three different datasets. Experimental results show that our approach achieves significant improvements on both topic and link modeling compared with the current state of the art.
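
The separation of marginals from their dependence that the abstract describes is formalized by Sklar's theorem, a standard result restated here for reference. The notation (F, F_i, C, c) is chosen for this note and is not taken from the paper.

```latex
\[
F(x_1, \dots, x_n) = C\bigl(F_1(x_1), \dots, F_n(x_n)\bigr),
\qquad
f(x_1, \dots, x_n) = c\bigl(F_1(x_1), \dots, F_n(x_n)\bigr) \prod_{i=1}^{n} f_i(x_i)
\]
```

Here F is the joint distribution function, F_i are the marginal distribution functions, and C is a copula, i.e., a joint distribution function on [0,1]^n with uniform marginals. C is unique when the marginals are continuous. The second identity holds when the densities f, f_i and the copula density c exist, and it shows why the choice of copula can be varied independently of the marginals.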

Citations: 13

Social Media Anomaly Detection: Challenges and Solutions

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3022757

Yan Liu, S. Chawla

Citations: 18

ANNE: Improving Source Code Search using Entity Retrieval Approach

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018691

Venkatesh Vinayakarao, A. Sarma, Rahul Purandare, Shuktika Jain, Saumya Jain

Citations: 22

Constructing and Embedding Abstract Event Causality Networks from Text Snippets

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018707

Sendong Zhao, Quan Wang, Sean Massung, Bing Qin, Ting Liu, Bin Wang, ChengXiang Zhai

Abstract: In this paper, we formally define the problem of representing and leveraging abstract event causality to power downstream applications. We propose a novel solution to this problem, which builds an abstract causality network and embeds the causality network into a continuous vector space. The abstract causality network is generalized from a specific one, with abstract event nodes represented by frequently co-occurring word pairs. To perform the embedding task, we design a dual cause-effect transition model. The proposed method can therefore obtain general, frequent, and simple causality patterns while also simplifying event matching. Given the causality network and the learned embeddings, our model can be applied to a wide range of applications such as event prediction, event clustering and stock market movement prediction. Experimental results demonstrate that 1) the abstract causality network is effective for discovering high-level causality rules behind specific causal events; 2) the embedding models perform better than state-of-the-art link prediction techniques in predicting events; and 3) the event causality embedding is an easy-to-use and sophisticated feature for downstream applications such as stock market movement prediction.

Citations: 76

WSDM 2017 Workshop on Mining Online Health Reports: MOHRS 2017

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3022761

Nigel Collier, Nut Limsopatham, A. Culotta, Mike Conway, I. Cox, Vasileios Lampos

Abstract: The workshop on Mining Online Health Reports (MOHRS) draws upon the rapidly developing field of Computational Health, focusing on textual content that has been generated through the various facets of Web activity. Mining online user-generated information, especially from social media platforms and search engines, has been at the forefront of many research efforts, particularly in the fields of Information Retrieval and Natural Language Processing. The incorporation of such data and techniques in a number of health-oriented applications has provided strong evidence of the potential benefits, which include better population coverage, timeliness and the ability to operate in places with less established health infrastructure. The workshop aims to create a platform where relevant state-of-the-art research is presented and where discussions among researchers with cross-disciplinary backgrounds can take place. It will focus on the characterisation of data sources, the essential methods for mining this textual information, as well as potential real-world applications and the arising ethical issues. MOHRS '17 will feature 3 keynote talks and 4 accepted paper presentations, together with a panel discussion session.

Citations: 0

Multi-Column Convolutional Neural Networks with Causality-Attention for Why-Question Answering

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018737

Jong-Hoon Oh, Kentaro Torisawa, Canasai Kruengkrai, R. Iida, Julien Kloetzer

Abstract: Why-question answering (why-QA) is the task of retrieving answers (or answer passages) to why-questions (e.g., "why are tsunamis generated?") from a text archive. Several previously proposed methods for why-QA improved their performance by automatically recognizing causalities that are expressed with explicit cues such as "because" in answer passages and using the recognized causalities as a clue for finding proper answers. However, in answer passages, causalities might be expressed implicitly (i.e., without any explicit cues): "An earthquake suddenly displaced sea water and a tsunami was generated." Previous work did not deal with such implicitly expressed causalities and failed to find proper answers that included them. We improve why-QA based on the following two ideas. First, causalities expressed implicitly in one text might be expressed in other texts with explicit cues. If we can automatically recognize such explicitly expressed causalities in a text archive and use them to complement the implicitly expressed causalities in an answer passage, we can improve why-QA. Second, the causes of similar events tend to be described with a similar set of words (e.g., "seismic energy" and "tectonic plates" for "the Great East Japan Earthquake" and "the 1906 San Francisco Earthquake"). As such, even if we cannot find in a text archive any explicitly expressed cause of an event (e.g., "the Great East Japan Earthquake") mentioned in a question (e.g., "Why did the Great East Japan earthquake happen?"), we might be able to identify its implicitly expressed causes with a set of words (e.g., "tectonic plates") that appear in the explicitly expressed cause of a similar event (e.g., "the 1906 San Francisco Earthquake"). We implemented these two ideas in our multi-column convolutional neural networks with a novel attention mechanism, which we call causality attention. Through experiments on Japanese why-QA, we confirmed that our proposed method outperformed the state-of-the-art systems.

Citations: 39

Not Enough Data?: Joint Inferring Multiple Diffusion Networks via Network Generation Priors

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018675

Xinran He, Yan Liu

Abstract: Network inference, i.e., discovering latent diffusion networks from observed cascades, has been studied extensively in recent years, leading to a series of excellent work. However, it has been observed that the accuracy of existing methods deteriorates significantly when the number of cascades is limited (compared with the large number of nodes), which is the norm in real-world applications. Meanwhile, we are able to collect cascades on many different topics or over a long time period: the associated influence networks (either topic-specific or time-specific) are highly correlated, while the number of cascade observations associated with each network is very limited. In this work, we propose a generative model, referred to as the MultiCascades model (MCM), to address the challenge of data scarcity by exploring the commonality between multiple related diffusion networks. MCM builds a hierarchical graphical model in which all the diffusion networks share the same network prior, e.g., the popular stochastic blockmodels or latent space models. The parameters of the network priors can be effectively learned by gleaning evidence from a large number of inferred networks. In return, each individual network can be inferred more accurately thanks to the prior information. Furthermore, we develop efficient inference and learning algorithms so that MCM is scalable for practical applications. The results on both synthetic and real-world datasets demonstrate that MCM infers both topic-specific and time-varying diffusion networks more accurately.
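
The notion of several diffusion networks sharing one network prior can be pictured with a short generative sketch. It covers only the forward direction (drawing networks from a shared stochastic blockmodel) and says nothing about how MCM performs inference, which the abstract does not detail. The function name `sample_sbm`, the block assignment, and the probability matrix are hypothetical values chosen for illustration.

```python
import random


def sample_sbm(blocks, edge_prob, rng):
    """Sample one directed network from a stochastic blockmodel.

    blocks: list where blocks[u] is the block index of node u.
    edge_prob: matrix where edge_prob[a][b] is the probability of an edge
               from a node in block a to a node in block b.
    rng: a random.Random instance, passed in for reproducibility.
    """
    n = len(blocks)
    return {
        (u, v)
        for u in range(n)
        for v in range(n)
        if u != v and rng.random() < edge_prob[blocks[u]][blocks[v]]
    }


rng = random.Random(0)
blocks = [0, 0, 1, 1]                 # shared prior: node-to-block assignment
edge_prob = [[1.0, 0.0], [0.0, 1.0]]  # extreme values keep this toy run deterministic
# One network per topic (or time window), all drawn from the same prior.
networks = [sample_sbm(blocks, edge_prob, rng) for _ in range(3)]
```

With less extreme probabilities the sampled networks would differ from one another while still reflecting the same block structure, which is the correlation across topic-specific or time-specific networks that the abstract proposes to exploit.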

Citations: 16