Proceedings of the Tenth ACM International Conference on Web Search and Data Mining最新文献

PRED

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018680

Q. Yuan, Wei Zhang, Chao Zhang, Xinhe Geng, Gao Cong, Jiawei Han

{"title":"PRED","authors":"Q. Yuan, Wei Zhang, Chao Zhang, Xinhe Geng, Gao Cong, Jiawei Han","doi":"10.1145/3018661.3018680","DOIUrl":"https://doi.org/10.1145/3018661.3018680","url":null,"abstract":"The availability of massive geo-annotated social media data sheds light on studying human mobility patterns. Among them, periodic pattern, ie an individual visiting a geographical region with some specific time interval, has been recognized as one of the most important. Mining periodic patterns has a variety of applications, such as location prediction, anomaly detection, and location- and time-aware recommendation. However, it is a challenging task: the regions of a person and the periods of each region are both unknown. The interdependency between them makes the task even harder. Hence, existing methods are far from satisfactory for detecting periodic patterns from the low-sampling and noisy social media data. We propose a Bayesian non-parametric model, named textbf{P}eriodic textbf{RE}gion textbf{D}etection (PRED), to discover periodic mobility patterns by jointly modeling the geographical and temporal information. Our method differs from previous studies in that it is non-parametric and thus does not require priori knowledge about an individual's mobility (eg number of regions, period length, region size). Meanwhile, it models the time gap between two consecutive records rather than the exact visit time, making it less sensitive to data noise. Extensive experimental results on both synthetic and real-world datasets show that PRED outperforms the state-of-the-art methods significantly in four tasks: periodic region discovery, outlier movement finding, period detection, and location prediction.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117110768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Related Event Discovery 相关事件发现

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018713

Cheng Li, Michael Bendersky, Vijay Garg, Sujith Ravi

引用次数: 8

Synthesis of Forgiving Data Extractors 可原谅数据提取器的合成

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018740

Adi Omari, Sharon Shoham, Eran Yahav

{"title":"Synthesis of Forgiving Data Extractors","authors":"Adi Omari, Sharon Shoham, Eran Yahav","doi":"10.1145/3018661.3018740","DOIUrl":"https://doi.org/10.1145/3018661.3018740","url":null,"abstract":"We address the problem of synthesizing a robust data-extractor from a family of websites that contain the same kind of information. This problem is common when trying to aggregate information from many web sites, for example, when extracting information for a price-comparison site. Given a set of example annotated web pages from multiple sites in a family, our goal is to synthesize a robust data extractor that performs well on all sites in the family (not only on the provided example pages). The main challenge is the need to trade off precision for generality and robustness. Our key contribution is the introduction of forgiving extractors that dynamically adjust their precision to handle structural changes, without sacrificing precision on the training set. Our approach uses decision tree learning to create a generalized extractor and converts it into a forgiving extractor, inthe form of an XPath query. The forgiving extractor captures a series of pruned decision trees with monotonically decreasing precision, and monotonically increasing recall, and dynamically adjusts precision to guarantee sufficient recall. We have implemented our approach in a tool called TREEX and applied it to synthesize extractors for real-world large scale web sites. We evaluate the robustness and generality of the forgiving extractors by evaluating their precision and recall on: (i) different pages from sites in the training set (ii) pages from different versions of sites in the training set (iii) pages from different (unseen) sites. We compare the results of our synthesized extractor to those of classifier-based extractors, and pattern-based extractors, and show that TREEX significantly improves extraction accuracy.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"694 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122024924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 14

Investigation of User Search Behavior While Facing Heterogeneous Search Services 面向异构搜索服务的用户搜索行为研究

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018673

Xin Li, Yiqun Liu, Rongjie Cai, Shaoping Ma

{"title":"Investigation of User Search Behavior While Facing Heterogeneous Search Services","authors":"Xin Li, Yiqun Liu, Rongjie Cai, Shaoping Ma","doi":"10.1145/3018661.3018673","DOIUrl":"https://doi.org/10.1145/3018661.3018673","url":null,"abstract":"With Web users' search tasks becoming increasingly complex, a single information source cannot necessarily satisfy their information needs. Searchers may rely on heterogeneous sources to complete their tasks, such as search engines, Community Question Answering (CQA), encyclopedia sites, and crowdsourcing platforms. Previous works focus on interaction behaviors with federated search results, including how to compose a federated Web search result page and what factors affect users' interaction behavior on aggregated search interfaces. However, little is known about which factors are crucial in determining users' search outcomes while facing multiple heterogeneous search services. In this paper, we design a lab-based user study to analyze what explicit and implicit factors affect search outcomes (information gain and user satisfaction) when users have access to heterogeneous information sources. In the study, each participant can access three different kinds of search services: a general search engine (Bing), a general CQA portal (Baidu Knows), and a high-quality CQA portal (Zhihu). Using questionnaires and interaction log data, we extract explicit and implicit signals to analyze how users' search outcomes are correlated with their behaviors on different information sources. Experimental results indicate that users' search experiences on CQA portals (such as users' perceived usefulness and number of result clicks) positively affect search outcome (information gain), while search satisfaction is significantly correlated with some other factors such as users' familiarity, interest and difficulty of the task. Besides, users' search satisfaction can be more accurately predicted by the implicit factors than search outcomes.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122072916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

Online Matrix Completion for Signed Link Prediction 签名链路预测的在线矩阵补全

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018681

Jing Wang, Jie Shen, Ping Li, Huan Xu

{"title":"Online Matrix Completion for Signed Link Prediction","authors":"Jing Wang, Jie Shen, Ping Li, Huan Xu","doi":"10.1145/3018661.3018681","DOIUrl":"https://doi.org/10.1145/3018661.3018681","url":null,"abstract":"This work studies the binary matrix completion problem underlying a large body of real-world applications such as signed link prediction and information propagation. That is, each entry of the matrix indicates a binary preference such as \"like\" or \"dislike\", \"trust\" or \"distrust\". However, the performance of existing matrix completion methods may be hindered owing to three practical challenges: 1) the observed data are with binary label (i.e., not real value); 2) the data are typically sampled non-uniformly (i.e., positive links dominate the negative ones) and 3) a network may have a huge volume of data (i.e., memory and computational issue). In order to remedy these problems, we propose a novel framework which {i} maximizes the resemblance between predicted and observed matrices as well as penalizing the logistic loss to fit the binary data to produce binary estimates; {ii} constrains the matrix max-norm and maximizes the F-score to handle non-uniformness and {iii} presents online optimization technique, hence mitigating the memory cost. Extensive experiments performed on four large-scale datasets with up to hundreds of thousands of users demonstrate the superiority of our framework over the state-of-the-art matrix completion based methods and popular link prediction approaches.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126787015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 16

Detecting and Characterizing Eating-Disorder Communities on Social Media 在社交媒体上检测和表征饮食失调社区

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018706

Tao Wang, M. Brede, Antonella Ianni, E. Mentzakis

{"title":"Detecting and Characterizing Eating-Disorder Communities on Social Media","authors":"Tao Wang, M. Brede, Antonella Ianni, E. Mentzakis","doi":"10.1145/3018661.3018706","DOIUrl":"https://doi.org/10.1145/3018661.3018706","url":null,"abstract":"Eating disorders are complex mental disorders and responsible for the highest mortality rate among mental illnesses. Recent studies reveal that user-generated content on social media provides useful information in understanding these disorders. Most previous studies focus on studying communities of people who discuss eating disorders on social media, while few studies have explored community structures and interactions among individuals who suffer from this disease over social media. In this paper, we first develop a snowball sampling method to automatically gather individuals who self-identify as eating disordered in their profile descriptions, as well as their social network connections with one another on Twitter. Then, we verify the effectiveness of our sampling method by: 1. quantifying differences between the sampled eating disordered users and two sets of reference data collected for non-disordered users in social status, behavioral patterns and psychometric properties; 2. building predictive models to classify eating disordered and non-disordered users. Finally, leveraging the data of social connections between eating disordered individuals on Twitter, we present the first homophily study among eating-disorder communities on social media. Our findings shed new light on how an eating-disorder community develops on social media.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129075232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 98

Directed Edge Recommender System 定向边缘推荐系统

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018729

Ios Kotsogiannis, E. Zheleva, Ashwin Machanavajjhala

引用次数: 10

Neural Survival Recommender 神经生存推荐

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018719

How Jing, Alex Smola

引用次数: 152

Modeling Air Travel Choice Behavior with Mixed Kernel Density Estimations 基于混合核密度估计的航空出行选择行为建模

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018671

Zhenni Feng, Yanmin Zhu, Jian Cao

{"title":"Modeling Air Travel Choice Behavior with Mixed Kernel Density Estimations","authors":"Zhenni Feng, Yanmin Zhu, Jian Cao","doi":"10.1145/3018661.3018671","DOIUrl":"https://doi.org/10.1145/3018661.3018671","url":null,"abstract":"Understanding air travel choice behavior of air passengers is of great significance for various purposes such as travel demand prediction and trip recommendation. Existing approaches based on surveys can only provide aggregate level air travel choice behavior of passengers and they fail to provide comprehensive information for personalized services. In this paper we focus on modeling individual level air travel choice behavior of passengers, which is valuable for recommendations and personalized services. We employ a probabilistic model to represent individual level air travel choice behavior based on a large dataset of historical booking records, leveraging several key factors, such as takeoff time, arrival time, elapsed time between reservation and takeoff, price, and seat class. However, each passenger has only a limited number of historical booking records, causing a serious data sparsity problem. To this end, we propose a mixed kernel density estimation (mix-KDE) approach for each passenger with a mixture model that combines probabilistic estimation of both regularity of the individual himself and social conformity of similar passengers. The proposed model is trained and evaluated via the expectation-maximization (EM) algorithm with a huge dataset of booking records of over 10 million air passengers from a popular online travel agency in China. Experimental results demonstrate that our mix-KDE approach outperforms the Gaussian mixture model (GMM) and the simple kernel density estimation in the presence of the sparsity issue.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129973455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Embedding of Embedding (EOE): Joint Embedding for Coupled Heterogeneous Networks 嵌入的嵌入:耦合异构网络的联合嵌入

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018723

Linchuan Xu, Xiaokai Wei, Jiannong Cao, Philip S. Yu

{"title":"Embedding of Embedding (EOE): Joint Embedding for Coupled Heterogeneous Networks","authors":"Linchuan Xu, Xiaokai Wei, Jiannong Cao, Philip S. Yu","doi":"10.1145/3018661.3018723","DOIUrl":"https://doi.org/10.1145/3018661.3018723","url":null,"abstract":"Network embedding is increasingly employed to assist network analysis as it is effective to learn latent features that encode linkage information. Various network embedding methods have been proposed, but they are only designed for a single network scenario. In the era of big data, different types of related information can be fused together to form a coupled heterogeneous network, which consists of two different but related sub-networks connected by inter-network edges. In this scenario, the inter-network edges can act as comple- mentary information in the presence of intra-network ones. This complementary information is important because it can make latent features more comprehensive and accurate. And it is more important when the intra-network edges are ab- sent, which can be referred to as the cold-start problem. In this paper, we thus propose a method named embedding of embedding (EOE) for coupled heterogeneous networks. In the EOE, latent features encode not only intra-network edges, but also inter-network ones. To tackle the challenge of heterogeneities of two networks, the EOE incorporates a harmonious embedding matrix to further embed the em- beddings that only encode intra-network edges. Empirical experiments on a variety of real-world datasets demonstrate the EOE outperforms consistently single network embedding methods in applications including visualization, link prediction multi-class classification, and multi-label classification.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129311664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 144