Scaling up Link Prediction with Ensembles
Liang Duan, C. Aggarwal, Shuai Ma, Renjun Hu, J. Huai
Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM '16), February 8, 2016. DOI: https://doi.org/10.1145/2835776.2835815

Abstract: A network with $n$ nodes contains $O(n^2)$ possible links. Even for networks of modest size, it is often difficult to evaluate all pairwise possibilities for links in a meaningful way. Furthermore, even though link prediction is closely related to missing-value estimation problems such as collaborative filtering, it is often difficult to use sophisticated models such as latent factor methods because of their computational complexity over very large networks. Due to this complexity, most known link prediction methods are designed to evaluate link propensity over a specified subset of links rather than to perform a global search over the entire network. In practice, however, such an exhaustive search over the entire network is essential. In this paper, we propose an ensemble-enabled approach to scaling up link prediction that decomposes traditional link prediction problems into subproblems of smaller size. Each subproblem is solved with latent factor models, which can be implemented effectively over networks of modest size. The ensemble-enabled approach also has several performance advantages. We show the advantage of ensemble-based latent factor models with experiments on very large networks; the results demonstrate the effectiveness and scalability of our approach.
{"title":"Project Success Prediction in Crowdfunding Environments","authors":"Yan Li, Vineeth Rakesh, C. Reddy","doi":"10.1145/2835776.2835791","DOIUrl":"https://doi.org/10.1145/2835776.2835791","url":null,"abstract":"Crowdfunding has gained widespread attention in recent years. Despite the huge success of crowdfunding platforms, the percentage of projects that succeed in achieving their desired goal amount is only around 40%. Moreover, many of these crowdfunding platforms follow \"all-or-nothing\" policy which means the pledged amount is collected only if the goal is reached within a certain predefined time duration. Hence, estimating the probability of success for a project is one of the most important research challenges in the crowdfunding domain. To predict the project success, there is a need for new prediction models that can potentially combine the power of both classification (which incorporate both successful and failed projects) and regression (for estimating the time for success). In this paper, we formulate the project success prediction as a survival analysis problem and apply the censored regression approach where one can perform regression in the presence of partial information. We rigorously study the project success time distribution of crowdfunding data and show that the logistic and log-logistic distributions are a natural choice for learning from such data. We investigate various censored regression models using comprehensive data of 18K Kickstarter (a popular crowdfunding platform) projects and 116K corresponding tweets collected from Twitter. We show that the models that take complete advantage of both the successful and failed projects during the training phase will perform significantly better at predicting the success of future projects compared to the ones that only use the successful projects. We provide a rigorous evaluation on many sets of relevant features and show that adding few temporal features that are obtained at the project's early stages can dramatically improve the performance.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76904034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representation Learning for Information Diffusion through Social Networks: an Embedded Cascade Model","authors":"Simon Bourigault, S. Lamprier, P. Gallinari","doi":"10.1145/2835776.2835817","DOIUrl":"https://doi.org/10.1145/2835776.2835817","url":null,"abstract":"In this paper, we focus on information diffusion through social networks. Based on the well-known Independent Cascade model, we embed users of the social network in a latent space to extract more robust diffusion probabilities than those defined by classical graphical learning approaches. Better generalization abilities provided by the use of such a projection space allows our approach to present good performances on various real-world datasets, for both diffusion prediction and influence relationships inference tasks. Additionally, the use of a projection space enables our model to deal with larger social networks.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86742203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Predictive Power of Massive Data about our Fine-Grained Behavior","authors":"F. Provost","doi":"10.1145/2835776.2835846","DOIUrl":"https://doi.org/10.1145/2835776.2835846","url":null,"abstract":"What really is it about \"big data\" that makes it different from traditional data? In this talk I illustrate one important aspect: massive ultra-fine-grained data on individuals' behaviors holds remarkable predictive power. I examine several applications to marketing-related tasks, showing how machine learning methods can extract the predictive power and how the value of the data \"asset\" seems different from the value of traditional data used for predictive modeling. I then dig deeper into explaining the predictions made from massive numbers of fine-grained behaviors by applying a counter-factual framework for explaining model behavior based on treating the individual behaviors as evidence that is combined by the model. This analysis shows that the fine-grained behavior data incorporate various sorts of information that we traditionally have sought to capture by other means. For example, for marketing modeling the behavior data effectively incorporate demographics, psychographics, category interest, and purchase intent. Finally, I discuss the flip side of the coin: the remarkable predictive power based on fine-grained information on individuals raises new privacy concerns. In particular, I discuss privacy concerns based on inferences drawn about us (in contrast to privacy concerns stemming from violations to data confidentiality). The evidence counterfactual approach used to explain the predictions also can be used to provide online consumers with transparency into the reasons why inferences are drawn about them. In addition, it offers the possibility to design novel solutions such as a privacy-friendly \"cloaking device\" to inhibit inferences from being drawn based on particular behaviors.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79623044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feedback Control of Real-Time Display Advertising
Weinan Zhang, Yifei Rong, Jun Wang, Tianchi Zhu, Xiaofan Wang
Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM '16), February 8, 2016. DOI: https://doi.org/10.1145/2835776.2835843

Abstract: Real-Time Bidding (RTB) is revolutionising display advertising by facilitating per-impression auctions to buy ad impressions as they are being generated. The ability to use impression-level data, such as user cookies, encourages user behaviour targeting and has hence significantly improved the effectiveness of ad campaigns. However, a fundamental drawback of RTB is its instability: because the bid decision is made per impression, campaigns' key performance indicators (KPIs) fluctuate enormously, and advertisers face great difficulty in controlling campaign performance against the associated costs. In this paper, we propose a feedback control mechanism for RTB that helps advertisers dynamically adjust their bids to effectively control KPIs such as the auction winning ratio and the effective cost per click (eCPC). We further formulate an optimisation framework to show that the proposed feedback control mechanism can also optimise campaign performance: by setting the eCPC at an optimal reference value, the number of a campaign's ad clicks can be maximised under the budget constraint. Our empirical study based on real-world data verifies the effectiveness and robustness of our RTB control system in various situations. The proposed mechanism has also been deployed on a commercial RTB platform, where an online test has shown its success in generating controllable advertising performance.
On the Efficiency of the Information Networks in Social Media
Mahmoudreza Babaei, Przemyslaw A. Grabowicz, I. Valera, K. Gummadi, M. Gomez-Rodriguez
Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM '16), February 8, 2016. DOI: https://doi.org/10.1145/2835776.2835826

Abstract: Social media sites are information marketplaces where users produce and consume a wide variety of information and ideas. On these sites, users typically choose their information sources, which in turn determine what specific information they receive, how much information they receive, and how quickly it is shown to them. In this context, a natural question is how efficient social media users are at selecting their information sources. In this work, we propose a computational framework to quantify that efficiency. Our framework is based on the assumption that a user's goal is to acquire a set of unique pieces of information. To quantify a user's efficiency, we ask whether the user could have acquired the same pieces of information from another set of sources more efficiently. We define three notions of efficiency -- link, in-flow, and delay -- corresponding to the number of sources the user follows, the amount of (redundant) information she acquires, and the delay with which she receives the information. Our definitions of efficiency are general and applicable to any social media system with an underlying information network in which every user follows others to receive the information they produce. In our experiments, we measure the efficiency of Twitter users at acquiring different types of information. We find that Twitter users exhibit sub-optimal efficiency across all three notions, although they tend to be more efficient at acquiring non-popular pieces of information than popular ones. We then show that this lack of efficiency is a consequence of the triadic closure mechanism by which users typically discover and follow other users in social media. Our study thus reveals a tradeoff between the efficiency and discoverability of information sources. Finally, we develop a heuristic algorithm that enables users to acquire the same unique pieces of information significantly more efficiently.
{"title":"Session details: Practice & Experience Track","authors":"Brian D. Davison","doi":"10.1145/3253881","DOIUrl":"https://doi.org/10.1145/3253881","url":null,"abstract":"","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80618494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Search Interactions within Professional Social Networks","authors":"N. Spirin","doi":"10.1145/2835776.2855092","DOIUrl":"https://doi.org/10.1145/2835776.2855092","url":null,"abstract":"To help users cope with the scale and influx of new information, professional social networks (PSNs) provide a search functionality. However, most of the search engines within PSNs today only support keyword queries and basic faceted search capabilities overlooking serendipitous network exploration and search for relationships between entities. This results in siloed information and a limited search space. My thesis is that we must redesign all major elements of a search user interface, such as input, control, and informational, to enable more effective search interactions within PSNs. I will introduce new insights and algorithms supporting the thesis.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85302159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transductive Classification on Heterogeneous Information Networks with Edge Betweenness-based Normalization
P. Bangcharoensap, T. Murata, Hayato Kobayashi, N. Shimizu
Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM '16), February 8, 2016. DOI: https://doi.org/10.1145/2835776.2835799

Abstract: This paper proposes a novel method for transductive classification on heterogeneous information networks composed of multiple types of vertices. Such networks naturally represent many real-world Web data, such as DBLP data (authors, papers, and conferences). Given a network in which some vertices are labeled, the classifier aims to predict labels for the remaining vertices by propagating the labels through the entire network. In the label propagation process, many studies reduce the importance of edges connecting to a high-degree vertex. This assumption is unsatisfactory when the reliability of a vertex's label cannot be inferred from its degree. On the basis of our intuition that edges bridging across communities are less trustworthy, we adapt edge betweenness to estimate the importance of edges. Since directly applying conventional edge betweenness is inefficient on heterogeneous networks, we propose two refinements. First, the centrality exploits the fact that the networks contain multiple types of vertices. Second, the centrality ignores flows originating from the endpoints of the edges under consideration. Experimental results on real-world datasets show that our proposed method is more effective than a state-of-the-art method, GNetMine. On average, our method yields 92.79 ± 1.25% accuracy on a DBLP network even when only 1.92% of the vertices are labeled. Our simple weighting scheme improves accuracy by more than 5 percentage points compared with GNetMine.