Liang Duan, C. Aggarwal, Shuai Ma, Renjun Hu, J. Huai
{"title":"Scaling up Link Prediction with Ensembles","authors":"Liang Duan, C. Aggarwal, Shuai Ma, Renjun Hu, J. Huai","doi":"10.1145/2835776.2835815","DOIUrl":"https://doi.org/10.1145/2835776.2835815","url":null,"abstract":"A network with $n$ nodes contains O(n2) possible links. Even for networks of modest size, it is often difficult to evaluate all pairwise possibilities for links in a meaningful way. Furthermore, even though link prediction is closely related to missing value estimation problems, such as collaborative filtering, it is often difficult to use sophisticated models such as latent factor methods because of their computational complexity over very large networks. Due to this computational complexity, most known link prediction methods are designed for evaluating the link propensity over a specified subset of links, rather than for performing a global search over the entire networks. In practice, however, it is essential to perform an exhaustive search over the entire networks. In this paper, we propose an ensemble enabled approach to scaling up link prediction, which is able to decompose traditional link prediction problems into subproblems of smaller size. These subproblems are each solved with the use of latent factor models, which can be effectively implemented over networks of modest size. Furthermore, the ensemble enabled approach has several advantages in terms of performance. We show the advantage of using ensemble-based latent factor models with experiments on very large networks. Experimental results demonstrate the effectiveness and scalability of our approach.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81437989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WSDM 2016 Workshop on the Ethics of Online Experimentation","authors":"Fernando Diaz, Solon Barocas","doi":"10.1145/2835776.2855117","DOIUrl":"https://doi.org/10.1145/2835776.2855117","url":null,"abstract":"Online experimentation is now a core and near-constant part of the operation of a production online service, such as a web search engine or social media service. These are large-scale experiments that involve research subjects often numbering in the hundreds of thousands and wide-ranging, computer-automated variations in experimental treatment. In some cases, the results of online experiments may be of use internally to optimize system performance (for example, a test may be conducted to help make web page layout decisions). In other cases, the results may be of academic interest (for example, an experiment may be conducted to test a hypothesis about human behavior). Because of their rapid deployment and broad impact, online experimentation systems provide an extremely valuable tool for scientists and engineers. Despite this statistical power, in some situations, an online experiment can raise difficult ethical questions. One only needs to revisit the conversations resulting from the Facebook emotional contagion experiment to understand that some experiments may, at the very least, warrant careful review before being conducted. Since this episode, scholarship published mainly in the qualitative research and information law communities indicates that this may not be an isolated incident. Ethical and legal problems probably arise in other online experiments, published or not. As experimentation platforms and users become easily accessible, scientists and practitioners may increasingly put the well-being and trust of end users at risk. In light of these concerns, organizations often review online experiments before they are actually conducted. In production settings, the review process might vary with respect to formality or standards across companies and even groups within companies. When intended or used for academic publication, experiments or data may have undergone inconsistent review processes, some implementing academic-style institutional review boards and others none at all. Although there is a suggestion that service providers are concerned about the wellbeing of end users, the community does not","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83421331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Practice & Experience Track","authors":"Brian D. Davison","doi":"10.1145/3253881","DOIUrl":"https://doi.org/10.1145/3253881","url":null,"abstract":"","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80618494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mahmoudreza Babaei, Przemyslaw A. Grabowicz, I. Valera, K. Gummadi, M. Gomez-Rodriguez
{"title":"On the Efficiency of the Information Networks in Social Media","authors":"Mahmoudreza Babaei, Przemyslaw A. Grabowicz, I. Valera, K. Gummadi, M. Gomez-Rodriguez","doi":"10.1145/2835776.2835826","DOIUrl":"https://doi.org/10.1145/2835776.2835826","url":null,"abstract":"Social media sites are information marketplaces, where users produce and consume a wide variety of information and ideas. In these sites, users typically choose their information sources, which in turn determine what specific information they receive, how much information they receive and how quickly this information is shown to them. In this context, a natural question that arises is how efficient are social media users at selecting their information sources. In this work, we propose a computational framework to quantify users' efficiency at selecting information sources. Our framework is based on the assumption that the goal of users is to acquire a set of unique pieces of information. To quantify user's efficiency, we ask if the user could have acquired the same pieces of information from another set of sources more efficiently. We define three different notions of efficiency -- link, in-flow, and delay -- corresponding to the number of sources the user follows, the amount of (redundant) information she acquires and the delay with which she receives the information. Our definitions of efficiency are general and applicable to any social media system with an underlying in- formation network, in which every user follows others to receive the information they produce. In our experiments, we measure the efficiency of Twitter users at acquiring different types of information. We find that Twitter users exhibit sub-optimal efficiency across the three notions of efficiency, although they tend to be more efficient at acquiring non- popular pieces of information than they are at acquiring popular pieces of information. We then show that this lack of efficiency is a consequence of the triadic closure mechanism by which users typically discover and follow other users in social media. Thus, our study reveals a tradeoff between the efficiency and discoverability of information sources. Finally, we develop a heuristic algorithm that enables users to be significantly more efficient at acquiring the same unique pieces of information.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"183 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83032681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Ostroumova, P. Prokhorenkov, E. Samosvat, P. Serdyukov
{"title":"Publication Date Prediction through Reverse Engineering of the Web","authors":"L. Ostroumova, P. Prokhorenkov, E. Samosvat, P. Serdyukov","doi":"10.1145/2835776.2835796","DOIUrl":"https://doi.org/10.1145/2835776.2835796","url":null,"abstract":"In this paper, we focus on one of the most challenging tasks in temporal information retrieval: detection of a web page publication date. The natural approach to this problem is to find the publication date in the HTML body of a page. However, there are two fundamental problems with this approach. First, not all web pages contain the publication dates in their texts. Second, it is hard to distinguish the publication date among all the dates found in the page's text. The approach we suggest in this paper supplements methods of date extraction from the page's text with novel link-based methods of dating. Some of our link-based methods are based on a probabilistic model of the Web graph structure evolution, which relies on the publication dates of web pages as on its parameters. We use this model to estimate the publication dates of web pages: based only on the link structure currently observed, we perform a ``reverse engineering'' to reveal the whole process of the Web's evolution.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91028562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weinan Zhang, Yifei Rong, Jun Wang, Tianchi Zhu, Xiaofan Wang
{"title":"Feedback Control of Real-Time Display Advertising","authors":"Weinan Zhang, Yifei Rong, Jun Wang, Tianchi Zhu, Xiaofan Wang","doi":"10.1145/2835776.2835843","DOIUrl":"https://doi.org/10.1145/2835776.2835843","url":null,"abstract":"Real-Time Bidding (RTB) is revolutionising display advertising by facilitating per-impression auctions to buy ad impressions as they are being generated. Being able to use impression-level data, such as user cookies, encourages user behaviour targeting, and hence has significantly improved the effectiveness of ad campaigns. However, a fundamental drawback of RTB is its instability because the bid decision is made per impression and there are enormous fluctuations in campaigns' key performance indicators (KPIs). As such, advertisers face great difficulty in controlling their campaign performance against the associated costs. In this paper, we propose a feedback control mechanism for RTB which helps advertisers dynamically adjust the bids to effectively control the KPIs, e.g., the auction winning ratio and the effective cost per click. We further formulate an optimisation framework to show that the proposed feedback control mechanism also has the ability of optimising campaign performance. By settling the effective cost per click at an optimal reference value, the number of campaign's ad clicks can be maximised with the budget constraint. Our empirical study based on real-world data verifies the effectiveness and robustness of our RTB control system in various situations. The proposed feedback control mechanism has also been deployed on a commercial RTB platform and the online test has shown its success in generating controllable advertising performance.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90303096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representation Learning for Information Diffusion through Social Networks: an Embedded Cascade Model","authors":"Simon Bourigault, S. Lamprier, P. Gallinari","doi":"10.1145/2835776.2835817","DOIUrl":"https://doi.org/10.1145/2835776.2835817","url":null,"abstract":"In this paper, we focus on information diffusion through social networks. Based on the well-known Independent Cascade model, we embed users of the social network in a latent space to extract more robust diffusion probabilities than those defined by classical graphical learning approaches. Better generalization abilities provided by the use of such a projection space allows our approach to present good performances on various real-world datasets, for both diffusion prediction and influence relationships inference tasks. Additionally, the use of a projection space enables our model to deal with larger social networks.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86742203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Bangcharoensap, T. Murata, Hayato Kobayashi, N. Shimizu
{"title":"Transductive Classification on Heterogeneous Information Networks with Edge Betweenness-based Normalization","authors":"P. Bangcharoensap, T. Murata, Hayato Kobayashi, N. Shimizu","doi":"10.1145/2835776.2835799","DOIUrl":"https://doi.org/10.1145/2835776.2835799","url":null,"abstract":"This paper proposes a novel method for transductive classification on heterogeneous information networks composed of multiple types of vertices. Such networks naturally represent many real-world Web data such as DBLP data (author, paper, and conference). Given a network where some vertices are labeled, the classifier aims to predict labels for the remaining vertices by propagating the labels to the entire network. In the label propagation process, many studies reduce the importance of edges connecting to a high-degree vertex. The assumption is unsatisfactory when reliability of a label of a vertex cannot be implied from its degree. On the basis of our intuition that edges bridging across communities are less trustworthy, we adapt edge betweenness to imply the importance of edges. Since directly applying the conventional edge betweenness is inefficient on heterogeneous networks, we propose two additional refinements. First, the centrality utilizes the fact that networks contain multiple types of vertices. Second, the centrality ignores flows originating from endpoints of considering edges. The experimental results on real-world datasets show our proposed method is more effective than a state-of-the-art method, GNetMine. On average, our method yields 92.79 ± 1.25% accuracy on a DBLP network even if only 1.92% of vertices are labeled. Our simple weighting scheme results in more than 5 percentage points increase in accuracy compared with GNetMine.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82550261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Search Interactions within Professional Social Networks","authors":"N. Spirin","doi":"10.1145/2835776.2855092","DOIUrl":"https://doi.org/10.1145/2835776.2855092","url":null,"abstract":"To help users cope with the scale and influx of new information, professional social networks (PSNs) provide a search functionality. However, most of the search engines within PSNs today only support keyword queries and basic faceted search capabilities overlooking serendipitous network exploration and search for relationships between entities. This results in siloed information and a limited search space. My thesis is that we must redesign all major elements of a search user interface, such as input, control, and informational, to enable more effective search interactions within PSNs. I will introduce new insights and algorithms supporting the thesis.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85302159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}