{"title":"Robust Tree-based Causal Inference for Complex Ad Effectiveness Analysis","authors":"Pengyuan Wang, Wei Sun, Dawei Yin, Jian Yang, Yi Chang","doi":"10.1145/2684822.2685294","DOIUrl":"https://doi.org/10.1145/2684822.2685294","url":null,"abstract":"As the online advertising industry has evolved into an age of diverse ad formats and delivery channels, users are exposed to complex ad treatments involving various ad characteristics. The diversity and generality of ad treatments call for accurate and causal measurement of ad effectiveness, i.e., how the ad treatment causes the changes in outcomes without the confounding effect by user characteristics. Various causal inference approaches have been proposed to measure the causal effect of ad treatments. However, most existing causal inference methods focus on univariate and binary treatment and are not well suited for complex ad treatments. Moreover, to be practical in the data-rich online environment, the measurement needs to be highly general and efficient, which is not addressed in conventional causal inference approaches. In this paper we propose a novel causal inference framework for assessing the impact of general advertising treatments. Our new framework enables analysis on uni- or multi-dimensional ad treatments, where each dimension (ad treatment factor) could be discrete or continuous. We prove that our approach is able to provide an unbiased estimation of the ad effectiveness by controlling the confounding effect of user characteristics. The framework is computationally efficient by employing a tree structure that specifies the relationship between user characteristics and the corresponding ad treatment. This tree-based framework is robust to model misspecification and highly flexible with minimal manual tuning. To demonstrate the efficacy of our approach, we apply it to two advertising campaigns. 
In the first campaign we evaluate the impact of different ad frequencies, and in the second one we consider the synthetic ad effectiveness across TV and online platforms. Our framework successfully provides the causal impact of ads with different frequencies in both campaigns. Moreover, it shows that the ad frequency usually has a treatment effect cap, which is usually over-estimated by naive estimation.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126817630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DL-WSDM'15: Workshop on Deep Learning for Web Search and Data Mining","authors":"Bin Gao, Jiang Bian","doi":"10.1145/2684822.2697028","DOIUrl":"https://doi.org/10.1145/2684822.2697028","url":null,"abstract":"In recent years, deep learning has been a very hot topic in the machine learning community. It has brought break-through results in image classification and speech recognition. Most recently, researchers have also got many promising results in natural language processing using deep learning techniques. As machine learning techniques are widely used in the Web search and data mining applications, many researchers and practitioners are studying the possibility of applying the recently-developed deep learning techniques into these applications. Some of them have made very promising progress, and thus it is a good time to hold a workshop to discuss and share the problems and progress in using deep learning techniques to improve Web search and data mining tasks.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129868658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Will This Paper Increase Your h-index?: Scientific Impact Prediction","authors":"Yuxiao Dong, Reid A. Johnson, N. Chawla","doi":"10.1145/2684822.2685314","DOIUrl":"https://doi.org/10.1145/2684822.2685314","url":null,"abstract":"Scientific impact plays a central role in the evaluation of the output of scholars, departments, and institutions. A widely used measure of scientific impact is citations, with a growing body of literature focused on predicting the number of citations obtained by any given publication. The effectiveness of such predictions, however, is fundamentally limited by the power-law distribution of citations, whereby publications with few citations are extremely common and publications with many citations are relatively rare. Given this limitation, in this work we instead address a related question asked by many academic researchers in the course of writing a paper, namely: \"Will this paper increase my h-index?\" Using a real academic dataset with over 1.7 million authors, 2 million papers, and 8 million citation relationships from the premier online academic service ArnetMiner, we formalize a novel scientific impact prediction problem to examine several factors that can drive a paper to increase the primary author's h-index. We find that the researcher's authority on the publication topic and the venue in which the paper is published are crucial factors to the increase of the primary author's h-index, while the topic popularity and the co-authors' h-indices are of surprisingly little relevance. By leveraging relevant factors, we find a greater than 87.5% potential predictability for whether a paper will contribute to an author's h-index within five years. As a further experiment, we generate a self-prediction for this paper, estimating that there is a 76% probability that it will contribute to the h-index of the co-author with the highest current h-index in five years. 
We conclude that our findings on the quantification of scientific impact can help researchers to expand their influence and more effectively leverage their position of \"standing on the shoulders of giants.\"","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122933807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Negative Link Prediction in Social Media","authors":"Jiliang Tang, Shiyu Chang, C. Aggarwal, Huan Liu","doi":"10.1145/2684822.2685295","DOIUrl":"https://doi.org/10.1145/2684822.2685295","url":null,"abstract":"Signed network analysis has attracted increasing attention in recent years. This is in part because research on signed network analysis suggests that negative links have added value in the analytical process. A major impediment in their effective use is that most social media sites do not enable users to specify them explicitly. In other words, a gap exists between the importance of negative links and their availability in real data sets. Therefore, it is natural to explore whether one can predict negative links automatically from the commonly available social network data. In this paper, we investigate the novel problem of negative link prediction with only positive links and content-centric interactions in social media. We make a number of important observations about negative links, and propose a principled framework NeLP, which can exploit positive links and content-centric interactions to predict negative links. Our experimental results on real-world social networks demonstrate that the proposed NeLP framework can accurately predict negative links with positive links and content-centric interactions. 
Our detailed experiments also illustrate the relative importance of various factors to the effectiveness of the proposed framework.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122019047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Accuracy of Hyper-local Geotagging of Social Media Content","authors":"David Flatow, Mor Naaman, K. Xie, Yana Volkovich, Y. Kanza","doi":"10.1145/2684822.2685296","DOIUrl":"https://doi.org/10.1145/2684822.2685296","url":null,"abstract":"Social media users share billions of items per year, only a small fraction of which is geotagged. We present a data-driven approach for identifying non-geotagged content items that can be associated with a hyper-local geographic area by modeling the location distributions of n-grams that appear in the text. We explore the trade-off between accuracy and coverage of this method. Further, we explore differences across content received from multiple platforms and devices, and show, for example, that content shared via different sources and applications produces significantly different geographic distributions, and that it is preferred to model and predict location for items according to their source. Our findings show the potential and the bounds of a data-driven approach to assigning location data to short social media texts, and offer implications for all applications that use data-driven approaches to locate content.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128013462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Website Popularity Competition in the Attention-Activity Marketplace","authors":"Bruno Ribeiro, C. Faloutsos","doi":"10.1145/2684822.2685312","DOIUrl":"https://doi.org/10.1145/2684822.2685312","url":null,"abstract":"How does a new startup drive the popularity of competing websites into oblivion like Facebook famously did to MySpace? This question is of great interest to academics, technologists, and financial investors alike. In this work we exploit the singular way in which Facebook wiped out the popularity of MySpace, Hi5, Friendster, and Multiply to guide the design of a new popularity competition model. Our model provides new insights into what Nobel Laureate Herbert A. Simon called the \"marketplace of attention,\" which we recast as the attention-activity marketplace. Our model design is further substantiated by user-level activity of 250,000 MySpace users obtained between 2004 and 2009. The resulting model not only accurately fits the observed Daily Active Users (DAU) of Facebook and its competitors but also predicts their fate four years into the future.","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122334663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","authors":"M. de Rijke, Milad Shokouhi, A. Tomkins, Min Zhang","doi":"10.1145/2684822","DOIUrl":"https://doi.org/10.1145/2684822","url":null,"abstract":"It's our pleasure to welcome you to WSDM, the 10th annual ACM International Conference on Web Search and Data Mining, held in Cambridge, UK. WSDM is one of the premier conferences on web inspired research involving search and data mining. We are pleased to present here the proceedings of the conference. The program reflects the breadth and diversity of research in the field and showcases the latest developments in the field. The conference received a total of 505 submissions, which is 30% higher than any previous WSDM conference. The submitted papers cover the research of 1796 authors across 50 countries. Of these, 80 were accepted for publication. WSDM has historically used a two-tier single-blind review process. This year, we applied this same procedure, but we also experimented with double-blind reviews in some cases. We will present analysis of the data from this experiment before the conference itself. In the first stage of reviewing, four Program Committee members were assigned to each paper. In 90% of cases, all four reviews were completed; in 10% of cases, only three reviews were completed. The PC members provided ratings and comments while evaluating the papers according to the standard criteria of relevance, quality, reproducibility, clarity, and impact. This resulted in the collection of 1963 reviews. In the second stage, every paper was assigned to a Senior PC member. The SPC member was tasked to oversee a discussion amongst the reviewers and attempt to reach a consensus recommendation for the paper. The final decisions were based on all of the above. Ultimately 80 papers were selected for inclusion in the program. 
We owe a debt of gratitude to the 42 Senior PC members and the 242 PC members who participated in this process. The WSDM 2017 acceptance rate of around 16% is 1-2% lower than previous years, but the number of submitted papers is 30% higher. This year, continuing with WSDM tradition, single-track oral presentation slots were allocated to all 80 accepted papers. Out of the 80 papers, 24 were assigned long presentation slots and 56 were assigned short presentation slots. This assignment was based on the topic and results in each paper, with the Program Chairs assigning long slots to papers more likely to appeal to a broader audience. In addition to the oral presentations, all papers will be presented as posters in interactive sessions. The technical program this year features keynotes by prominent researchers from academia and industry: Ricardo Baeza-Yates (NTENT, USA; UPF, Spain; and U. de Chile), Claire Cardie (Cornell University), and Steve Young (Cambridge University). The program also features four invited Practice and Experience talks, which were introduced for the first time at WSDM 2014: Andrew Blake (Alan Turing Institute), Ralf Herbrich (Amazon), Anjali Joshi (Google) and Jan Pedersen (Microsoft). 
We would like to thank the keynote and P&E speakers for sharing their technical insights and research contribu","PeriodicalId":179443,"journal":{"name":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123031615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}