Proceedings of the Ninth ACM International Conference on Web Search and Data Mining: Latest Articles, Page 10

WSDM 2016 Workshop on the Ethics of Online Experimentation

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2855117

Fernando Diaz, Solon Barocas

Abstract: Online experimentation is now a core and near-constant part of the operation of a production online service, such as a web search engine or social media service. These are large-scale experiments that involve research subjects often numbering in the hundreds of thousands and wide-ranging, computer-automated variations in experimental treatment. In some cases, the results of online experiments may be of use internally to optimize system performance (for example, a test may be conducted to help make web page layout decisions). In other cases, the results may be of academic interest (for example, an experiment may be conducted to test a hypothesis about human behavior). Because of their rapid deployment and broad impact, online experimentation systems provide an extremely valuable tool for scientists and engineers. Despite this statistical power, in some situations, an online experiment can raise difficult ethical questions. One only needs to revisit the conversations resulting from the Facebook emotional contagion experiment to understand that some experiments may, at the very least, warrant careful review before being conducted. Since this episode, scholarship published mainly in the qualitative research and information law communities indicates that this may not be an isolated incident. Ethical and legal problems probably arise in other online experiments, published or not. As experimentation platforms and users become easily accessible, scientists and practitioners may increasingly put the well-being and trust of end users at risk. In light of these concerns, organizations often review online experiments before they are actually conducted.

In production settings, the review process might vary with respect to formality or standards across companies and even groups within companies. When intended or used for academic publication, experiments or data may have undergone inconsistent review processes, some implementing academic-style institutional review boards and others none at all. Although there is a suggestion that service providers are concerned about the wellbeing of end users, the community does not

Citations: 1

Publication Date Prediction through Reverse Engineering of the Web

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2835796

L. Ostroumova, P. Prokhorenkov, E. Samosvat, P. Serdyukov

Citations: 5

Querying and Tracking Influencers in Social Streams

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2835788

Karthik Subbian, C. Aggarwal, J. Srivastava

Citations: 27

Temporal Formation and Evolution of Online Communities

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2855089

Hossein Fani

Abstract: Researchers have already studied the identification of online communities and the possible impact or influence relationships from several perspectives. For instance, some studies consider communities of users formed on the basis of shared relationships and topological similarities, and others consider communities of users who share similar content. However, little work has been done on the detection of communities that simultaneously share topical and temporal similarities. Furthermore, these studies have not explored the causal relationships between communities. Causation provides a systematic explanation of why communities are formed and helps to predict future communities. This proposal will address two main research questions: i) how can communities that share topical and temporal similarities be identified, and ii) how can causal relations between different online communities be detected and modelled. We model users' behaviour towards topics of interest through multivariate time series to identify like-minded communities. Further, we employ Granger's concept of causality to infer causation between detected communities from the corresponding users' time series. Granger causality is the prominent approach in time series modelling and rests on a firm statistical foundation.

We assess the proposed community detection methods through comparison with the state of the art and verify the causal model through its prediction accuracy.
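
The abstract above relies on Granger causality between time series. As a rough illustration only (not the author's model), the sketch below tests whether one series helps predict another by comparing two least-squares autoregressions: one using only the target's own lags, and one that also uses the other series' lags. The function names `ols_rss` and `granger_f`, the lag of 1, and the synthetic data are all assumptions made for this example.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(x, y, lag=1):
    """F-statistic for the hypothesis 'x Granger-causes y' using `lag` lags."""
    n = len(y)
    targets = y[lag:]
    # Restricted model: intercept plus y's own lags.
    restricted = [[1.0] + [y[t - l] for l in range(1, lag + 1)]
                  for t in range(lag, n)]
    # Unrestricted model: the same regressors plus x's lags.
    unrestricted = [row + [x[t - l] for l in range(1, lag + 1)]
                    for row, t in zip(restricted, range(lag, n))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Synthetic example (hypothetical data): y follows x with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
f_xy = granger_f(x, y)  # should be large: x's past helps predict y
f_yx = granger_f(y, x)  # should be small: y's past does not help predict x
print(f_xy, f_yx)
```

A large F-statistic in one direction and a small one in the other is the kind of asymmetry a Granger-style analysis looks for; in practice the statistic would be compared against an F distribution, and a library such as statsmodels would normally be used instead of hand-rolled regression.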

Citations: 1

Kangaroo: Workload-Aware Processing of Range Data and Range Queries in Hadoop

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2835841

Ahmed M. Aly, Hazem Elmeleegy, Yan Qi, Walid G. Aref

Abstract: Despite the importance and widespread use of range data, e.g., time intervals, spatial ranges, etc., little attention has been devoted to studying the processing and querying of range data in the context of big data. The main challenge lies in the nature of traditional index structures, e.g., B-Tree and R-Tree, which are centralized by nature and hence are almost crippled when deployed in a distributed environment. To address this challenge, this paper presents Kangaroo, a system built on top of Hadoop to optimize the execution of range queries over range data. The main idea behind Kangaroo is to split the data into non-overlapping partitions in a way that minimizes the query execution time. Kangaroo is query workload-aware, i.e., it produces partitioning layouts that minimize the query processing time of given query patterns. In this paper, we study the design challenges Kangaroo addresses in order to be deployed on top of a distributed file system, i.e., HDFS. We also study four different partitioning schemes that Kangaroo can support.

With extensive experiments using real range data of more than one billion records and a real query workload of more than 30,000 queries, we show that the partitioning schemes of Kangaroo can significantly reduce the I/O of range queries on range data.
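
To make the idea of partition pruning for range queries concrete, here is a minimal single-machine sketch. It is not one of Kangaroo's four partitioning schemes and it is not workload-aware; it simply assigns each interval to exactly one partition by its start point, records each partition's bounding range, and skips partitions whose bounding range cannot overlap the query. The names `build_partitions` and `range_query`, the boundary values, and the sample intervals are assumptions for illustration.

```python
from bisect import bisect_right

def build_partitions(intervals, boundaries):
    """Assign each (start, end) interval to one partition by its start point.

    `boundaries` must be sorted. Returns a list of
    (min_start, max_end, records) tuples for the non-empty partitions.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for start, end in intervals:
        buckets[bisect_right(boundaries, start)].append((start, end))
    return [(min(s for s, _ in b), max(e for _, e in b), b)
            for b in buckets if b]

def range_query(partitions, q_start, q_end):
    """Return the intervals overlapping [q_start, q_end] and the records scanned."""
    hits, scanned = [], 0
    for lo, hi, records in partitions:
        if hi < q_start or lo > q_end:
            continue  # bounding range misses the query, so skip the partition
        scanned += len(records)
        hits.extend(r for r in records if r[0] <= q_end and r[1] >= q_start)
    return hits, scanned

# Hypothetical data: six intervals split at start points 10 and 25.
intervals = [(0, 5), (3, 8), (10, 12), (11, 20), (25, 30), (26, 27)]
partitions = build_partitions(intervals, boundaries=[10, 25])
hits, scanned = range_query(partitions, 9, 11)
print(hits, scanned)  # [(10, 12), (11, 20)] 2
```

Here `scanned` stands in for I/O cost: the query reads two of six records because the other two partitions are pruned. A workload-aware scheme would choose the boundaries so that this cost is low for the queries actually observed.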

Citations: 14

Understanding Offline Political Systems by Mining Online Political Data

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2855112

D. Lazer, Oren Tsur, Tina Eliassi-Rad

Abstract: "Man is by nature a political animal", as asserted by Aristotle. This political nature manifests itself in the data we produce and the traces we leave online. In this tutorial, we address a number of fundamental issues regarding mining of political data: What types of data could be considered political? What can we learn from such data? Can we use the data for prediction of political changes, etc? How can these prediction tasks be done efficiently? Can we use online socio-political data in order to get a better understanding of our political systems and of recent political changes? What are the pitfalls and inherent shortcomings of using online data for political analysis? In recent years, with the abundance of data, these questions, among others, have gained importance, especially in light of the global political turmoil and the upcoming 2016 US presidential election.

We introduce relevant political science theory, describe the challenges within the framework of computational social science and present state of the art approaches bridging social network analysis, graph mining, and natural language processing.

Citations: 1

Collaborative Denoising Auto-Encoders for Top-N Recommender Systems

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2835837

Yao Wu, Christopher DuBois, A. Zheng, M. Ester

Citations: 855

The Past and Future of Systems for Current Events

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2835850

Mor Naaman

Citations: 0

Keynote Speaker Bio

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2835845

Yiling Chen

Citations: 1

Crowdsourcing High Quality Labels with a Tight Budget

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI: 10.1145/2835776.2835797

Qi Li, Fenglong Ma, Jing Gao, Lu Su, Christopher J. Quinn

Abstract: In the past decade, commercial crowdsourcing platforms have revolutionized the ways of classifying and annotating data, especially for large datasets. Obtaining labels for a single instance can be inexpensive, but for large datasets, it is important to allocate budgets wisely. With limited budgets, requesters must trade off between the quantity of labeled instances and the quality of the final results. Existing budget allocation methods can achieve good quantity but cannot guarantee high quality of individual instances under a tight budget. However, in some scenarios, requesters may be willing to label fewer instances but of higher quality. Moreover, they may have different requirements on quality for different tasks. To address these challenges, we propose a flexible budget allocation framework called Requallo. Requallo allows requesters to set their specific requirements on the labeling quality and maximizes the number of labeled instances that achieve the quality requirement under a tight budget. The budget allocation problem is modeled as a Markov decision process and a sequential labeling policy is produced. The proposed policy greedily searches for the instance to query next as the one that can provide the maximum reward for the goal. The Requallo framework is further extended to consider worker reliability so that the budget can be better allocated.

Experiments on two real-world crowdsourcing tasks as well as a simulated task demonstrate that when the budget is tight, the proposed Requallo framework outperforms existing state-of-the-art budget allocation methods in both quantity and quality.
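
The sequential, greedy flavour of the allocation described above can be sketched as follows. This is a simplified stand-in and not Requallo's actual reward or policy: the quality requirement is assumed to be a vote margin (one label leading the other by `margin` votes), workers are simulated with a single assumed accuracy, and the greedy rule picks the open instance that is currently closest to meeting the requirement. The function name `greedy_label` and all parameters are hypothetical.

```python
import random

def greedy_label(true_labels, budget, margin=3, worker_accuracy=0.8, seed=0):
    """Spend up to `budget` label queries on binary instances.

    An instance is done once one label leads the other by `margin` votes.
    Returns (dict of done instance -> decided label, queries spent).
    """
    rng = random.Random(seed)
    # votes[i] = (votes for label 1) - (votes for label 0)
    votes = [0] * len(true_labels)
    spent = 0
    while spent < budget:
        open_items = [i for i, v in enumerate(votes) if abs(v) < margin]
        if not open_items:
            break  # every instance already meets the requirement
        # Greedy choice: the open instance closest to the required margin.
        i = max(open_items, key=lambda j: abs(votes[j]))
        # Simulated worker: answers correctly with probability worker_accuracy.
        correct = rng.random() < worker_accuracy
        label = true_labels[i] if correct else 1 - true_labels[i]
        votes[i] += 1 if label == 1 else -1
        spent += 1
    done = {i: int(v > 0) for i, v in enumerate(votes) if abs(v) >= margin}
    return done, spent

# Hypothetical task: eight binary instances and a budget of 20 queries.
done, spent = greedy_label([1, 0, 1, 1, 0, 0, 1, 0], budget=20)
print(done, spent)
```

Because the rule concentrates queries on instances that are nearly finished, it tends to complete fewer instances at the required quality rather than spreading a tight budget thinly over all of them, which is the trade-off the abstract describes.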

Citations: 48