Proceedings of the Ninth ACM International Conference on Web Search and Data Mining: Latest Articles

WSDM 2016 Workshop on the Ethics of Online Experimentation
Fernando Diaz, Solon Barocas
DOI: https://doi.org/10.1145/2835776.2855117
Published: 2016-02-08
Abstract: Online experimentation is now a core and near-constant part of the operation of a production online service, such as a web search engine or social media service. These are large-scale experiments that involve research subjects often numbering in the hundreds of thousands and wide-ranging, computer-automated variations in experimental treatment. In some cases, the results of online experiments may be of use internally to optimize system performance (for example, a test may be conducted to help make web page layout decisions). In other cases, the results may be of academic interest (for example, an experiment may be conducted to test a hypothesis about human behavior). Because of their rapid deployment and broad impact, online experimentation systems provide an extremely valuable tool for scientists and engineers. Despite this statistical power, in some situations, an online experiment can raise difficult ethical questions. One only needs to revisit the conversations resulting from the Facebook emotional contagion experiment to understand that some experiments may, at the very least, warrant careful review before being conducted. Since this episode, scholarship published mainly in the qualitative research and information law communities indicates that this may not be an isolated incident. Ethical and legal problems probably arise in other online experiments, published or not. As experimentation platforms and users become easily accessible, scientists and practitioners may increasingly put the well-being and trust of end users at risk. In light of these concerns, organizations often review online experiments before they are actually conducted.
In production settings, the review process might vary with respect to formality or standards across companies and even groups within companies. When intended or used for academic publication, experiments or data may have undergone inconsistent review processes, some implementing academic-style institutional review boards and others none at all. Although there is a suggestion that service providers are concerned about the wellbeing of end users, the community does not
Citations: 1
Publication Date Prediction through Reverse Engineering of the Web
L. Ostroumova, P. Prokhorenkov, E. Samosvat, P. Serdyukov
DOI: https://doi.org/10.1145/2835776.2835796
Published: 2016-02-08
Abstract: In this paper, we focus on one of the most challenging tasks in temporal information retrieval: detection of a web page publication date. The natural approach to this problem is to find the publication date in the HTML body of a page. However, there are two fundamental problems with this approach. First, not all web pages contain the publication dates in their texts. Second, it is hard to distinguish the publication date among all the dates found in the page's text. The approach we suggest in this paper supplements methods of date extraction from the page's text with novel link-based methods of dating. Some of our link-based methods are based on a probabilistic model of the Web graph structure evolution, which relies on the publication dates of web pages as its parameters. We use this model to estimate the publication dates of web pages: based only on the link structure currently observed, we perform a "reverse engineering" to reveal the whole process of the Web's evolution.
Citations: 5
Querying and Tracking Influencers in Social Streams
Karthik Subbian, C. Aggarwal, J. Srivastava
DOI: https://doi.org/10.1145/2835776.2835788
Published: 2016-02-08
Abstract: Influence analysis is an important problem in social network analysis due to its impact on viral marketing and targeted advertisements. Most of the existing influence analysis methods determine the influencers in a static network with an influence propagation model based on pre-defined edge propagation probabilities. However, none of these models can be queried to find influencers in a context- and time-sensitive fashion from streaming social data. In this paper, we propose an approach to maintain real-time influence scores of users in a social stream using a topic- and time-sensitive approach, while the network and topics are constantly evolving over time. We show that our approach is efficient in terms of online maintenance and effective in terms of various types of real-time context- and time-sensitive queries. We evaluate our results on both social and collaborative network data sets.
Citations: 27
Temporal Formation and Evolution of Online Communities
Hossein Fani
DOI: https://doi.org/10.1145/2835776.2855089
Published: 2016-02-08
Abstract: Researchers have already studied the identification of online communities and the possible impact or influence relationships from several perspectives. Examples include communities of users formed on the basis of shared relationships and topological similarities, and communities of users who share similar content. However, little work has been done on detection of communities that simultaneously share topical and temporal similarities. Furthermore, these studies have not explored the causal relationships between the communities. Causation provides a systematic explanation as to why communities are formed and helps to predict future communities. This proposal will address two main research questions: i) how can communities that share topical and temporal similarities be identified, and ii) how can causal relations between different online communities be detected and modelled. We model users' behaviour towards topics of interest through multivariate time series to identify like-minded communities. Further, we employ Granger's concept of causality to infer causation between detected communities from corresponding users' time series. Granger causality is the prominent approach in time series modelling and rests on a firm statistical foundation.
We assess the proposed community detection methods through comparison with the state of the art and verify the causal model through its prediction accuracy.
Citations: 1
Kangaroo: Workload-Aware Processing of Range Data and Range Queries in Hadoop
Ahmed M. Aly, Hazem Elmeleegy, Yan Qi, Walid G. Aref
DOI: https://doi.org/10.1145/2835776.2835841
Published: 2016-02-08
Abstract: Despite the importance and widespread use of range data, e.g., time intervals, spatial ranges, etc., little attention has been devoted to studying the processing and querying of range data in the context of big data. The main challenge lies in the nature of traditional index structures, e.g., B-Tree and R-Tree, which are centralized by design and hence are almost crippled when deployed in a distributed environment. To address this challenge, this paper presents Kangaroo, a system built on top of Hadoop to optimize the execution of range queries over range data. The main idea behind Kangaroo is to split the data into non-overlapping partitions in a way that minimizes the query execution time. Kangaroo is query workload-aware, i.e., it produces partitioning layouts that minimize the query processing time of given query patterns. In this paper, we study the design challenges Kangaroo addresses in order to be deployed on top of a distributed file system, i.e., HDFS. We also study four different partitioning schemes that Kangaroo can support.
With extensive experiments using real range data of more than one billion records and a real query workload of more than 30,000 queries, we show that the partitioning schemes of Kangaroo can significantly reduce the I/O of range queries on range data.
Citations: 14
Understanding Offline Political Systems by Mining Online Political Data
D. Lazer, Oren Tsur, Tina Eliassi-Rad
DOI: https://doi.org/10.1145/2835776.2855112
Published: 2016-02-08
Abstract: "Man is by nature a political animal", as asserted by Aristotle. This political nature manifests itself in the data we produce and the traces we leave online. In this tutorial, we address a number of fundamental issues regarding mining of political data: What types of data could be considered political? What can we learn from such data? Can we use the data for prediction of political changes, etc.? How can these prediction tasks be done efficiently? Can we use online socio-political data in order to get a better understanding of our political systems and of recent political changes? What are the pitfalls and inherent shortcomings of using online data for political analysis? In recent years, with the abundance of data, these questions, among others, have gained importance, especially in light of the global political turmoil and the upcoming 2016 US presidential election.
We introduce relevant political science theory, describe the challenges within the framework of computational social science and present state-of-the-art approaches bridging social network analysis, graph mining, and natural language processing.
Citations: 1
Collaborative Denoising Auto-Encoders for Top-N Recommender Systems
Yao Wu, Christopher DuBois, A. Zheng, M. Ester
DOI: https://doi.org/10.1145/2835776.2835837
Published: 2016-02-08
Abstract: Most real-world recommender services measure their performance based on the top-N results shown to the end users. Thus, advances in top-N recommendation have far-ranging consequences in practical applications. In this paper, we present a novel method, called Collaborative Denoising Auto-Encoder (CDAE), for top-N recommendation that utilizes the idea of Denoising Auto-Encoders. We demonstrate that the proposed model is a generalization of several well-known collaborative filtering models but with more flexible components. Thorough experiments are conducted to understand the performance of CDAE under various component settings. Furthermore, experimental results on several public datasets demonstrate that CDAE consistently outperforms state-of-the-art top-N recommendation methods on a variety of common evaluation metrics.
Citations: 855
The Past and Future of Systems for Current Events
Mor Naaman
DOI: https://doi.org/10.1145/2835776.2835850
Published: 2016-02-08
Abstract: People share an overwhelming amount of content from real-world events on social media. These events range from major global events like an uprising or an earthquake, to local events and emergencies such as a fire or a parade; from media events like the Oscars, to events that enjoy little media coverage such as a conference or a music concert. This shared media represents an important part of our society, culture and history. At the same time, this social media content is still fragmented across services, hard to find, and difficult to consume and understand.
Citations: 0
Keynote Speaker Bio
Yiling Chen
DOI: https://doi.org/10.1145/2835776.2835845
Published: 2016-02-08
Abstract: Chen has served as the Tutorial Chair of the ACM Conference on Electronic Commerce (EC), 2008, and on the Program Committee for the International World Wide Web Conference (WWW), 2008, and the International Workshop on Internet and Network Economics (WINE), 2008. She has co-organized the 2nd and the 3rd Workshops on Prediction Markets, 2007-2008. She has also been a reviewer for Management Science, Information Systems Research, Decision Support Systems, Information Systems and e-Business Management, and various conferences. Chen's awards include Outstanding Paper Award, ACM Conference on Electronic Commerce (EC), 2008; Honorable Mention, Decision Science Institute Doctoral Dissertation Competition, 2006; and eBRC Doctoral Support Award, eBusiness Research Center at Penn State University, 2004.
Citations: 1
Crowdsourcing High Quality Labels with a Tight Budget
Qi Li, Fenglong Ma, Jing Gao, Lu Su, Christopher J. Quinn
DOI: https://doi.org/10.1145/2835776.2835797
Published: 2016-02-08
Abstract: In the past decade, commercial crowdsourcing platforms have revolutionized the ways of classifying and annotating data, especially for large datasets. Obtaining labels for a single instance can be inexpensive, but for large datasets, it is important to allocate budgets wisely. With limited budgets, requesters must trade off between the quantity of labeled instances and the quality of the final results. Existing budget allocation methods can achieve good quantity but cannot guarantee high quality of individual instances under a tight budget. However, in some scenarios, requesters may be willing to label fewer instances but of higher quality. Moreover, they may have different requirements on quality for different tasks. To address these challenges, we propose a flexible budget allocation framework called Requallo. Requallo allows requesters to set their specific requirements on the labeling quality and maximizes the number of labeled instances that achieve the quality requirement under a tight budget. The budget allocation problem is modeled as a Markov decision process and a sequential labeling policy is produced. The proposed policy greedily searches for the instance to query next as the one that can provide the maximum reward for the goal. The Requallo framework is further extended to consider worker reliability so that the budget can be better allocated.
Experiments on two real-world crowdsourcing tasks as well as a simulated task demonstrate that when the budget is tight, the proposed Requallo framework outperforms existing state-of-the-art budget allocation methods in both quantity and quality.
Citations: 48