{"title":"Detect Rumors on Twitter by Promoting Information Campaigns with Generative Adversarial Learning","authors":"Jing Ma, Wei Gao, Kam-Fai Wong","doi":"10.1145/3308558.3313741","DOIUrl":"https://doi.org/10.1145/3308558.3313741","url":null,"abstract":"Rumors can cause devastating consequences to individual and/or society. Analysis shows that widespread of rumors typically results from deliberately promoted information campaigns which aim to shape collective opinions on the concerned news events. In this paper, we attempt to fight such chaos with itself to make automatic rumor detection more robust and effective. Our idea is inspired by adversarial learning method originated from Generative Adversarial Networks (GAN). We propose a GAN-style approach, where a generator is designed to produce uncertain or conflicting voices, complicating the original conversational threads in order to pressurize the discriminator to learn stronger rumor indicative representations from the augmented, more challenging examples. Different from traditional data-driven approach to rumor detection, our method can capture low-frequency but stronger non-trivial patterns via such adversarial training. Extensive experiments on two Twitter benchmark datasets demonstrate that our rumor detection method achieves much better results than state-of-the-art methods.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81724508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised Multi-view Individual and Sharable Feature Learning for Webpage Classification","authors":"Fei Wu, Xiaoyuan Jing, Jun Zhou, Yi-mu Ji, Chao Lan, Qinghua Huang, Ruchuan Wang","doi":"10.1145/3308558.3313492","DOIUrl":"https://doi.org/10.1145/3308558.3313492","url":null,"abstract":"Semi-supervised multi-view feature learning (SMFL) is a feasible solution for webpage classification. However, how to fully extract the complementarity and correlation information effectively under semi-supervised setting has not been well studied. In this paper, we propose a semi-supervised multi-view individual and sharable feature learning (SMISFL) approach, which jointly learns multiple view-individual transformations and one sharable transformation to explore the view-specific property for each view and the common property across views. We design a semi-supervised multi-view similarity preserving term, which fully utilizes the label information of labeled samples and similarity information of unlabeled samples from both intra-view and inter-view aspects. To promote learning of diversity, we impose a constraint on view-individual transformation to make the learned view-specific features to be statistically uncorrelated. Furthermore, we train a linear classifier, such that view-specific and shared features can be effectively combined for classification. Experiments on widely used webpage datasets demonstrate that SMISFL can significantly outperform state-of-the-art SMFL and webpage classification methods.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"98 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83604853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To Return or to Explore: Modelling Human Mobility and Dynamics in Cyberspace","authors":"Tianran Hu, Yinglong Xia, Jiebo Luo","doi":"10.1145/3308558.3313686","DOIUrl":"https://doi.org/10.1145/3308558.3313686","url":null,"abstract":"With the wide adoption of multi-community structure in many popular online platforms, human mobility across online communities has drawn increasing attention from both academia and industry. In this work, we study the statistical patterns that characterize human movements in cyberspace. Inspired by previous work on human mobility in physical space, we decompose human online activities into return and exploration - two complementary types of movements. We then study how people perform these two movements, respectively. We first propose a preferential return model that uncovers the preferential properties of people returning to multiple online communities. Interestingly, this model echos the previous findings on human mobility in physical space. We then present a preferential exploration model that characterizes exploration movements from a novel online community-group perspective. Our experiments quantitatively reveal the patterns of people exploring new communities, which share striking similarities with online return movements in terms of underlying principles. By combining the mechanisms of both return and exploration together, we are able to obtain an overall model that characterizes human mobility patterns in cyberspace at the individual level. We further investigate human online activities using our models, and discover valuable insights on the mobility patterns across online communities. Our models explain the empirically observed human online movement trajectories remarkably well, and more importantly, sheds better light on the understanding of human cyberspace dynamics.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87201273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating User Actions as a Proxy for Email Significance","authors":"Tarfah Alrashed, Chia-Jung Lee, P. Bailey, Christopher E. Lin, Milad Shokouhi, S. Dumais","doi":"10.1145/3308558.3313624","DOIUrl":"https://doi.org/10.1145/3308558.3313624","url":null,"abstract":"Email remains a critical channel for communicating information in both personal and work accounts. The number of emails people receive every day can be overwhelming, which in turn creates challenges for efficient information management and consumption. Having a good estimate of the significance of emails forms the foundation for many downstream tasks (e.g. email prioritization); but determining significance at scale is expensive and challenging. In this work, we hypothesize that the cumulative set of actions on any individual email can be considered as a proxy for the perceived significance of that email. We propose two approaches to summarize observed actions on emails, which we then evaluate against the perceived significance. The first approach is a fixed-form utility function parameterized on a set of weights, and we study the impact of different weight assignment strategies. In the second approach, we build machine learning models to capture users' significance directly based on the observed actions. For evaluation, we collect human judgments on email significance for both personal and work emails. Our analysis suggests that there is a positive correlation between actions and significance of emails and that actions performed on personal and work emails are different. We also find that the degree of correlation varies across people, which may reflect the individualized nature of email activity patterns or significance. Subsequently, we develop an example of real-time email significance prediction by using action summaries as implicit feedback at scale. Evaluation results suggest that the resulting significance predictions have positive agreement with human assessments, albeit not at statistically strong levels. We speculate that we may require personalized significance prediction to improve agreement levels.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89728608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web Experience in Mobile Networks: Lessons from Two Million Page Visits","authors":"Mohammad Rajiullah, Andra Lutu, Ali Safari Khatouni, Mah-Rukh Fida, M. Mellia, A. Brunström, Özgü Alay, Stefan Alfredsson, V. Mancuso","doi":"10.1145/3308558.3313606","DOIUrl":"https://doi.org/10.1145/3308558.3313606","url":null,"abstract":"Measuring and characterizing web page performance is a challenging task. When it comes to the mobile world, the highly varying technology characteristics coupled with the opaque network configuration make it even more difficult. Aiming at reproducibility, we present a large scale empirical study of web page performance collected in eleven commercial mobile networks spanning four countries. By digging into measurement from nearly two million web browsing sessions, we shed light on the impact of different web protocols, browsers, and mobile technologies on the web performance. We find that the impact of mobile broadband access is sizeable. For example, the median page load time using mobile broadband increases by a third compared to wired access. Mobility clearly stresses the system, with handover causing the most evident performance penalties. Contrariwise, our measurements show that the adoption of HTTP/2 and QUIC has practically negligible impact. To understand the intertwining of all parameters, we adopt state-of-the-art statistical methods to identify the significance of different factors on the web performance. Our analysis confirms the importance of access technology and mobility context as well as webpage composition and browser. Our work highlights the importance of large-scale measurements. Even with our controlled setup, the complexity of the mobile web ecosystem is challenging to untangle. For this, we are releasing the dataset as open data for validation and further research.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"313 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77509876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CnGAN: Generative Adversarial Networks for Cross-network user preference generation for non-overlapped users","authors":"Dilruk Perera, Roger Zimmermann","doi":"10.1145/3308558.3313733","DOIUrl":"https://doi.org/10.1145/3308558.3313733","url":null,"abstract":"A major drawback of cross-network recommender solutions is that they can only be applied to users that are overlapped across networks. Thus, the non-overlapped users, which form the majority of users are ignored. As a solution, we propose CnGAN, a novel multi-task learning based, encoder-GAN-recommender architecture. The proposed model synthetically generates source network user preferences for non-overlapped users by learning the mapping from target to source network preference manifolds. The resultant user preferences are used in a Siamese network based neural recommender architecture. Furthermore, we propose a novel user-based pairwise loss function for recommendations using implicit interactions to better guide the generation process in the multi-task learning environment. We illustrate our solution by generating user preferences on the Twitter source network for recommendations on the YouTube target network. Extensive experiments show that the generated preferences can be used to improve recommendations for non-overlapped users. The resultant recommendations achieve superior performance compared to the state-of-the-art cross-network recommender solutions in terms of accuracy, novelty and diversity.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90101996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection and Analysis of Self-Disclosure in Online News Commentaries","authors":"Prasanna Umar, A. Squicciarini, S. Rajtmajer","doi":"10.1145/3308558.3313669","DOIUrl":"https://doi.org/10.1145/3308558.3313669","url":null,"abstract":"Online users engage in self-disclosure - revealing personal information to others - in pursuit of social rewards. However, there are associated costs of disclosure to users' privacy. User profiling techniques support the use of contributed content for a number of purposes, e.g., micro-targeting advertisements. In this paper, we study self-disclosure as it occurs in newspaper comment forums. We explore a longitudinal dataset of about 60,000 comments on 2202 news articles from four major English news websites. We start with detection of language indicative of various types of self-disclosure, leveraging both syntactic and semantic information present in texts. Specifically, we use dependency parsing for subject, verb, and object extraction from sentences, in conjunction with named entity recognition to extract linguistic indicators of self-disclosure. We then use these indicators to examine the effects of anonymity and topic of discussion on self-disclosure. We find that anonymous users are more likely to self-disclose than identifiable users, and that self-disclosure varies across topics of discussion. Finally, we discuss the implications of our findings for user privacy.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79727223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event Detection using Hierarchical Multi-Aspect Attention","authors":"Sneha Mehta, Mohammad Raihanul Islam, H. Rangwala, Naren Ramakrishnan","doi":"10.1145/3308558.3313659","DOIUrl":"https://doi.org/10.1145/3308558.3313659","url":null,"abstract":"Classical event encoding and extraction methods rely on fixed dictionaries of keywords and templates or require ground truth labels for phrase/sentences. This hinders widespread application of information encoding approaches to large-scale free form (unstructured) text available on the web. Event encoding can be viewed as a hierarchical task where the coarser level task is event detection, i.e., identification of documents containing a specific event, and where the fine-grained task is one of event encoding, i.e., identifying key phrases, key sentences. Hierarchical models with attention seem like a natural choice for this problem, given their ability to differentially attend to more or less important features when constructing document representations. In this work we present a novel factorized bilinear multi-aspect attention mechanism (FBMA) that attends to different aspects of text while constructing its representation. We find that our approach outperforms state-of-the-art baselines for detecting civil unrest, military action, and non-state actor events from corpora in two different languages.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79492630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hack for Hire: Exploring the Emerging Market for Account Hijacking","authors":"A. Mirian, Joe DeBlasio, S. Savage, G. Voelker, Kurt Thomas","doi":"10.1145/3308558.3313489","DOIUrl":"https://doi.org/10.1145/3308558.3313489","url":null,"abstract":"Email accounts represent an enticing target for attackers, both for the information they contain and the root of trust they provide to other connected web services. While defense-in-depth approaches such as phishing detection, risk analysis, and two-factor authentication help to stem large-scale hijackings, targeted attacks remain a potent threat due to the customization and effort involved. In this paper, we study a segment of targeted attackers known as “hack for hire” services to understand the playbook that attackers use to gain access to victim accounts. Posing as buyers, we interacted with 27 English, Russian, and Chinese blackmarket services, only five of which succeeded in attacking synthetic (though realistic) identities we controlled. Attackers primarily relied on tailored phishing messages, with enough sophistication to bypass SMS two-factor authentication. However, despite the ability to successfully deliver account access, the market exhibited low volume, poor customer service, and had multiple scammers. As such, we surmise that retail email hijacking has yet to mature to the level of other criminal market segments.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75090774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Intent to Book Metrics for Airbnb Search","authors":"B. Turnbull","doi":"10.1145/3308558.3313648","DOIUrl":"https://doi.org/10.1145/3308558.3313648","url":null,"abstract":"Airbnb is a two-sided rental marketplace offering a variety of unique and more traditional accommodation options. Similar to other online marketplaces we invest in optimizing the content surfaced on the search UI and ranking relevance to improve the guest online search experience. The unique Airbnb inventory, however, surfaces some major data challenges. Given the high stakes of booking less traditional accommodations, users can spend many days to weeks searching and scanning the description page of many accommodation “listings” before making a decision to book. Moreover, much of the information about a listing is unstructured and can only be found by the user after they go through the details on the listing page. As a result, we have found traditional search metrics do not work well in the context of our platform. Basic metrics of single user actions, such as click-through-rates, number of listings viewed, or dwell time, are not consistently directionally correlated with our downstream business metrics. To address these issues we leverage machine learning to isolate signals of intent from rich behavioral data. These signals have key applications including analytical insights, ranking modeling inputs, and experimentation velocity. In this paper, we describe the development of a model-based user intent metric, “intentful listing view”, which combines the signals of a variety of user micro-actions on the listing description page. We demonstrate this learned metric is directionally correlated with downstream conversion metrics and sensitive across a variety of historical search experiments.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"97 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80235621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}