The World Wide Web Conference: Latest Papers, Page 9

Learning Travel Time Distributions with Deep Generative Model

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313418

Xiucheng Li, G. Cong, Aixin Sun, Yun Cheng

Abstract: Travel time estimation of a given route with respect to real-time traffic conditions is extremely useful for many applications such as route planning. We argue that it is even more useful to estimate the travel time distribution, from which we can derive the expected travel time as well as the uncertainty. In this paper, we develop a deep generative model, DeepGTT, to learn the travel time distribution for any route by conditioning on the real-time traffic. DeepGTT interprets the generation of travel time using a three-layer hierarchical probabilistic model. In the first layer, we present two techniques, amortization and spatial smoothness embeddings, to share statistical strength among different road segments; a representation learning component based on a convolutional neural network is also proposed to capture the dynamically changing real-time traffic conditions. In the middle layer, a nonlinear factorization model is developed to generate an auxiliary random variable, i.e., speed. The introduction of this middle layer separates the static spatial features from the dynamically changing real-time traffic conditions, allowing us to incorporate the heterogeneous influencing factors into a single model. In the last layer, an attention-based function is proposed to collectively generate the observed travel time. DeepGTT describes the generation process in a reasonable manner, and thus it not only produces more accurate results but is also more efficient. On a real-world large-scale data set, we show that DeepGTT produces substantially better results than state-of-the-art alternatives in two tasks: travel time estimation and route recovery from sparse trajectory data.

Citations: 54

Externalities and Fairness

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313670

Masoud Seddighin, Hamed Saleh, M. Ghodsi

Abstract: One of the important yet insufficiently studied subjects in fair allocation is the externality effect among agents. For a resource allocation problem, externalities imply that the share allocated to an agent may affect the utilities of other agents. In this paper, we conduct a study of fair allocation of indivisible goods when the externalities are not negligible. Inspired by the models in the context of network diffusion, we present a simple and natural model, namely network externalities, to capture the externalities. To evaluate fairness in the network externalities model, we generalize the idea behind the notion of maximin-share (MMS) to achieve a new criterion, namely, extended-maximin-share (EMMS). Next, we consider two problems concerning our model. First, we discuss the computational aspects of finding the value of EMMS for every agent. For this, we introduce a generalized form of the partitioning problem that includes many famous partitioning problems such as maximin, minimax, and leximin. We further show that a 1/2-approximation algorithm exists for this partitioning problem. Next, we investigate finding approximately optimal allocations, i.e., allocations that guarantee each agent a utility of at least a fraction of his extended-maximin-share. We show that under a natural assumption that the agents are α-self-reliant, an α/2-EMMS allocation always exists. The combination of this with the former result yields a polynomial-time α/4-EMMS allocation algorithm.
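Note: the abstract above builds on the classical maximin share. As a point of reference only, the sketch below computes the standard maximin share of a single agent by brute force, assuming additive, non-negative valuations and no externalities. It is not the paper's extended criterion or its approximation algorithm, and the function name `maximin_share` is a hypothetical choice for illustration.

```python
from itertools import product


def maximin_share(values, n_agents):
    """Classical maximin share of one agent (additive, non-negative values).

    The agent splits the items into n_agents bundles and receives the
    least valuable bundle; the maximin share is the best such guarantee.
    Brute force over all assignments, so only suitable for tiny inputs.
    """
    best = 0
    # Each item is assigned to one of the n_agents bundles.
    for assignment in product(range(n_agents), repeat=len(values)):
        bundles = [0] * n_agents
        for value, bundle in zip(values, assignment):
            bundles[bundle] += value
        best = max(best, min(bundles))
    return best


if __name__ == "__main__":
    # Two bundles {3, 3} and {2, 2, 2} are both worth 6.
    print(maximin_share([3, 3, 2, 2, 2], 2))
```

The enumeration is exponential in the number of items, which is consistent with the abstract's interest in approximation algorithms for the underlying partitioning problem.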

Citations: 9

Graph-based Interactive Data Federation System for Heterogeneous Data Retrieval and Analytics

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3314138

Xuan-Son Vu, Addi Ait-Mlouk, E. Elmroth, Lili Jiang

Abstract: Given the increasing volume of heterogeneous data stored in relational databases, file systems, or cloud environments, such data needs to be easily accessed and semantically connected for further data analytics. The potential of data federation is largely untapped. This paper presents an interactive data federation system (https://vimeo.com/319473546) that applies large-scale techniques, including heterogeneous data federation, natural language processing, association rules, and the semantic web, to perform data retrieval and analytics on social network data. The system first creates a Virtual Database (VDB) to virtually integrate data from multiple data sources. Next, an RDF generator is built to unify data, together with SPARQL queries, to support semantic data search over the text data processed by natural language processing (NLP). Association rule analysis is used to discover patterns and recognize the most important co-occurrences of variables from multiple data sources. The system demonstrates how it facilitates interactive data analytics for different application scenarios (e.g., sentiment analysis, privacy-concern analysis, community detection).

Citations: 11

Enriching News Articles with Related Search Queries

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313588

David Carmel, Yaroslav Fyodorov, Saar Kuzi, Avihai Mejer, Fiana Raiber, Elad Rainshmidt

Citations: 1

Genre Differences of Song Lyrics and Artist Wikis: An Analysis of Popularity, Length, Repetitiveness, and Readability

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313604

M. Schedl

Abstract: Music is known to exhibit different characteristics, depending on genre and style. While most research that studies such differences takes a musicological perspective and analyzes acoustic properties of individual pieces or artists, we conduct a large-scale analysis using various web resources. Exploiting content information from song lyrics, contextual information reflected in music artists' Wikipedia articles, and listening information, we particularly study the aspects of popularity, length, repetitiveness, and readability of lyrics and Wikipedia articles. We measure popularity in terms of song play count (PC) and listener count (LC), length in terms of character and word count, repetitiveness in terms of text compression ratio, and readability in terms of the Simple Measure of Gobbledygook (SMOG). Extending datasets of music listening histories and genre annotations from Last.fm, we extract and analyze 424,476 song lyrics by 18,724 artists from LyricWiki. We set out to answer whether there exist significant genre differences in song lyrics (RQ1) and artist Wikipedia articles (RQ2) in terms of repetitiveness and readability. We also assess whether we can find evidence to support the cliché that lyrics of very popular artists are particularly simple and repetitive (RQ3). We further investigate whether the characteristics of popularity, length, repetitiveness, and readability correlate within and between lyrics and Wikipedia articles (RQ4). We identify substantial differences in repetitiveness and readability of lyrics between music genres. In contrast, no significant differences between genres are found for artists' Wikipedia pages. Also, we find that lyrics of highly popular artists are repetitive but not necessarily simple in terms of readability. Furthermore, we uncover weak correlations between length of lyrics and of Wikipedia pages of the same artist, weak correlations between lyrics' reading difficulty and their length, and moderate correlations between artists' popularity and length of their lyrics.
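Note: the abstract above measures repetitiveness through a text compression ratio but does not name the compressor or the exact formula. The sketch below shows one plausible reading, assuming zlib (DEFLATE) and the ratio of compressed size to original size, so that a lower value indicates more repetitive text. Both choices, and the function name `compression_ratio`, are assumptions made for illustration.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by original size, both in bytes.

    Assumption: a lower ratio means the text is more repetitive,
    because repeated phrases compress well.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    # Level 9 asks zlib for its strongest compression setting.
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    chorus = "na na na na " * 40
    prose = "Each record links to a page dedicated to the details of one object."
    print(compression_ratio(chorus) < compression_ratio(prose))
```

Very short texts can yield ratios above 1 because of the fixed compression header, so comparisons are most meaningful between texts of similar length.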

Citations: 5

RED: Redundancy-Driven Data Extraction from Result Pages?

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313529

Jinsong Guo, Valter Crescenzi, Tim Furche, G. Grasso, G. Gottlob

Abstract: Data-driven websites are mostly accessed through search interfaces. Such sites follow a common publishing pattern that, surprisingly, has not yet been fully exploited for unsupervised data extraction: the result of a search is presented as a paginated list of result records. Each result record contains the main attributes of a single object and links to a page dedicated to the details of that object. We present RED, an automatic approach and a prototype system for extracting data records from sites following this publishing pattern. RED leverages the inherent redundancy between result records and corresponding detail pages to design an effective, yet fully unsupervised and domain-independent method. It is able to extract from result pages all the attributes of the objects that appear both in the result records and in the corresponding detail pages. With respect to previous unsupervised methods, our method does not require any a priori domain-dependent knowledge (e.g., an ontology) and can achieve significantly higher accuracy while automatically selecting only object attributes, a task which is out of the scope of traditional fully unsupervised approaches. With respect to previous supervised or semi-supervised methods, RED can reach similar accuracy in many domains (e.g., job postings) without requiring supervision for each domain, let alone each website.

Citations: 5

Exploiting Diversity in Android TLS Implementations for Mobile App Traffic Classification

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313738

Satadal Sengupta, Niloy Ganguly, Pradipta De, Sandip Chakraborty

Abstract: Network traffic classification is an important tool for network administrators in enabling monitoring and service provisioning. Traditional techniques employed in classifying traffic do not work well for mobile app traffic due to the lack of unique signatures. Encryption renders this task even more difficult since packet content is no longer available to parse. More recent techniques based on statistical analysis of parameters such as packet size and arrival time of packets have shown promise; such techniques have been shown to classify traffic from a small number of applications with a high degree of accuracy. However, we show that when applied to a large number of applications, the performance falls short of satisfactory. In this paper, we propose a novel set of bit-sequence based features which exploit differences in the randomness of data generated by different applications. These differences, which originate from dissimilarities in encryption implementations across applications, leave footprints on the data they generate. We validate that these features can differentiate data encrypted with various ciphers (89% accuracy) and key sizes (83% accuracy). Our evaluation shows that such features can not only differentiate traffic originating from different categories of mobile apps (90% accuracy), but can also classify 175 individual applications with 95% accuracy.
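Note: the abstract above does not list its bit-sequence features. To illustrate what a randomness-oriented feature over a payload's bits could look like, the sketch below computes two generic statistics: the fraction of one-bits and the number of runs per bit. These are stand-ins chosen for illustration, not the paper's feature set, and the function name `bit_features` is hypothetical.

```python
def bit_features(payload: bytes) -> dict:
    """Two simple randomness statistics over the bits of a payload.

    ones_fraction: share of bits equal to 1 (near 0.5 for random data).
    runs_per_bit:  number of maximal runs of equal bits, divided by
                   the number of bits (near 0.5 for random data).
    """
    # Unpack every byte into its 8 bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if not bits:
        raise ValueError("payload must not be empty")
    ones_fraction = sum(bits) / len(bits)
    # A new run starts wherever two neighbouring bits differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return {"ones_fraction": ones_fraction, "runs_per_bit": runs / len(bits)}


if __name__ == "__main__":
    print(bit_features(b"\xf0"))
```

In a classification setting, statistics like these would be computed per flow or per packet window and passed to a standard classifier alongside other features.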

Citations: 17

Outguard: Detecting In-Browser Covert Cryptocurrency Mining in the Wild

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313665

Amin Kharraz, Zane Ma, Paul Murley, Chaz Lever, Joshua Mason, Andrew K. Miller, N. Borisov, M. Antonakakis, Michael Bailey

Citations: 63

Nameles: An intelligent system for Real-Time Filtering of Invalid Ad Traffic

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313601

Antonio Pastor, Matti Antero Parssinen, Patricia Callejo, Pelayo Vallina, R. C. Rumín, Ángel Cuevas, M. Kotila, A. Azcorra

Abstract: Invalid ad traffic is an inherent problem of programmatic advertising that has not been properly addressed so far. Traditionally, it has been considered that invalid ad traffic only harms the interests of advertisers, who pay for the cost of invalid ad impressions while other industry stakeholders earn revenue through commissions regardless of the quality of the impression. Our first contribution consists of providing evidence that shows how the Demand Side Platforms (DSPs), one of the most important intermediaries in the programmatic advertising supply chain, may be suffering economic losses due to invalid ad traffic. Addressing the problem of invalid traffic at DSPs requires a highly scalable solution that can identify invalid traffic in real time at the individual bid request level. The second and main contribution is the design and implementation of a solution for the invalid traffic problem, a system that can be seamlessly integrated into the current programmatic ecosystem by the DSPs. Our system has been released under an open source license, becoming the first auditable solution for invalid ad traffic detection. The intrinsic transparency of our solution, along with the good results obtained in industrial trials, has led the World Federation of Advertisers to endorse it.

Citations: 5

Multi-Domain Gated CNN for Review Helpfulness Prediction

The World Wide Web Conference Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313587

Cen Chen, Minghui Qiu, Yinfei Yang, Jun Zhou, Jun Huang, Xiaolong Li, F. S. Bao

Citations: 32