Proceedings of The Web Conference 2020最新文献

筛选
英文 中文
The Structure of Social Influence in Recommender Networks 推荐网络中的社会影响结构
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380020
P. Analytis, D. Barkoczi, Philipp Lorenz-Spreen, Stefan M. Herzog
{"title":"The Structure of Social Influence in Recommender Networks","authors":"P. Analytis, D. Barkoczi, Philipp Lorenz-Spreen, Stefan M. Herzog","doi":"10.1145/3366423.3380020","DOIUrl":"https://doi.org/10.1145/3366423.3380020","url":null,"abstract":"People’s ability to influence others’ opinion on matters of taste varies greatly—both offline and in recommender systems. What are the mechanisms underlying these striking differences? Using the weighted k-nearest neighbors algorithm (k-nn) to represent an array of social learning strategies, we show—leveraging methods from network science—how the k-nn algorithm gives rise to networks of social influence in six real-world domains of taste. We show three novel results that apply both to offline advice taking and online recommender settings. First, influential individuals have mainstream tastes and high dispersion in their taste similarity with others. Second, the fewer people an individual or algorithm consults (i.e., the lower k is) or the larger the weight placed on the opinions of more similar others, the smaller the group of people with substantial influence. Third, the influence networks emerging from deploying the k-nn algorithm are hierarchically organized. Our results shed new light on classic empirical findings in communication and network science and can help improve the understanding of social influence offline and online.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89197786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
LOVBench: Ontology Ranking Benchmark LOVBench:本体排名基准
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380245
Niklas Kolbe, P. Vandenbussche, S. Kubler, Yves Le Traon
{"title":"LOVBench: Ontology Ranking Benchmark","authors":"Niklas Kolbe, P. Vandenbussche, S. Kubler, Yves Le Traon","doi":"10.1145/3366423.3380245","DOIUrl":"https://doi.org/10.1145/3366423.3380245","url":null,"abstract":"Ontology search and ranking are key building blocks to establish and reuse shared conceptualizations of domain knowledge on the Web. However, the effectiveness of proposed ontology ranking models is difficult to compare since these are often evaluated on diverse datasets that are limited by their static nature and scale. In this paper, we first introduce the LOVBench dataset as a benchmark for ontology term ranking. With inferred relevance judgments for more than 7000 queries, LOVBench is large enough to perform a comparison study using learning to rank (LTR) with complex ontology ranking models. Instead of relying on relevance judgments from a few experts, we consider implicit feedback from many actual users collected from the Linked Open Vocabularies (LOV) platform. Our approach further enables continuous updates of the benchmark, capturing the evolution of ontologies’ relevance in an ever-changing data community. Second, we compare the performance of several feature configurations from the literature using LOVBench in LTR settings and discuss the results in the context of the observed real-world user behavior. Our experimental results show that feature configurations which are (i) well-suited to the user behavior, (ii) cover all features types, and (iii) consider decomposition of features can significantly improve the ranking performance.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89604931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Twitter User Location Inference Based on Representation Learning and Label Propagation 基于表示学习和标签传播的Twitter用户位置推断
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380019
Hechan Tian, Meng Zhang, Xiangyang Luo, Fenlin Liu, Yaqiong Qiao
{"title":"Twitter User Location Inference Based on Representation Learning and Label Propagation","authors":"Hechan Tian, Meng Zhang, Xiangyang Luo, Fenlin Liu, Yaqiong Qiao","doi":"10.1145/3366423.3380019","DOIUrl":"https://doi.org/10.1145/3366423.3380019","url":null,"abstract":"Social network user location inference technology has been widely used in various geospatial applications like public health monitoring and local advertising recommendation. Due to insufficient consideration of relationships between users and location indicative words, most of existing inference methods estimate label propagation probabilities solely based on statistical features, resulting in large location inference error. In this paper, a Twitter user location inference method based on representation learning and label propagation is proposed. Firstly, the heterogeneous connection relation graph is constructed based on relationships between Twitter users and relationships between users and location indicative words, and relationships unrelated to geographic attributes are filtered. Then, vector representations of users are learnt from the connection relation graph. Finally, label propagation probabilities between adjacent users are calculated based on vector representations, and the locations of unknown users are predicted through iterative label propagation. Experiments on two representative Twitter datasets - GeoText and TwUs, show that the proposed method can accurately calculate label propagation probabilities based on vector representations and improve the accuracy of location inference. Compared with existing typical Twitter user location inference methods - GCN and MLP-TXT+NET, the median error distance of the proposed method is reduced by 18% and 16%, respectively.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85412856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Leveraging Passage-level Cumulative Gain for Document Ranking 利用段落级累积增益进行文档排序
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380305
Zhijing Wu, Jiaxin Mao, Yiqun Liu, Jingtao Zhan, Yukun Zheng, Min Zhang, Shaoping Ma
{"title":"Leveraging Passage-level Cumulative Gain for Document Ranking","authors":"Zhijing Wu, Jiaxin Mao, Yiqun Liu, Jingtao Zhan, Yukun Zheng, Min Zhang, Shaoping Ma","doi":"10.1145/3366423.3380305","DOIUrl":"https://doi.org/10.1145/3366423.3380305","url":null,"abstract":"Document ranking is one of the most studied but challenging problems in information retrieval (IR) research. A number of existing document ranking models capture relevance signals at the whole document level. Recently, more and more research has begun to address this problem from fine-grained document modeling. Several works leveraged fine-grained passage-level relevance signals in ranking models. However, most of these works focus on context-independent passage-level relevance signals and ignore the context information, which may lead to inaccurate estimation of passage-level relevance. In this paper, we investigate how information gain accumulates with passages when users sequentially read a document. We propose the context-aware Passage-level Cumulative Gain (PCG), which aggregates relevance scores of passages and avoids the need to formally split a document into independent passages. Next, we incorporate the patterns of PCG into a BERT-based sequential model called Passage-level Cumulative Gain Model (PCGM) to predict the PCG sequence. Finally, we apply PCGM to the document ranking task. Experimental results on two public ad hoc retrieval benchmark datasets show that PCGM outperforms most existing ranking models and also indicates the effectiveness of PCG signals. We believe that this work contributes to improving ranking performance and providing more explainability for document ranking.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87705009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 32
Scaling PageRank to 100 Billion Pages 将PageRank扩展到1000亿页
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380035
S. Stergiou
{"title":"Scaling PageRank to 100 Billion Pages","authors":"S. Stergiou","doi":"10.1145/3366423.3380035","DOIUrl":"https://doi.org/10.1145/3366423.3380035","url":null,"abstract":"Distributed graph processing frameworks formulate tasks as sequences of supersteps within which communication is performed asynchronously by sending messages over the graph edges. PageRank’s communication pattern is identical across all its supersteps since each vertex sends messages to all its edges. We exploit this pattern to develop a new communication paradigm that allows us to exchange messages that include only edge payloads, dramatically reducing bandwidth requirements. Experiments on a web graph of 38 billion vertices and 3.1 trillion edges yield execution times of 34.4 seconds per iteration, suggesting more than an order of magnitude improvement over the state-of-the-art.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82794592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Metric Learning with Equidistant and Equidistributed Triplet-based Loss for Product Image Search 基于等距等分布三元损失的度量学习在产品图像搜索中的应用
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380094
Furong Xu, Wei Zhang, Yuan Cheng, Wei Chu
{"title":"Metric Learning with Equidistant and Equidistributed Triplet-based Loss for Product Image Search","authors":"Furong Xu, Wei Zhang, Yuan Cheng, Wei Chu","doi":"10.1145/3366423.3380094","DOIUrl":"https://doi.org/10.1145/3366423.3380094","url":null,"abstract":"Product image search in E-commerce systems is a challenging task, because of a huge number of product classes, low intra-class similarity and high inter-class similarity. Deep metric learning, based on paired distances independent of the number of classes, aims to minimize intra-class variances and inter-class similarity in feature embedding space. Most existing approaches strictly restrict the distance between samples with fixed values to distinguish different classes of samples. However, the distance of paired samples has various magnitudes during different training stages. Therefore, it is difficult to directly restrict absolute distances with fixed values. In this paper, we propose a novel Equidistant and Equidistributed Triplet-based (EET) loss function to adjust the distance between samples with relative distance constraints. By optimizing the loss function, the algorithm progressively maximizes intra-class similarity and inter-class variances. Specifically, 1) the equidistant loss pulls the matched samples closer by adaptively constraining two samples of the same class to be equally distant from another one of a different class in each triplet, 2) the equidistributed loss pushes the mismatched samples farther away by guiding different classes to be uniformly distributed while keeping intra-class structure compact in embedding space. Extensive experimental results on product search benchmarks verify the improved performance of our method. We also achieve improvements on other retrieval datasets, which show superior generalization capacity of our method in image search.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79952749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Adversarial Bandits Policy for Crawling Commercial Web Content 抓取商业Web内容的对抗性盗匪策略
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380125
Shuguang Han, Michael Bendersky, Przemek Gajda, Sergey Novikov, Marc Najork, Bernhard Brodowsky, Alexandrin Popescul
{"title":"Adversarial Bandits Policy for Crawling Commercial Web Content","authors":"Shuguang Han, Michael Bendersky, Przemek Gajda, Sergey Novikov, Marc Najork, Bernhard Brodowsky, Alexandrin Popescul","doi":"10.1145/3366423.3380125","DOIUrl":"https://doi.org/10.1145/3366423.3380125","url":null,"abstract":"The rapid growth of commercial web content has driven the development of shopping search services to help users find product offers. Due to the dynamic nature of commercial content, an effective recrawl policy is a key component in a shopping search service; it ensures that users have access to the up-to-date product details. Most of the existing strategies either relied on simple heuristics, or overlooked the resource budgets. To address this, Azar et al. [5] recently proposed an optimization strategy LambdaCrawl aiming to maximize content freshness within a given resource budget. In this paper, we demonstrate that the effectiveness of LambdaCrawl is governed in large part by how well future content change rate can be estimated. By adopting the state-of-the-art deep learning models for change rate prediction, we obtain a substantial increase of content freshness over the common LambdaCrawl implementation with change rate estimated from the past history. Moreover, we demonstrate that while LambdaCrawl is a significant advancement upon existing recrawl strategies, it can be further improved upon by a unified multi-strategy recrawl policy. To this end, we adopt the K-armed adversarial bandits algorithm that can provably optimize the overall freshness by combining multiple strategies. Empirical results over a large-scale production dataset confirm its superiority to LambdaCrawl, especially under tight resource budgets.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"148 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89072482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Deconstruct Densest Subgraphs 解构最密集子图
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380033
Lijun Chang, Miao Qiao
{"title":"Deconstruct Densest Subgraphs","authors":"Lijun Chang, Miao Qiao","doi":"10.1145/3366423.3380033","DOIUrl":"https://doi.org/10.1145/3366423.3380033","url":null,"abstract":"In this paper, we aim to understand the distribution of the densest subgraphs of a given graph under the density notion of average-degree. We show that the structures, the relationships and the distributions of all the densest subgraphs of a graph G can be encoded in O(L) space in an index called the ds-Index. Here L denotes the maximum output size of a densest subgraph of G. More importantly, ds-Indexcan report all the minimal densest subgraphs of G collectively in O(L) time and can enumerate all the densest subgraphs of G with an O(L) delay. Besides, the construction of ds-Indexcosts no more than finding a single densest subgraph using the state-of-the-art approach. Our empirical study shows that for web-scale graphs with one billion edges, the ds-Indexcan be constructed in several minutes on an ordinary commercial machine.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79091010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Dark Matter: Uncovering the DarkComet RAT Ecosystem 暗物质:揭示暗彗星鼠生态系统
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380277
Brown Farinholt, Mohammad Rezaeirad, Damon McCoy, Kirill Levchenko
{"title":"Dark Matter: Uncovering the DarkComet RAT Ecosystem","authors":"Brown Farinholt, Mohammad Rezaeirad, Damon McCoy, Kirill Levchenko","doi":"10.1145/3366423.3380277","DOIUrl":"https://doi.org/10.1145/3366423.3380277","url":null,"abstract":"Remote Access Trojans (RATs) are a persistent class of malware that give an attacker direct, interactive access to a victim’s personal computer, allowing the attacker to steal private data, spy on the victim in real-time using the camera and microphone, and verbally harass the victim through the speaker. To date, the users and victims of this pernicious form of malware have been challenging to observe in the wild due to the unobtrusive nature of infections. In this work, we report the results of a longitudinal study of the DarkComet RAT ecosystem. Using a known method for collecting victim log databases from DarkComet controllers, we present novel techniques for tracking RAT controllers across hostname changes and improve on established techniques for filtering spurious victim records caused by scanners and sandboxed malware executions. We downloaded 6,620 DarkComet databases from 1,029 unique controllers spanning over 5 years of operation. Our analysis shows that there have been at least 57,805 victims of DarkComet over this period, with 69 new victims infected every day; many of whose keystrokes have been captured, actions recorded, and webcams monitored during this time. Our methodologies for more precisely identifying campaigns and victims could potentially be useful for improving the efficiency and efficacy of victim cleanup efforts and prioritization of law enforcement investigations.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73131862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Extracting Knowledge from Web Text with Monte Carlo Tree Search 用蒙特卡罗树搜索从网络文本中提取知识
Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380010
Guiliang Liu, Xu Li, Jiakang Wang, Mingming Sun, P. Li
{"title":"Extracting Knowledge from Web Text with Monte Carlo Tree Search","authors":"Guiliang Liu, Xu Li, Jiakang Wang, Mingming Sun, P. Li","doi":"10.1145/3366423.3380010","DOIUrl":"https://doi.org/10.1145/3366423.3380010","url":null,"abstract":"To extract knowledge from general web text, it requires to build a domain-independent extractor that scales to the entire web corpus. This task is known as Open Information Extraction (OIE). This paper proposes to apply Monte-Carlo Tree Search (MCTS) to accomplish OIE. To achieve this goal, we define a Markov Decision Process for OIE and build a simulator to learn the reward signals, which provides a complete reinforcement learning framework for MCTS. Using this framework, MCTS explores candidate words (and symbols) under the guidance of a pre-trained Sequence-to-Sequence (Seq2Seq) predictor and generates abundant exploration samples during training. We apply the exploration samples to update the reward simulator and the predictor, based on which we implement another MCTS to search the optimal predictions during inference. Empirical evaluation demonstrates that the MCTS inference substantially improves the accuracy of prediction (more than 10%) and achieves a leading performance over other state-of-the-art comparison models.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74328538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 24
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信