Proceedings of The Web Conference 2020: Latest Papers, Page 4

The Structure of Social Influence in Recommender Networks

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380020

P. Analytis, D. Barkoczi, Philipp Lorenz-Spreen, Stefan M. Herzog

Abstract: People’s ability to influence others’ opinions on matters of taste varies greatly, both offline and in recommender systems. What mechanisms underlie these striking differences? Using the weighted k-nearest neighbors algorithm (k-nn) to represent an array of social learning strategies, and leveraging methods from network science, we show how the k-nn algorithm gives rise to networks of social influence in six real-world domains of taste. We report three novel results that apply both to offline advice taking and to online recommender settings. First, influential individuals have mainstream tastes and high dispersion in their taste similarity with others. Second, the fewer people an individual or algorithm consults (i.e., the lower k is), or the larger the weight placed on the opinions of more similar others, the smaller the group of people with substantial influence. Third, the influence networks that emerge from deploying the k-nn algorithm are hierarchically organized. Our results shed new light on classic empirical findings in communication and network science and can help improve the understanding of social influence offline and online.
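The weighted k-nn mechanism described in this abstract can be illustrated with a short Python sketch. It assumes a complete user × item rating matrix, Pearson correlation as the taste-similarity measure, and neighbour weights of max(similarity, 0) raised to a hypothetical exponent `alpha` (a larger `alpha` places more weight on the most similar others). Influence is approximated here as each person's in-degree in the resulting "who consults whom" network. All of these choices, and the function names, are illustrative assumptions rather than the authors' exact setup.

```python
from collections import Counter
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length rating vectors (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [x - mu for x in u]
    dv = [y - mv for y in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    return num / den if den else 0.0


def neighbours(ratings, user, k):
    """The k users most similar to `user`, as (index, similarity); ties go to the lower index."""
    sims = [(pearson(ratings[user], ratings[o]), o)
            for o in range(len(ratings)) if o != user]
    sims.sort(key=lambda s: (-s[0], s[1]))
    return [(o, s) for s, o in sims[:k]]


def predict(ratings, user, item, k, alpha=1.0):
    """Similarity-weighted average of the k neighbours' ratings of `item`.

    Weights are max(similarity, 0) ** alpha (a hypothetical weighting), so a larger
    alpha concentrates the weight on the most similar neighbours.
    Simplification: similarity is computed on the full profiles, including `item`.
    """
    nb = neighbours(ratings, user, k)
    weights = [max(s, 0.0) ** alpha for _, s in nb]
    total = sum(weights)
    if total == 0:
        # No positively similar neighbour: fall back to an unweighted mean.
        return sum(ratings[o][item] for o, _ in nb) / len(nb)
    return sum(w * ratings[o][item] for w, (o, _) in zip(weights, nb)) / total


def influence(ratings, k):
    """In-degree of each user in the directed 'who consults whom' network."""
    counts = Counter(o for u in range(len(ratings)) for o, _ in neighbours(ratings, u, k))
    return [counts[u] for u in range(len(ratings))]


# Toy data (hypothetical): four users rating four items on a 1-5 scale.
ratings = [
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [5, 5, 1, 1],
]
print(influence(ratings, k=1))  # → [1, 1, 0, 2]
print(influence(ratings, k=3))  # → [3, 3, 3, 3]
```

On this toy matrix, when everyone consults all three others (k = 3) influence is spread evenly, while with k = 1 two of the four users are consulted by nobody or almost nobody and user 3 is consulted twice, a small-scale echo of the abstract's second finding.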

Citations: 10

LOVBench: Ontology Ranking Benchmark

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380245

Niklas Kolbe, P. Vandenbussche, S. Kubler, Yves Le Traon

Abstract: Ontology search and ranking are key building blocks for establishing and reusing shared conceptualizations of domain knowledge on the Web. However, the effectiveness of proposed ontology ranking models is difficult to compare, since they are often evaluated on diverse datasets that are limited by their static nature and scale. In this paper, we first introduce the LOVBench dataset as a benchmark for ontology term ranking. With inferred relevance judgments for more than 7000 queries, LOVBench is large enough to perform a comparison study using learning to rank (LTR) with complex ontology ranking models. Instead of relying on relevance judgments from a few experts, we consider implicit feedback from many actual users, collected from the Linked Open Vocabularies (LOV) platform. Our approach further enables continuous updates of the benchmark, capturing the evolution of ontologies’ relevance in an ever-changing data community. Second, we compare the performance of several feature configurations from the literature using LOVBench in LTR settings and discuss the results in the context of the observed real-world user behavior. Our experimental results show that feature configurations which (i) are well suited to the user behavior, (ii) cover all feature types, and (iii) consider the decomposition of features can significantly improve ranking performance.

Citations: 4

Twitter User Location Inference Based on Representation Learning and Label Propagation

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380019

Hechan Tian, Meng Zhang, Xiangyang Luo, Fenlin Liu, Yaqiong Qiao

Abstract: Social network user location inference has been widely used in geospatial applications such as public health monitoring and local advertising recommendation. Because they give insufficient consideration to the relationships between users and location-indicative words, most existing inference methods estimate label propagation probabilities solely from statistical features, which results in large location inference errors. In this paper, a Twitter user location inference method based on representation learning and label propagation is proposed. First, a heterogeneous connection relation graph is constructed from the relationships between Twitter users and the relationships between users and location-indicative words, and relationships unrelated to geographic attributes are filtered out. Then, vector representations of users are learned from the connection relation graph. Finally, label propagation probabilities between adjacent users are calculated from the vector representations, and the locations of unknown users are predicted through iterative label propagation. Experiments on two representative Twitter datasets, GeoText and TwUs, show that the proposed method can accurately calculate label propagation probabilities from vector representations and improves the accuracy of location inference. Compared with the existing typical Twitter user location inference methods GCN and MLP-TXT+NET, the proposed method reduces the median error distance by 18% and 16%, respectively.
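The final step of the pipeline above (embedding-based propagation probabilities followed by iterative label propagation) can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's formula: the user embeddings are taken as already learned, the propagation probability between adjacent users is assumed to be their clipped cosine similarity normalised over each user's neighbours, and users with known locations are clamped. The function names and toy data are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def propagate(emb, edges, known, n_iter=20):
    """Predict user locations by iterative label propagation over a user graph.

    emb:   {user: embedding vector}, assumed to be learned beforehand
    edges: iterable of undirected (u, v) user pairs
    known: {user: location label} for users whose location is known
    Returns {user: label} for every user that received some label mass.
    """
    # Propagation probabilities (assumed form): clipped cosine similarity of the
    # two users' embeddings, normalised over each user's neighbours.
    weights = {u: {} for u in emb}
    for u, v in edges:
        w = max(cosine(emb[u], emb[v]), 0.0)
        weights[u][v] = w
        weights[v][u] = w
    prob = {}
    for u, row in weights.items():
        total = sum(row.values())
        prob[u] = {v: w / total for v, w in row.items()} if total else {}

    labels = sorted(set(known.values()))
    # Every user holds a score per label; known users start one-hot.
    dist = {u: {l: float(known.get(u) == l) for l in labels} for u in emb}
    for _ in range(n_iter):
        new = {}
        for u in emb:
            if u in known:
                new[u] = dist[u]  # clamp: known locations never change
            else:
                new[u] = {l: sum(p * dist[v][l] for v, p in prob[u].items())
                          for l in labels}
        dist = new
    # Isolated unlabelled users keep all-zero scores and are left out.
    return {u: max(labels, key=d.get) for u, d in dist.items() if any(d.values())}


# Hypothetical toy graph: a 4-cycle with two users of known location.
emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
known = {"a": "NYC", "c": "LA"}
print(propagate(emb, edges, known))  # → {'a': 'NYC', 'b': 'NYC', 'c': 'LA', 'd': 'LA'}
```

Users b and d are each adjacent to both labelled users, so graph structure alone cannot separate them; the embedding-derived probabilities break the tie.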

Citations: 13

Leveraging Passage-level Cumulative Gain for Document Ranking

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380305

Zhijing Wu, Jiaxin Mao, Yiqun Liu, Jingtao Zhan, Yukun Zheng, Min Zhang, Shaoping Ma

Abstract: Document ranking is one of the most studied yet challenging problems in information retrieval (IR) research. Many existing document ranking models capture relevance signals at the level of the whole document. Recently, more and more research has begun to address this problem through fine-grained document modeling, and several works have leveraged fine-grained passage-level relevance signals in ranking models. However, most of these works focus on context-independent passage-level relevance signals and ignore context information, which may lead to inaccurate estimates of passage-level relevance. In this paper, we investigate how information gain accumulates across passages when users read a document sequentially. We propose the context-aware Passage-level Cumulative Gain (PCG), which aggregates the relevance scores of passages and avoids the need to formally split a document into independent passages. Next, we incorporate the patterns of PCG into a BERT-based sequential model, the Passage-level Cumulative Gain Model (PCGM), to predict the PCG sequence. Finally, we apply PCGM to the document ranking task. Experimental results on two public ad hoc retrieval benchmark datasets show that PCGM outperforms most existing ranking models and also indicate the effectiveness of PCG signals. We believe that this work contributes to improving ranking performance and to making document ranking more explainable.

Citations: 32

Scaling PageRank to 100 Billion Pages

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380035

S. Stergiou

Citations: 4

Metric Learning with Equidistant and Equidistributed Triplet-based Loss for Product Image Search

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380094

Furong Xu, Wei Zhang, Yuan Cheng, Wei Chu

Abstract: Product image search in e-commerce systems is a challenging task because of the huge number of product classes, low intra-class similarity, and high inter-class similarity. Deep metric learning, which is based on pairwise distances independent of the number of classes, aims to minimize intra-class variance and inter-class similarity in the feature embedding space. Most existing approaches strictly restrict the distance between samples to fixed values in order to distinguish samples of different classes. However, the distance between paired samples varies in magnitude across training stages, so it is difficult to restrict absolute distances directly with fixed values. In this paper, we propose a novel Equidistant and Equidistributed Triplet-based (EET) loss function that adjusts the distance between samples with relative distance constraints. By optimizing this loss function, the algorithm progressively maximizes intra-class similarity and inter-class variance. Specifically, (1) the equidistant loss pulls matched samples closer by adaptively constraining the two same-class samples in each triplet to be equally distant from the third sample, which belongs to a different class; and (2) the equidistributed loss pushes mismatched samples farther apart by guiding different classes to be uniformly distributed while keeping the intra-class structure compact in the embedding space. Extensive experimental results on product search benchmarks verify the improved performance of our method. We also achieve improvements on other retrieval datasets, which shows the superior generalization capacity of our method in image search.
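The contrast the abstract draws between fixed absolute margins and relative distance constraints can be made concrete with a small Python sketch. It places a conventional triplet margin loss next to one plausible reading of the equidistant constraint: a penalty on |d(a, n) − d(p, n)| for anchor a, positive p, and negative n. This formulation is an assumption based only on the abstract's wording; the paper's exact loss, its weighting, and the equidistributed term are not given here and are omitted. Function names and the margin value are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Conventional triplet loss: requires d(a, n) to exceed d(a, p) by a fixed, absolute margin."""
    return max(dist(anchor, positive) - dist(anchor, negative) + margin, 0.0)


def equidistant_term(anchor, positive, negative):
    """Relative constraint (one reading of the abstract): the two same-class samples
    should be equally far from the different-class sample. No fixed margin is used,
    so the penalty adapts to whatever scale the distances currently have."""
    return abs(dist(anchor, negative) - dist(positive, negative))


a, p, n = (0.0, 0.0), (1.0, 0.0), (3.0, 0.0)
print(triplet_margin_loss(a, p, n))  # → 0.0 (fixed margin already satisfied)
print(equidistant_term(a, p, n))     # → 1.0 (a and p are not equidistant from n)
```

In this toy triplet the fixed-margin loss is already zero, yet the relative constraint still produces a signal, which illustrates why a relative constraint can keep shaping the embedding when absolute distances have grown past any fixed threshold.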

Citations: 13

Adversarial Bandits Policy for Crawling Commercial Web Content

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380125

Shuguang Han, Michael Bendersky, Przemek Gajda, Sergey Novikov, Marc Najork, Bernhard Brodowsky, Alexandrin Popescul

Abstract: The rapid growth of commercial web content has driven the development of shopping search services that help users find product offers. Due to the dynamic nature of commercial content, an effective recrawl policy is a key component of a shopping search service; it ensures that users have access to up-to-date product details. Most existing strategies either relied on simple heuristics or overlooked resource budgets. To address this, Azar et al. [5] recently proposed an optimization strategy, LambdaCrawl, that aims to maximize content freshness within a given resource budget. In this paper, we demonstrate that the effectiveness of LambdaCrawl is governed in large part by how well the future content change rate can be estimated. By adopting state-of-the-art deep learning models for change rate prediction, we obtain a substantial increase in content freshness over the common LambdaCrawl implementation, which estimates the change rate from past history. Moreover, we demonstrate that while LambdaCrawl is a significant advance over existing recrawl strategies, it can be further improved by a unified multi-strategy recrawl policy. To this end, we adopt the K-armed adversarial bandits algorithm, which can provably optimize overall freshness by combining multiple strategies. Empirical results on a large-scale production dataset confirm its superiority to LambdaCrawl, especially under tight resource budgets.
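The abstract does not say which adversarial bandit algorithm is used. Exp3 is the standard textbook algorithm for the K-armed adversarial setting, so the Python sketch below uses it as a stand-in to show how several recrawl strategies could be combined. Treating each arm as one recrawl strategy and each reward as a freshness score in [0, 1] is an assumption for illustration; the class name, `gamma`, and the reward values are hypothetical.

```python
import math
import random


class Exp3:
    """Exp3 for K-armed adversarial bandits.

    Illustrative mapping (hypothetical): each arm is one recrawl strategy and the
    reward observed after following it for a round is a freshness score in [0, 1].
    """

    def __init__(self, k, gamma=0.1, seed=None):
        self.k = k              # number of arms (strategies)
        self.gamma = gamma      # exploration rate in (0, 1]
        self.weights = [1.0] * k
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the normalised weights with uniform exploration."""
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.k for w in self.weights]

    def select(self):
        """Sample an arm; also return its probability for the update step."""
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, reward, prob):
        """Exponential-weights update using an importance-weighted reward estimate."""
        estimate = reward / prob  # unbiased estimate of the played arm's reward
        self.weights[arm] *= math.exp(self.gamma * estimate / self.k)
        # Rescale so the largest weight is 1: probabilities are unchanged, overflow is avoided.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


# Usage: three hypothetical strategies with fixed per-round freshness rewards.
bandit = Exp3(k=3, gamma=0.1, seed=0)
rewards = [0.2, 0.5, 0.8]
for _ in range(3000):
    arm, p = bandit.select()
    bandit.update(arm, rewards[arm], p)
print([round(q, 3) for q in bandit.probabilities()])
```

Because every arm keeps at least `gamma / k` probability, the policy continues to sample weaker strategies, which is what lets an adversarial bandit adapt if the best strategy changes over time.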

Citations: 2

Deconstruct Densest Subgraphs

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380033

Lijun Chang, Miao Qiao

Citations: 7

Dark Matter: Uncovering the DarkComet RAT Ecosystem

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380277

Brown Farinholt, Mohammad Rezaeirad, Damon McCoy, Kirill Levchenko

Abstract: Remote Access Trojans (RATs) are a persistent class of malware that give an attacker direct, interactive access to a victim’s personal computer, allowing the attacker to steal private data, spy on the victim in real time using the camera and microphone, and verbally harass the victim through the speaker. To date, the users and victims of this pernicious form of malware have been difficult to observe in the wild due to the unobtrusive nature of infections. In this work, we report the results of a longitudinal study of the DarkComet RAT ecosystem. Using a known method for collecting victim log databases from DarkComet controllers, we present novel techniques for tracking RAT controllers across hostname changes and improve on established techniques for filtering spurious victim records caused by scanners and sandboxed malware executions. We downloaded 6,620 DarkComet databases from 1,029 unique controllers, spanning over 5 years of operation. Our analysis shows that there have been at least 57,805 victims of DarkComet over this period, with 69 new victims infected every day, many of whom had their keystrokes captured, actions recorded, and webcams monitored during this time. Our methodologies for more precisely identifying campaigns and victims could help improve the efficiency and efficacy of victim cleanup efforts and the prioritization of law enforcement investigations.

Citations: 5

Extracting Knowledge from Web Text with Monte Carlo Tree Search

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380010

Guiliang Liu, Xu Li, Jiakang Wang, Mingming Sun, P. Li

Citations: 24