Proceedings of The Web Conference 2020: Latest Papers, Page 2

TRAP: Two-level Regularized Autoencoder-based Embedding for Power-law Distributed Data

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380233

Dongmin Park, Hwanjun Song, Minseok Kim, Jae-Gil Lee

Abstract: Recently, autoencoder (AE)-based embedding approaches have achieved state-of-the-art performance in many tasks, especially in top-k recommendation with user embedding or node classification with node embedding. However, we find that many real-world datasets follow a power-law distribution with respect to data object sparsity. When learning AE-based embeddings of these data, dense inputs move away from sparse inputs in the embedding space even when they are highly correlated. This phenomenon, which we call polarization, distorts the embedding. In this paper, we propose TRAP, which leverages two-level regularizers to effectively alleviate the polarization problem. The macroscopic regularizer generally prevents dense input objects from being distant from other sparse input objects, and the microscopic regularizer individually attracts each object to correlated neighbor objects rather than uncorrelated ones. Importantly, TRAP is a meta-algorithm that can be easily coupled with existing AE-based embedding methods with a simple modification. In extensive experiments on two representative embedding tasks using six real-world datasets, TRAP boosted the performance of the state-of-the-art algorithms by up to 31.53% and 94.99%, respectively.

Citations: 8

Beyond Rank-1: Discovering Rich Community Structure in Multi-Aspect Graphs

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380129

Ekta Gujral, Ravdeep Pasricha, E. Papalexakis

Abstract: How are communities in real multi-aspect or multi-view graphs structured? How can we effectively and concisely summarize and explore those communities in a high-dimensional, multi-aspect graph without losing important information? State-of-the-art studies have focused on patterns in single graphs, identifying structures in a single snapshot of a large network or in time-evolving graphs and stitching them over time. However, to the best of our knowledge, there is no method that discovers and summarizes community structure from a multi-aspect graph by jointly leveraging information from all aspects. The state of the art in multi-aspect/tensor community extraction is limited to discovering clique structure in the extracted communities or, even worse, imposing clique structure where it does not exist. In this paper we bridge that gap by empowering tensor-based methods to extract rich community structure from multi-aspect graphs. In particular, we introduce cLL1, a novel constrained Block Term Tensor Decomposition that is generally capable of extracting higher than rank-1 but still interpretable structure from a multi-aspect dataset. Subsequently, we propose RichCom, a community structure extraction and summarization algorithm that leverages cLL1 to identify rich community structure (e.g., cliques, stars, chains, etc.) while leveraging higher-order correlations between the different aspects of the graph. Our contributions are four-fold: (a) Novel algorithm: we develop cLL1, an efficient framework to extract rich and interpretable structure from general multi-aspect data; (b) Graph summarization and exploration: we provide RichCom, a summarization and encoding scheme to discover and explore structures of communities identified by cLL1; (c) Multi-aspect graph generator: we provide a simple and effective synthetic multi-aspect graph generator; and (d) Real-world utility: we present empirical results on small and large real datasets that demonstrate performance on par with or superior to the existing state of the art.

Citations: 12

Characterizing Search-Engine Traffic to Internet Research Agency Web Properties

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380290

Alexander Spangher, G. Ranade, Besmira Nushi, Adam Fourney, E. Horvitz

Citations: 4

PARS: Peers-aware Recommender System

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380013

Huiqiang Mao, Yanzhi Li, Chenliang Li, Di Chen, Xiaoqing Wang, Yuming Deng

Citations: 1

Active Domain Transfer on Network Embedding

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380024

Lichen Jin, Yizhou Zhang, Guojie Song, Yilun Jin

Abstract: Recent works show that end-to-end, (semi-)supervised network embedding models can generate satisfactory vectors to represent network topology, and are even applicable to unseen graphs by inductive learning. However, domain mismatch between training and testing networks for inductive learning, as well as a lack of labeled data, often compromises the outcome of such methods. To make matters worse, while transfer learning and active learning techniques, which can address these two problems respectively, have been well studied on regular i.i.d. data, relatively little attention has been paid to networks. Consequently, we propose in this paper a method for active transfer learning on networks named active-transfer network embedding, abbreviated ATNE. In ATNE we jointly consider the influence of each node on the network from the perspectives of transfer and active learning, and hence design novel and effective influence scores combining both aspects in the training process to facilitate node selection. We demonstrate that ATNE is efficient and decoupled from the actual model used. Further extensive experiments show that ATNE outperforms state-of-the-art active node selection methods and shows versatility in different situations.

Citations: 4

Real-Time Clustering for Large Sparse Online Visitor Data

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380183

G. Chan, F. Du, Ryan A. Rossi, Anup B. Rao, Eunyee Koh, Cláudio T. Silva, J. Freire

Abstract: Online visitor behaviors are often modeled as a large sparse matrix, where rows represent visitors and columns represent behaviors. To discover customer segments with different hierarchies, marketers often need to cluster the data in different splits. Such analyses require the clustering algorithm to provide real-time responses to user parameter changes, which current techniques cannot support. In this paper, we propose a real-time clustering algorithm, sparse density peaks, for large-scale sparse data. It pre-processes the input points to compute annotations and a hierarchy for cluster assignment. While the assignment is only a single scan of the points, a naive pre-processing requires measuring all pairwise distances, which incurs a quadratic computation overhead and is infeasible for any moderately sized data. Thus, we propose a new approach based on MinHash and LSH that provides fast and accurate estimations. We also describe an efficient implementation on Spark that addresses data skew and memory usage. Our experiments show that our approach (1) provides a better approximation than a straightforward MinHash and LSH implementation in terms of accuracy on real datasets, (2) achieves a 20× speedup in the end-to-end clustering pipeline, and (3) can maintain computations with a small memory footprint. Finally, we present an interface to explore customer segments from millions of online visitor records in real time.
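Note: the abstract above refers to MinHash and LSH as a way to avoid measuring all pairwise distances. As a rough illustration of that general idea only (this is textbook MinHash with LSH banding, not the paper's sparse density peaks algorithm or its Spark implementation), a minimal Python sketch might look like the following. All names (`minhash_signature`, `estimate_jaccard`, `lsh_band_keys`, `NUM_HASHES`) and the toy visitor data are hypothetical.

```python
import hashlib
from typing import List, Set, Tuple

NUM_HASHES = 128  # hypothetical signature length; larger values lower the variance


def _hash(seed: int, item: str) -> int:
    """64-bit hash of `item`, salted by `seed` to simulate independent hash functions."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items: Set[str], num_hashes: int = NUM_HASHES) -> List[int]:
    """For each salted hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_band_keys(sig: List[int], bands: int) -> List[Tuple[int, Tuple[int, ...]]]:
    """Split a signature into bands; sets sharing any band key become candidate pairs."""
    rows = len(sig) // bands
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]


# Toy data (assumed): each visitor is the set of behaviors observed for them.
visitor_a = {"home", "search", "cart", "checkout"}
visitor_b = {"home", "search", "cart", "blog"}

sig_a = minhash_signature(visitor_a)
sig_b = minhash_signature(visitor_b)

exact = len(visitor_a & visitor_b) / len(visitor_a | visitor_b)
estimate = estimate_jaccard(sig_a, sig_b)
is_candidate = bool(set(lsh_band_keys(sig_a, 32)) & set(lsh_band_keys(sig_b, 32)))

print(f"exact={exact:.2f} estimate={estimate:.2f} candidate_pair={is_candidate}")
```

In this sketch the signatures replace the sparse rows, so comparing two visitors costs a fixed number of integer comparisons, and the band keys let similar visitors be grouped by hashing instead of by scanning all pairs.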

Citations: 4

Crowd Teaching with Imperfect Labels

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380099

Yao Zhou, A. R. Nelakurthi, Ross Maciejewski, Wei Fan, Jingrui He

Abstract: The need for annotated labels to train machine learning models has led to a surge in crowdsourcing: collecting labels from non-experts. Instead of annotating from scratch, given an imperfect labeled set, how can we leverage the label information obtained from amateur crowd workers to improve the data quality? Furthermore, is there a way to teach the amateur crowd workers using this imperfect labeled set in order to improve their labeling performance? In this paper, we aim to answer both questions via a novel interactive teaching framework, which uses visual explanations to simultaneously teach and gauge the confidence level of the crowd workers. Motivated by the huge demand for fine-grained label information in real-world applications, we start from the realistic yet challenging assumption that neither the teacher nor the crowd workers are perfect. Then, we propose an adaptive scheme that could improve both of them through a sequence of interactions: the teacher teaches the workers using labeled data, and in return, the workers provide labels and the associated confidence level based on their own expertise. In particular, the teacher performs teaching using an empirical risk minimizer learned from an imperfect labeled set; the workers are assumed to have a forgetting behavior during learning, and their learning rate depends on the interpretation difficulty of the teaching item. Furthermore, depending on the level of confidence when the workers perform labeling, we also show that the empirical risk minimizer used by the teacher is a reliable and realistic substitute for the unknown target concept by utilizing the unbiased surrogate loss. Finally, the performance of the proposed framework is demonstrated through experiments on multiple real-world image and text data sets.

Citations: 12

Directional and Explainable Serendipity Recommendation

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380100

Xueqi Li, Wenjun Jiang, Weiguang Chen, Jie Wu, Guojun Wang, Kenli Li

Abstract: Serendipity recommendation has attracted increasing attention in recent years; it aims to provide recommendations that not only cater to users’ demands but also broaden their horizons. However, existing approaches usually measure user-item relevance with a scalar instead of a vector, ignoring user preference direction, which increases the risk of unrelated recommendations. In addition, reasonable explanations increase users’ trust and acceptance, but no existing work provides explanations for serendipitous recommendations. To address these limitations, we propose a Directional and Explainable Serendipity Recommendation method named DESR. Specifically, we first extract users’ long-term preferences with an unsupervised method based on GMM (Gaussian Mixture Model) and capture their short-term demands with a capsule network. Then, we propose the serendipity vector to combine long-term preferences with short-term demands and use it to generate directionally serendipitous recommendations. Finally, a back-routing scheme is exploited to offer explanations. Extensive experiments on real-world datasets show that, compared with existing serendipity-based methods, DESR effectively improves serendipity and explainability and also promotes diversity.

Citations: 27

Valve: Securing Function Workflows on Serverless Computing Platforms

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380173

P. Datta, P. Kumar, Tristan Morris, M. Grace, Amir Rahmati, Adam Bates

Abstract: Serverless computing has quickly emerged as a dominant cloud computing paradigm, allowing developers to rapidly prototype event-driven applications using a composition of small functions that each perform a single logical task. However, many such application workflows are based in part on publicly available functions developed by third parties, creating the potential for functions to behave in unexpected, or even malicious, ways. At present, developers are not in total control of where and how their data is flowing, creating significant security and privacy risks in growth markets that have embraced serverless (e.g., IoT). As a practical means of addressing this problem, we present Valve, a serverless platform that enables developers to exert complete fine-grained control of information flows in their applications. Valve enables workflow developers to reason about function behaviors, and specify restrictions, through auditing of network-layer information flows. By proxying network requests and propagating taint labels across network flows, Valve is able to restrict function behavior without code modification. We demonstrate that Valve is able to defend against known serverless attack behaviors, including container reuse-based persistence and data exfiltration over cloud platform APIs, with less than 2.8% runtime overhead, 6.25% deployment overhead, and 2.35% teardown overhead.

Citations: 37

Collective Multi-type Entity Alignment Between Knowledge Graphs

Proceedings of The Web Conference 2020 Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380289

Qi Zhu, Hao Wei, Bunyamin Sisman, Da Zheng, C. Faloutsos, Xin Dong, Jiawei Han

Abstract: A knowledge graph (e.g., Freebase, YAGO) is a multi-relational graph representing rich factual information among entities of various types. Entity alignment is the key step towards knowledge graph integration from multiple sources. It aims to identify entities across different knowledge graphs that refer to the same real-world entity. However, current entity alignment systems overlook the sparsity of different knowledge graphs and cannot align multi-type entities with a single model. In this paper, we present a Collective Graph neural network for Multi-type entity Alignment, called CG-MuAlign. Unlike previous work, CG-MuAlign jointly aligns multiple types of entities, collectively leverages the neighborhood information, and generalizes to unlabeled entity types. Specifically, we propose a novel collective aggregation function tailored for this task that (1) relieves the incompleteness of knowledge graphs via both cross-graph and self attention, and (2) scales up efficiently with a mini-batch training paradigm and an effective neighborhood sampling strategy. We conduct experiments on real-world knowledge graphs with millions of entities and observe performance superior to existing methods. In addition, the running time of our approach is much less than that of current state-of-the-art deep learning methods.

Citations: 38