Proceedings of the 13th International Conference on Web Search and Data Mining: Latest Publications

Web-scale Knowledge Collection

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3371878

Colin Lockard, Prashant Shiralkar, Xin Dong, Hannaneh Hajishirzi

Citations: 3

A Structural Graph Representation Learning Framework

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3371843

Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, Sungchul Kim, Anup B. Rao, Yasin Abbasi-Yadkori

Abstract: The success of many graph-based machine learning tasks depends heavily on an appropriate representation learned from the graph data. Most work has focused on learning node embeddings that preserve proximity, as opposed to structural role-based embeddings that preserve the structural similarity among nodes. These methods fail to capture higher-order structural dependencies and connectivity patterns that are crucial for structural role-based applications such as visitor stitching from web logs. In this work, we formulate higher-order network representation learning and describe a general framework called HONE for learning such structural node embeddings from networks via the subgraph patterns (network motifs, graphlet orbits/positions) in a node's neighborhood. A general diffusion mechanism is introduced in HONE along with a space-efficient approach that avoids explicit construction of the k-step motif-based matrices using a k-step linear operator. Furthermore, HONE is shown to be fast and efficient, with a worst-case time complexity that is nearly linear in the number of edges. The experiments demonstrate the effectiveness of HONE for a number of important tasks, including link prediction and visitor stitching from large web log data.

Citations: 31
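The HONE abstract mentions avoiding explicit construction of k-step motif-based matrices by using a k-step linear operator. A minimal sketch of that idea follows, under assumptions that are not taken from the paper: a single triangle motif, a row-normalized transition operator P = D^-1 W, and node features stored as plain Python lists. The helper names `triangle_motif_weights`, `apply_transition`, and `k_step` are hypothetical. The sketch computes P^k X by applying P k times, so the possibly dense matrix P^k is never materialized.

```python
def triangle_motif_weights(adj):
    """W[i][j] = number of triangles that contain edge (i, j)."""
    return {i: {j: len(adj[i] & adj[j]) for j in adj[i]} for i in adj}


def apply_transition(W, X):
    """Return P @ X for P = D^-1 W (row-normalised W) without building P."""
    dim = len(next(iter(X.values())))
    out = {}
    for i, row in W.items():
        total = sum(row.values())
        acc = [0.0] * dim
        if total > 0:  # nodes that lie in no triangle get an all-zero row
            for j, w in row.items():
                for c in range(dim):
                    acc[c] += (w / total) * X[j][c]
        out[i] = acc
    return out


def k_step(W, X, k):
    """Compute P^k @ X with k sparse products; P^k itself is never formed."""
    for _ in range(k):
        X = apply_transition(W, X)
    return X


# Usage: a triangle 0-1-2 plus a pendant edge 2-3 that lies in no triangle.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
W = triangle_motif_weights(adj)
X = {i: [1.0 if c == i else 0.0 for c in range(4)] for i in adj}  # one-hot features
print(k_step(W, X, 2)[0])  # [0.5, 0.25, 0.25, 0.0]
```

Each step touches every stored edge weight once per feature column, so memory stays at the size of W plus the feature table rather than an n-by-n matrix.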

Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? On the Relation between Product Relevance and Customer Satisfaction in eCommerce

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3371780

David Carmel, Elad Haramaty, Arnon Lazerson, L. Lewin-Eytan, Y. Maarek

Abstract: One emerging benefit of voice assistants is to facilitate the product search experience, allowing users to express orally which products they seek and to take actions on retrieved results, such as adding them to their cart or sending the product details to their mobile phone for further examination. Looking at users' behavior in product search supported by a digital voice assistant, we have observed an interesting phenomenon where users purchase or engage with search results that are objectively judged irrelevant to their queries. In this work, we analyze and characterize this phenomenon. We provide several hypotheses as to the reasons behind it, including users' personalized preferences, the product's popularity, the product's indirect relation with the query, the user's tolerance level, the query intent, and the product price. We address each hypothesis by conducting thorough data analyses and offer some insights with respect to users' purchase and engagement behavior with seemingly irrelevant results. We conclude with a discussion of how this analysis can be used to improve voice product search services.

Citations: 21

Intelligible Machine Learning and Knowledge Discovery Boosted by Visual Means

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3371872

Boris Kovalerchuk

Citations: 1

PERQ

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3371782

Zhiyong Wu, B. Kao, Tien-Hsuan Wu, Pengcheng Yin, Qun Liu

Citations: 12

DySAT: Deep Neural Representation Learning on Dynamic Graphs via Self-Attention Networks

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3371845

Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, Hao Yang

Abstract: Learning node representations in graphs is important for many applications such as link prediction, node classification, and community detection. Existing graph representation learning methods primarily target static graphs while many real-world graphs evolve over time. Complex time-varying graph structures make it challenging to learn informative node representations over time. We present Dynamic Self-Attention Network (DySAT), a novel neural architecture that learns node representations to capture dynamic graph structural evolution. Specifically, DySAT computes node representations through joint self-attention along the two dimensions of structural neighborhood and temporal dynamics. Compared with state-of-the-art recurrent methods modeling graph evolution, dynamic self-attention is efficient, while achieving consistently superior performance. We conduct link prediction experiments on two graph types: communication networks and bipartite rating networks. Experimental results demonstrate significant performance gains for DySAT over several state-of-the-art graph embedding baselines, in both single and multi-step link prediction tasks. Furthermore, our ablation study validates the effectiveness of jointly modeling structural and temporal self-attention.

Citations: 317
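The DySAT abstract describes self-attention along two dimensions: a node's structural neighborhood within a snapshot, and its own history across snapshots. The sketch below illustrates only that two-stage shape and is not the DySAT architecture: it assumes identity query/key/value projections, a single head, and plain scaled dot-product scores in both stages, and it omits the learned parameters and other components of the actual model. The function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Scaled dot-product attention with identity projections (no learned weights)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]


def structural_step(features, adj, node):
    """Within one snapshot, attend from `node` over itself and its neighbours."""
    keys = [features[node]] + [features[u] for u in sorted(adj[node])]
    return attend(features[node], keys)


def dysat_like(snapshots, node):
    """snapshots: list of (features, adj) pairs, one per time step, oldest first."""
    structural = [structural_step(f, a, node) for f, a in snapshots]
    # Temporal attention with a causal mask: step t attends to steps 0..t only.
    return [attend(structural[t], structural[: t + 1]) for t in range(len(structural))]


# Usage: node 0 gains a neighbour between the two snapshots.
snap1 = ({0: [1.0, 0.0], 1: [0.0, 1.0]}, {0: set(), 1: set()})
snap2 = ({0: [1.0, 0.0], 1: [0.0, 1.0]}, {0: {1}, 1: {0}})
reps = dysat_like([snap1, snap2], node=0)
```

The causal mask means the representation at step t never depends on later snapshots, which is what makes such a model usable for forecasting links at the next step.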

Hybrid Utility Function for Unexpected Recommendations

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3372183

P. Li

Citations: 1

Sampling Subgraphs with Guaranteed Treewidth for Accurate and Efficient Graphical Inference

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3371815

Jaemin Yoo, U. Kang, Mauro Scanagatta, Giorgio Corani, Marco Zaffalon

Abstract: How can we run graphical inference on large graphs efficiently and accurately? Many real-world networks are modeled as graphical models, and graphical inference is fundamental to understanding the properties of those networks. In this work, we propose a novel approach for fast and accurate inference, which first samples a small subgraph and then runs inference over the subgraph instead of the given graph. This is done by bounded treewidth (BTW) sampling, our novel algorithm that generates a subgraph with guaranteed bounded treewidth while retaining as many edges as possible. We first analyze the properties of BTW theoretically. Then, we evaluate our approach on node classification and compare it with the baseline of running loopy belief propagation (LBP) on the original graph. Our approach can be coupled with various inference algorithms: it achieves up to 13.7% higher accuracy with the junction tree algorithm and up to 23.8 times faster inference with LBP. We further compare BTW with previous graph sampling algorithms and show that it gives the best accuracy.

Citations: 10
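The abstract does not give enough detail to reproduce BTW itself, but the guarantee it targets (a subgraph of bounded treewidth that keeps as many edges as possible) can be illustrated with a different, generic technique: greedily growing a partial k-tree. Every subgraph of a k-tree has treewidth at most k, so the output always satisfies the bound, while the greedy step tries to preserve original edges. This is a sketch under stated assumptions, not BTW: `greedy_partial_ktree` is a hypothetical name, and seeding with the first k+1 nodes in sorted order is an arbitrary choice.

```python
from itertools import combinations


def greedy_partial_ktree(adj, k):
    """Return a set of edges (frozensets) forming a subgraph of treewidth <= k.

    adj maps every node to the set of its neighbours (undirected, no self-loops).
    The kept edges are a subset of a k-tree grown one vertex at a time.
    """
    nodes = sorted(adj)
    all_edges = {frozenset((u, v)) for u in nodes for v in adj[u]}
    if len(nodes) <= k + 1:
        return all_edges  # a graph on <= k + 1 vertices has treewidth <= k
    seed = nodes[: k + 1]  # arbitrary seed: conceptually a (k+1)-clique of the k-tree
    kept = {frozenset((u, v)) for u, v in combinations(seed, 2) if v in adj[u]}
    cliques = [frozenset(c) for c in combinations(seed, k)]  # k-cliques of the k-tree
    remaining = set(nodes[k + 1:])
    while remaining:
        # Pick the (vertex, k-clique) pair that preserves the most original edges.
        best_v, best_c, best_score = None, None, -1
        for v in sorted(remaining):
            for c in cliques:
                score = len(adj[v] & c)
                if score > best_score:
                    best_v, best_c, best_score = v, c, score
        kept |= {frozenset((u, best_v)) for u in best_c & adj[best_v]}
        # Attaching best_v to best_c creates k new k-cliques in the k-tree.
        cliques.extend((best_c - {u}) | {best_v} for u in best_c)
        remaining.remove(best_v)
    return kept


# Usage: with k = 1 a 4-cycle loses exactly one edge and becomes a tree.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
tree = greedy_partial_ktree(cycle, 1)
```

For k = 1 the kept edges form a forest; larger k keeps more edges (at most kn - k(k+1)/2 on n vertices) at the price of exact inference, such as the junction tree algorithm, becoming exponentially more expensive in k.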

Text Recognition Using Anonymous CAPTCHA Answers

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3371795

Alexander Shishkin, Anastasya A. Bezzubtseva, Valentina Fedorova, Alexey Drutsa, Gleb Gusev

Abstract: Internet companies use crowdsourcing to collect large amounts of data needed for creating products based on machine learning techniques. A significant source of such labels for OCR data sets is (re)CAPTCHA, which distinguishes humans from automated bots by asking them to recognize text and, at the same time, collects new labeled data in this way. An important component of such an approach to data collection is the reduction of noisy labels produced by bots and non-qualified users. In this paper, we address the problem of labeling text images via CAPTCHA, where user identification is generally impossible. We propose a new algorithm to aggregate multiple guesses collected through CAPTCHA. We employ incremental relabeling to minimize the number of guesses needed to obtain recognized text of good accuracy. The aggregation model and the stopping rule for our incremental relabeling are based on novel machine learning techniques and use meta features of CAPTCHA tasks and accumulated guesses. Our experiments show that our approach can provide a large amount of accurately recognized texts using a minimal number of user guesses. Finally, we report the substantial improvements in an optical character recognition model after deploying our approach at Yandex.

Citations: 0
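In the paper, both the aggregation model and the stopping rule are learned from meta features of CAPTCHA tasks and accumulated guesses. The loop structure of incremental relabeling can still be shown with a much simpler, swapped-in rule: plain majority voting over normalized guesses, stopping as soon as the leading answer has enough votes and a large enough share. The `normalize` helper and the thresholds `min_agreeing`, `min_share`, and `budget` are hypothetical choices for illustration, not values from the paper.

```python
from collections import Counter


def normalize(guess):
    """Case-fold and collapse whitespace so trivially different guesses match."""
    return " ".join(guess.lower().split())


def label_incrementally(guesses, min_agreeing=2, min_share=0.6, budget=7):
    """Consume guesses one at a time; stop as soon as the majority answer is trusted.

    Returns (answer, guesses_used); answer is None if the budget runs out first.
    """
    counts = Counter()
    used = 0
    for raw in guesses:
        if used >= budget:
            break
        counts[normalize(raw)] += 1
        used += 1
        answer, votes = counts.most_common(1)[0]
        if votes >= min_agreeing and votes / used >= min_share:
            return answer, used  # stopping rule met: request no further guesses
    return None, used


print(label_incrementally(["Hello", "hello ", "HELLO"]))  # ('hello', 2)
```

Because `guesses` may be any iterable, passing a lazy generator means no further CAPTCHA impressions are spent once the stopping rule fires; a learned rule would replace the two fixed thresholds with a model's confidence estimate.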

Learning with Small Data

Proceedings of the 13th International Conference on Web Search and Data Mining. Pub Date: 2020-01-20. DOI: 10.1145/3336191.3371874

Z. Li, Huaxiu Yao, Fenglong Ma

Citations: 12