Proceedings of the 13th International Conference on Web Search and Data Mining最新文献

筛选
英文 中文
Web-scale Knowledge Collection 网络规模的知识收集
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371878
Colin Lockard, Prashant Shiralkar, Xin Dong, Hannaneh Hajishirzi
{"title":"Web-scale Knowledge Collection","authors":"Colin Lockard, Prashant Shiralkar, Xin Dong, Hannaneh Hajishirzi","doi":"10.1145/3336191.3371878","DOIUrl":"https://doi.org/10.1145/3336191.3371878","url":null,"abstract":"How do we surface the large amount of information present in HTML documents on the Web, from news articles to scientific papers to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for Information Extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision. We cover the key ideas and intuition behind existing approaches to emphasize their applicability and potential in various settings.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127205691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
A Structural Graph Representation Learning Framework 一个结构图表示学习框架
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371843
Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, Sungchul Kim, Anup B. Rao, Yasin Abbasi-Yadkori
{"title":"A Structural Graph Representation Learning Framework","authors":"Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, Sungchul Kim, Anup B. Rao, Yasin Abbasi-Yadkori","doi":"10.1145/3336191.3371843","DOIUrl":"https://doi.org/10.1145/3336191.3371843","url":null,"abstract":"The success of many graph-based machine learning tasks highly depends on an appropriate representation learned from the graph data. Most work has focused on learning node embeddings that preserve proximity as opposed to structural role-based embeddings that preserve the structural similarity among nodes. These methods fail to capture higher-order structural dependencies and connectivity patterns that are crucial for structural role-based applications such as visitor stitching from web logs. In this work, we formulate higher-order network representation learning and describe a general framework called HONE for learning such structural node embeddings from networks via the subgraph patterns (network motifs, graphlet orbits/positions) in a nodes neighborhood. A general diffusion mechanism is introduced in HONE along with a space-efficient approach that avoids explicit construction of the k-step motif-based matrices using a k-step linear operator. Furthermore, HONE is shown to be fast and efficient with a worst-case time complexity that is nearly-linear in the number of edges. The experiments demonstrate the effectiveness of HONE for a number of important tasks including link prediction and visitor stitching from large web log data.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123288447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
Why Do People Buy Seemingly Irrelevant Items in Voice Product Search?: On the Relation between Product Relevance and Customer Satisfaction in eCommerce 为什么人们会在语音产品搜索中购买看似无关的商品?电子商务中产品相关性与顾客满意度的关系研究
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371780
David Carmel, Elad Haramaty, Arnon Lazerson, L. Lewin-Eytan, Y. Maarek
{"title":"Why Do People Buy Seemingly Irrelevant Items in Voice Product Search?: On the Relation between Product Relevance and Customer Satisfaction in eCommerce","authors":"David Carmel, Elad Haramaty, Arnon Lazerson, L. Lewin-Eytan, Y. Maarek","doi":"10.1145/3336191.3371780","DOIUrl":"https://doi.org/10.1145/3336191.3371780","url":null,"abstract":"One emerging benefit of voice assistants is to facilitate product search experience, allowing users to express orally which products they seek, and taking actions on retrieved results such as adding them to their cart or sending the product details to their mobile phone for further examination. Looking at users' behavior in product search, supported by a digital voice assistant, we have observed an interesting phenomenon where users purchase or engage with search results that are objectively judged irrelevant to their queries. In this work, we analyze and characterize this phenomenon. We provide several hypotheses as to the reasons behind it, including users' personalized preferences, the product's popularity, the product's indirect relation with the query, the user's tolerance level, the query intent, and the product price. We address each hypothesis by conducting thorough data analyses and offer some insights with respect to users' purchase and engagement behavior with seemingly irrelevant results. We conclude with a discussion on how this analysis can be used to improve voice product search services.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115515509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
Intelligible Machine Learning and Knowledge Discovery Boosted by Visual Means 视觉手段促进的可理解机器学习和知识发现
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371872
Boris Kovalerchuk
{"title":"Intelligible Machine Learning and Knowledge Discovery Boosted by Visual Means","authors":"Boris Kovalerchuk","doi":"10.1145/3336191.3371872","DOIUrl":"https://doi.org/10.1145/3336191.3371872","url":null,"abstract":"Intelligible machine learning and knowledge discovery are important for modeling individual and social behavior, user activity, link prediction, community detection, crowd-generated data, and others. The role of the interpretable method in web search and mining activities is also very significant to enhance clustering, classification, data summarization, knowledge acquisition, opinion and sentiment mining, web traffic analysis, and web recommender systems. Deep learning success in accuracy of prediction and its failure in explanation of the produced models without special interpretation efforts motivated the surge of efforts to make Machine Learning (ML) models more intelligible and understandable. The prominence of visual methods in getting appealing explanations of ML models motivated the growth of deep visualization, and visual knowledge discovery. This tutorial covers the state-of-the-art research, development, and applications in the area of Intelligible Knowledge Discovery, and Machine Learning boosted by Visual Means.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116447323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
PERQ
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371782
Zhiyong Wu, B. Kao, Tien-Hsuan Wu, Pengcheng Yin, Qun Liu
{"title":"PERQ","authors":"Zhiyong Wu, B. Kao, Tien-Hsuan Wu, Pengcheng Yin, Qun Liu","doi":"10.1145/3336191.3371782","DOIUrl":"https://doi.org/10.1145/3336191.3371782","url":null,"abstract":"A knowledge-based question-answering (KB-QA) system is one that answers natural-language questions by accessing information stored in a knowledge base (KB). Existing KB-QA systems generally register an accuracy of 70-80% for simple questions and less for more complex ones. We observe that certain questions are intrinsically difficult to answer correctly with existing systems. We propose the PERQ framework to address this issue. Given a question q, we perform three steps to boost answer accuracy: (1) (Prediction) We predict if q can be answered correctly by a KB-QA system S. (2) (Explanation) If S is predicted to fail q, we analyze them to determine the most likely reasons of the failure. (3) (Rectification) We use the prediction and explanation results to rectify the answer. We put forward tools to achieve the three steps and analyze their effectiveness. Our experiments show that the PERQ framework can significantly improve KB-QA systems' accuracies over simple questions.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122844585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
DySAT: Deep Neural Representation Learning on Dynamic Graphs via Self-Attention Networks 基于自注意网络的动态图的深度神经表征学习
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371845
Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, Hao Yang
{"title":"DySAT: Deep Neural Representation Learning on Dynamic Graphs via Self-Attention Networks","authors":"Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, Hao Yang","doi":"10.1145/3336191.3371845","DOIUrl":"https://doi.org/10.1145/3336191.3371845","url":null,"abstract":"Learning node representations in graphs is important for many applications such as link prediction, node classification, and community detection. Existing graph representation learning methods primarily target static graphs while many real-world graphs evolve over time. Complex time-varying graph structures make it challenging to learn informative node representations over time. We present Dynamic Self-Attention Network (DySAT), a novel neural architecture that learns node representations to capture dynamic graph structural evolution. Specifically, DySAT computes node representations through joint self-attention along the two dimensions of structural neighborhood and temporal dynamics. Compared with state-of-the-art recurrent methods modeling graph evolution, dynamic self-attention is efficient, while achieving consistently superior performance. We conduct link prediction experiments on two graph types: communication networks and bipartite rating networks. Experimental results demonstrate significant performance gains for DySAT over several state-of-the-art graph embedding baselines, in both single and multi-step link prediction tasks. Furthermore, our ablation study validates the effectiveness of jointly modeling structural and temporal self-attention.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116674965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 317
Hybrid Utility Function for Unexpected Recommendations 混合实用功能的意想不到的建议
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3372183
P. Li
{"title":"Hybrid Utility Function for Unexpected Recommendations","authors":"P. Li","doi":"10.1145/3336191.3372183","DOIUrl":"https://doi.org/10.1145/3336191.3372183","url":null,"abstract":"Unexpectedness constitutes an important factor for recommender system to improve user satisfaction and avoid filter bubble issues. In this proposal, we propose to provide unexpected recommendations using the hybrid utility function as a mixture of estimated ratings, unexpectedness, relevance and annoyance. We plan to conduct extensive experiments to validate the superiority of the proposed method.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128268611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Sampling Subgraphs with Guaranteed Treewidth for Accurate and Efficient Graphical Inference 采样子图与保证树宽度准确和有效的图形推理
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371815
Jaemin Yoo, U. Kang, Mauro Scanagatta, Giorgio Corani, Marco Zaffalon
{"title":"Sampling Subgraphs with Guaranteed Treewidth for Accurate and Efficient Graphical Inference","authors":"Jaemin Yoo, U. Kang, Mauro Scanagatta, Giorgio Corani, Marco Zaffalon","doi":"10.1145/3336191.3371815","DOIUrl":"https://doi.org/10.1145/3336191.3371815","url":null,"abstract":"How can we run graphical inference on large graphs efficiently and accurately? Many real-world networks are modeled as graphical models, and graphical inference is fundamental to understand the properties of those networks. In this work, we propose a novel approach for fast and accurate inference, which first samples a small subgraph and then runs inference over the subgraph instead of the given graph. This is done by the bounded treewidth (BTW) sampling, our novel algorithm that generates a subgraph with guaranteed bounded treewidth while retaining as many edges as possible. We first analyze the properties of BTW theoretically. Then, we evaluate our approach on node classification and compare it with the baseline which is to run loopy belief propagation (LBP) on the original graph. Our approach can be coupled with various inference algorithms: it shows higher accuracy up to 13.7% with the junction tree algorithm, and allows faster inference up to 23.8 times with LBP. We further compare BTW with previous graph sampling algorithms and show that it gives the best accuracy.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"195 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116467794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Text Recognition Using Anonymous CAPTCHA Answers 使用匿名CAPTCHA答案的文本识别
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371795
Alexander Shishkin, Anastasya A. Bezzubtseva, Valentina Fedorova, Alexey Drutsa, Gleb Gusev
{"title":"Text Recognition Using Anonymous CAPTCHA Answers","authors":"Alexander Shishkin, Anastasya A. Bezzubtseva, Valentina Fedorova, Alexey Drutsa, Gleb Gusev","doi":"10.1145/3336191.3371795","DOIUrl":"https://doi.org/10.1145/3336191.3371795","url":null,"abstract":"Internet companies use crowdsourcing to collect large amounts of data needed for creating products based on machine learning techniques. A significant source of such labels for OCR data sets is (re)CAPTCHA, which distinguishes humans from automated bots by asking them to recognize text and, at the same time, receives new labeled data in this way. An important component of such approach to data collection is the reduction of noisy labels produced by bots and non-qualified users. In this paper, we address the problem of labeling text images via CAPTCHA, where user identification is generally impossible. We propose a new algorithm to aggregate multiple guesses collected through CAPTCHA. We employ incremental relabeling to minimize the number of guesses needed for obtaining the recognized text of a good accuracy. The aggregation model and the stopping rule for our incremental relabeling are based on novel machine learning techniques and use meta features of CAPTCHA tasks and accumulated guesses. Our experiments show that our approach can provide a large amount of accurately recognized texts using a minimal number of user guesses. Finally, we report the great improvements of an optical character recognition model after implementing our approach in Yandex.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134379528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Learning with Small Data 小数据学习
Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371874
Z. Li, Huaxiu Yao, Fenglong Ma
{"title":"Learning with Small Data","authors":"Z. Li, Huaxiu Yao, Fenglong Ma","doi":"10.1145/3336191.3371874","DOIUrl":"https://doi.org/10.1145/3336191.3371874","url":null,"abstract":"In the era of big data, it is easy for us collect a huge number of image and text data. However, we frequently face the real-world problems with only small (labeled) data in some domains, such as healthcare and urban computing. The challenge is how to make machine learn algorithms still work well with small data? To solve this challenge, in this tutorial, we will cover the state-of-the-art machine learning techniques to handle small data issue. In particular, we focus on the following three aspects: (1) Providing a comprehensive review of recent advances in exploring the power of knowledge transfer, especially focusing on meta-learning; (2) introducing the cutting-edge techniques of incorporating human/expert knowledge into machine learning models; and (3) identifying the open challenges to data augmentation techniques, such as generative adversarial networks. We believe this is an emerging and potentially high-impact topic in computational data science, which will attract both researchers and practitioners from academia and industry.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132445784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信