{"title":"Neural Models for Full Text Search","authors":"Nick Craswell","doi":"10.1145/3018661.3042065","DOIUrl":"https://doi.org/10.1145/3018661.3042065","url":null,"abstract":"A fundamental concern in search engines is to determine which documents have the best content for satisfying the user, based on analysis of the user's query and the text of documents. For this query-content match, many learning to rank systems make use of IR features developed in the 1990s in the TREC framework. Such features are still important in a variety of search tasks, and particularly in the long tail where clicks, links and social media signals become sparse. I will present our current progress, in particular three different neural models, with the goal of surpassing the 1990s models in full text search. This will include evidence, using proprietary Bing datasets, that large-scale training data can be useful. I will also argue that for the field to make progress on query-content relevance modeling, it may be valuable to set up a shared blind evaluation similar to 1990s TREC, possibly with large-scale training data.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"491 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133084543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representation Learning with Pair-wise Constraints for Collaborative Ranking","authors":"Fuzhen Zhuang, Dan Luo, Nicholas Jing Yuan, Xing Xie, Qing He","doi":"10.1145/3018661.3018720","DOIUrl":"https://doi.org/10.1145/3018661.3018720","url":null,"abstract":"Last decades have witnessed a vast amount of interest and research in recommendation systems. Collaborative filtering, which uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences for other users, is one of the most successful approaches to build recommendation systems. Most previous collaborative filtering approaches employ the matrix factorization techniques to learn latent user feature profiles and item feature profiles. Also many subsequent works are proposed to incorporate users' social network information and items' attributions to further improve recommendation performance under the matrix factorization framework. However, the matrix factorization based methods may not make full use of the rating information, leading to unsatisfying performance. Recently deep learning has been approved to be able to find good representations in natural language processing, image classification, and so on. Along this line, we propose a collaborative ranking framework via representation learning with pair-wise constraints (REAP for short), in which autoencoder is used to simultaneously learn the latent factors of both users and items and pair-wise ranked loss defined by (user, item) pairs is considered. Extensive experiments are conducted on five data sets to demonstrate the effectiveness of the proposed framework.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122058009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning from User Interactions in Personal Search via Attribute Parameterization","authors":"Michael Bendersky, Xuanhui Wang, Donald Metzler, Marc Najork","doi":"10.1145/3018661.3018712","DOIUrl":"https://doi.org/10.1145/3018661.3018712","url":null,"abstract":"User interaction data (e.g., click data) has proven to be a powerful signal for learning-to-rank models in web search. However, such models require observing multiple interactions across many users for the same query-document pair to achieve statistically meaningful gains. Therefore, utilizing user interaction data for improving search over personal, rather than public, content is a challenging problem. First, the documents (e.g., emails or private files) are not shared across users. Second, user search queries are of personal nature (e.g., \"alice's address\") and may not generalize well across users. In this paper, we propose a solution to these challenges, by projecting user queries and documents into a multi-dimensional space of fine-grained and semantically coherent attributes. We then introduce a novel parameterization technique to overcome sparsity in the multi-dimensional attribute space. Attribute parameterization enables effective usage of cross-user interactions for improving personal search quality -- which is a first such published result, to the best of our knowledge. Experiments with a dataset derived from interactions of users of one of the world's largest personal search engines demonstrate the effectiveness of the proposed attribute parameterization technique.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122368132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concept Embedded Convolutional Semantic Model for Question Retrieval","authors":"P. Wang, Yong Zhang, Lei Ji, Jun Yan, Lianwen Jin","doi":"10.1145/3018661.3018687","DOIUrl":"https://doi.org/10.1145/3018661.3018687","url":null,"abstract":"The question retrieval, which aims to find similar questions of a given question, is playing pivotal role in various question answering (QA) systems. This task is quite challenging mainly on three aspects: lexical gap, polysemy and word order. In this paper, we propose a unified framework to simultaneously handle these three problems. We use word combined with corresponding concept information to handle the polysemous problem. The concept embedding and word embedding are learned at the same time from both context-dependent and context-independent view. The lexical gap problem is handled since the semantic information has been encoded into the embedding. Then, we propose to use a high-level feature embedded convolutional semantic model to learn the question embedding by inputting the concept embedding and word embedding without manually labeling training data. The proposed framework nicely represent the hierarchical structures of word information and concept information in sentences with their layer-by-layer composition and pooling. Finally, the framework is trained in a weakly-supervised manner on question answer pairs, which can be directly obtained without manually labeling. Experiments on two real question answering datasets show that the proposed framework can significantly outperform the state-of-the-art solutions.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117027305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Location Influence in Location-based Social Networks","authors":"M. Saleem, Rohit Kumar, T. Calders, Xike Xie, T. Pedersen","doi":"10.1145/3018661.3018705","DOIUrl":"https://doi.org/10.1145/3018661.3018705","url":null,"abstract":"Location-based social networks (LBSN) are social networks complemented with location data such as geo-tagged activity data of its users. In this paper, we study how users of a LBSN are navigating between locations and based on this information we select the most influential locations. In contrast to existing works on influence maximization, we are not per se interested in selecting the users with the largest set of friends or the set of locations visited by the most users; instead, we introduce a notion of location influence that captures the ability of a set of locations to reach out geographically. We provide an exact on-line algorithm and a more memory-efficient but approximate variant based on the HyperLogLog sketch to maintain a data structure called Influence Oracle that allows to efficiently find a top-k set of influential locations. Experiments show that our algorithms are efficient and scalable and that our new location influence notion favors diverse sets of locations with a large geographical spread.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124342659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Concise Integer Linear Programming Formulation for Implicit Search Result Diversification","authors":"Haitao Yu, A. Jatowt, Roi Blanco, Hideo Joho, J. Jose, Long Chen, Fajie Yuan","doi":"10.1145/3018661.3018710","DOIUrl":"https://doi.org/10.1145/3018661.3018710","url":null,"abstract":"To cope with ambiguous and/or underspecified queries, search result diversification (SRD) is a key technique that has attracted a lot of attention. This paper focuses on implicit SRD, where the possible subtopics underlying a query are unknown beforehand. We formulate implicit SRD as a process of selecting and ranking k exemplar documents that utilizes integer linear programming (ILP). Unlike the common practice of relying on approximate methods, this formulation enables us to obtain the optimal solution of the objective function. Based on four benchmark collections, our extensive empirical experiments reveal that: (1) The factors, such as different initial runs, the number of input documents, query types and the ways of computing document similarity significantly affect the performance of diversification models. Careful examinations of these factors are highly recommended in the development of implicit SRD methods. (2) The proposed method can achieve substantially improved performance over the state-of-the-art unsupervised methods for implicit SRD.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126644105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Illustrative Snippets for Open Data on the Web","authors":"Gong Cheng, Cheng Jin, Wentao Ding, Danyun Xu, Yuzhong Qu","doi":"10.1145/3018661.3018670","DOIUrl":"https://doi.org/10.1145/3018661.3018670","url":null,"abstract":"To embrace the open data movement, increasingly many datasets have been published on the Web to be reused. Users, when assessing the usefulness of an unfamiliar dataset, need means to quickly inspect its contents. To satisfy the needs, we propose to automatically extract an optimal small portion from a dataset, called a snippet, to concisely illustrate the contents of the dataset. We consider the quality of a snippet from three aspects: coverage, familiarity, and cohesion, which are jointly formulated in a new combinatorial optimization problem called the maximum-weight-and-coverage connected graph problem (MwcCG). We give a constant-factor approximation algorithm for this NP-hard problem, and experiment with our solution on real-world datasets. Our quantitative analysis and user study show that our approach outperforms a baseline approach.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126533004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning at Amazon","authors":"R. Herbrich","doi":"10.1145/3018661.3022764","DOIUrl":"https://doi.org/10.1145/3018661.3022764","url":null,"abstract":"In this talk I will give an introduction into the field of machine learning and discuss why it is a crucial technology for Amazon. Machine learning is the science of automatically extracting patterns from data in order to make automated predictions of future data. One way to differentiate machine learning tasks is by the following two factors: (1) How much noise is contained in the data? and (2) How far into the future is the prediction task? The former presents a limit to the learnability of task --- regardless which learning algorithm is used --- whereas the latter has a crucial implication on the representation of the predictions: while most tasks in search and advertising typically only forecast minutes into the future, tasks in e-commerce can require predictions up to a year into the future. The further the forecast horizon, the more important it is to take account of uncertainty in both the learning algorithm and the representation of the predictions. I will discuss which learning frameworks are best suited for the various scenarios, that is, short-term predictions with little noise vs. long-term predictions with lots of noise, and present some ideas to combine representation learning with probabilistic methods. In the second half of the talk, I will give an overview of the applications of machine learning at Amazon ranging from demand forecasting, machine translation to automation of computer vision tasks and robotics. I will also discuss the importance of tools for data scientist and share learnings on bringing machine learning algorithms into products.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121576011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Text Embeddings for Information Retrieval","authors":"Bhaskar Mitra, Nick Craswell","doi":"10.1145/3018661.3022755","DOIUrl":"https://doi.org/10.1145/3018661.3022755","url":null,"abstract":"In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing tasks, such as language modelling and machine translation. This suggests that neural models will also achieve good performance on information retrieval (IR) tasks, such as relevance ranking, addressing the query-document vocabulary mismatch problem by using a semantic rather than lexical matching. Although initial iterations of neural models do not outperform traditional lexical-matching baselines, the level of interest and effort in this area is increasing, potentially leading to a breakthrough. The popularity of the recent SIGIR 2016 workshop on Neural Information Retrieval provides evidence to the growing interest in neural models for IR. While recent tutorials have covered some aspects of deep learning for retrieval tasks, there is a significant scope for organizing a tutorial that focuses on the fundamentals of representation learning for text retrieval. The goal of this tutorial will be to introduce state-of-the-art neural embedding models and bridge the gap between these neural models with early representation learning approaches in IR (e.g., LSA). We will discuss some of the key challenges and insights in making these models work in practice, and demonstrate one of the toolsets available to researchers interested in this area.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123218046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Click Through Rate Prediction for Local Search Results","authors":"Fidel Cacheda, Nicola Barbieri, Roi Blanco","doi":"10.1145/3018661.3018683","DOIUrl":"https://doi.org/10.1145/3018661.3018683","url":null,"abstract":"With the ubiquity of internet access and location services provided by smartphone devices, the volume of queries issued by users to find products and services that are located near them is rapidly increasing. Local search engines help users in this task by matching queries with a predefined geographical connotation (\"local queries\") against a database of local business listings. Local search differs from traditional web-search because to correctly capture users' click behavior, the estimation of relevance between query and candidate results must be integrated with geographical signals, such as distance. The intuition is that users prefer businesses that are physically closer to them. However, this notion of closeness is likely to depend upon other factors, like the category of the business, the quality of the service provided, the density of businesses in the area of interest, etc. In this paper we perform an extensive analysis of online users' behavior and investigate the problem of estimating the click-through rate on local search (LCTR) by exploiting the combination of standard retrieval methods with a rich collection of geo and business-dependent features. We validate our approach on a large log collected from a real-world local search service. Our evaluation shows that the non-linear combination of business information, geo-local and textual relevance features leads to a significant improvements over state of the art alternative approaches based on a combination of relevance, distance and business reputation.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124926352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}