Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval: Latest Papers, Page 2

On the Theory of Weak Supervision for Information Retrieval

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234968

Hamed Zamani, W. Bruce Croft

Abstract: Neural network approaches have recently been shown to be effective in several information retrieval (IR) tasks. However, neural approaches often require large volumes of training data to perform effectively, and such data are not always available. To mitigate the shortage of labeled data, training neural IR models with weak supervision has recently been proposed and has received considerable attention in the literature. In weak supervision, an existing model automatically generates labels for a large set of unlabeled data, and a machine learning model is then trained on the generated "weak" data. Surprisingly, prior work has shown that the trained neural model can outperform the weak labeler by a significant margin. Although these improvements have been intuitively justified in previous work, the literature still lacks theoretical justification for the observed empirical findings. In this paper, we provide theoretical insight into weak supervision for information retrieval, focusing on learning to rank. We model the weak supervision signal as a noisy channel that introduces noise to the correct ranking. Based on the risk minimization framework, we prove that, given some sufficient constraints on the loss function, weak supervision is equivalent to supervised learning under uniform noise. We also find an upper bound for the empirical risk of weak supervision in the case of non-uniform noise. Following recent work on using multiple weak supervision signals to learn more accurate models, we find an information-theoretic lower bound on the number of weak supervision signals required to guarantee an upper bound for the pairwise error probability. We empirically verify a set of the presented theoretical findings, using synthetic and real weak supervision data.
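
The uniform-noise claim in the abstract above can be illustrated numerically. The following is a minimal sketch, not the paper's proof. It assumes a pairwise setting with labels in {+1, -1}, a sigmoid pairwise loss (picked here because it is symmetric, meaning l(s, +1) + l(s, -1) is constant), and labels flipped independently with a hypothetical probability `flip_prob`. Symmetry is one sufficient condition known from the noisy-label literature; the abstract does not state the paper's exact constraints. All function names and numbers are illustrative.

```python
import math


def sigmoid_loss(score_diff, label):
    """Pairwise sigmoid loss; symmetric because l(s, +1) + l(s, -1) == 1."""
    return 1.0 / (1.0 + math.exp(label * score_diff))


def clean_risk(score_diffs, labels):
    """Empirical risk measured against the true pairwise labels."""
    losses = [sigmoid_loss(s, y) for s, y in zip(score_diffs, labels)]
    return sum(losses) / len(losses)


def expected_noisy_risk(score_diffs, labels, flip_prob):
    """Expected empirical risk when each label is flipped with flip_prob."""
    losses = [
        (1 - flip_prob) * sigmoid_loss(s, y) + flip_prob * sigmoid_loss(s, -y)
        for s, y in zip(score_diffs, labels)
    ]
    return sum(losses) / len(losses)


# Hypothetical score differences f(d_i) - f(d_j) and true preferences.
score_diffs = [2.0, -0.5, 1.2, -3.0]
labels = [1, -1, -1, 1]
flip_prob = 0.3

clean = clean_risk(score_diffs, labels)
noisy = expected_noisy_risk(score_diffs, labels, flip_prob)
# For a symmetric loss: noisy == (1 - 2 * flip_prob) * clean + flip_prob
```

Under these assumptions the noisy risk is an increasing affine function of the clean risk whenever `flip_prob` is below 0.5, so the same model minimizes both.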

Citations: 37

Multi Page Search with Reinforcement Learning to Rank

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234977

Wei Zeng, Jun Xu, Yanyan Lan, J. Guo, Xueqi Cheng

Abstract: Web search engines are typically designed to return multiple pages of search results, and users engaging in exploratory search with ad hoc queries are likely to access more than one result page. The ranking of web pages for such queries should consider information beyond the original query, e.g., the user's clicks on previous result pages. Existing methods that utilize this kind of information usually involve relevance feedback, which uses the feedback information to explore the user's intent. However, due to the limitations of the feedback mechanism, it is difficult to apply existing relevance feedback techniques to state-of-the-art learning to rank models. In this paper, we propose a novel learning to rank model for multi page search in which the user's feedback can be naturally utilized to improve the ranking of the next result page. The model, referred to as MDP-MPS, formalizes the ranking of documents in multi page search as a Markov decision process (MDP) in which the search engine corresponds to the agent that constructs the document rankings in the result pages, and the user corresponds to the environment that judges the rankings and provides rewards. The REINFORCE policy gradient algorithm is adopted for learning the model parameters. Experimental results on the OHSUMED dataset showed that our approach outperformed two baselines: the traditional relevance ranking model ListNet and the relevance feedback method Rocchio.
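
The abstract above names REINFORCE as the learning algorithm but does not give the state, action, or reward definitions of MDP-MPS. As a rough illustration of the general idea only, the sketch below applies a generic REINFORCE update to ranking. It assumes a linear scoring policy with a softmax over the documents not yet ranked, and a DCG-style gain as the per-position reward; both are assumptions made for this example, and it does not model result pages or user clicks. All names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sample_episode(weights, docs, rng):
    """Sample one ranking; return it with the grad of log pi at each step."""
    remaining = list(range(len(docs)))
    ranking, grads = [], []
    while remaining:
        scores = [sum(w * x for w, x in zip(weights, docs[i])) for i in remaining]
        probs = softmax(scores)
        pick = rng.choices(range(len(remaining)), weights=probs)[0]
        # Gradient of log-softmax with linear scores: x_chosen - E_p[x].
        expected = [
            sum(p * docs[i][k] for p, i in zip(probs, remaining))
            for k in range(len(weights))
        ]
        chosen = docs[remaining[pick]]
        grads.append([c - e for c, e in zip(chosen, expected)])
        ranking.append(remaining.pop(pick))
    return ranking, grads


def reinforce_update(weights, docs, relevance, rng, lr=0.1):
    """One REINFORCE step using undiscounted returns and no baseline."""
    ranking, grads = sample_episode(weights, docs, rng)
    # Assumed reward: DCG gain of the document placed at each position.
    rewards = [
        (2 ** relevance[d] - 1) / math.log2(pos + 2)
        for pos, d in enumerate(ranking)
    ]
    new_weights = list(weights)
    for t, grad in enumerate(grads):
        ret = sum(rewards[t:])  # return collected from step t onward
        for k in range(len(new_weights)):
            new_weights[k] += lr * ret * grad[k]
    return new_weights, ranking


# Toy data: feature 0 marks the single relevant document.
rng = random.Random(0)
docs = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
relevance = [2, 0, 0]
weights = [0.0, 0.0]
for _ in range(50):
    weights, ranking = reinforce_update(weights, docs, relevance, rng)
```

A practical system would typically subtract a baseline from the return to reduce gradient variance; that is omitted here to keep the sketch short.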

Citations: 21

User Interactions with Search Systems

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234981

Ying-Hsang Liu, Chang Liu, R. Bierig

Citations: 0

StatBM25

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234975

Xing Tan, Fanghong Jiang, J. Huang

Abstract: In Information Retrieval and Web Search, BM25 is one of the most influential probabilistic retrieval formulas for document weighting and ranking. BM25 involves three parameters, $k_1$, $k_3$, and $b$, which provide scalar approximation and scaling of important document features such as term frequency, document frequency, and document length. In this paper, we investigate aggregative and statistical document features for document ranking. In short, a statistically adjusted BM25 is used to score, in an aggregative way, virtual documents that are generated by randomly combining documents from the original collection. The problem size, measured as the number of virtual documents to be ranked, is an expansion of the size of the original problem. As a result, ranking is realized by performing statistical sampling. Rejection Sampling, a simple Monte Carlo sampling method, is used at present. This new framework is called StatBM25, emphasizing first that the original problem domain space is K-expanded (a concept explained further in the paper), and second that statistical sampling is employed in the model. Empirical studies are performed on several standard test collections, where StatBM25 demonstrates a convincingly high degree of both uniqueness and effectiveness compared to BM25. We believe this means that StatBM25, as a statistically smoothed and normalized variant of BM25, might eventually lead to the discovery of useful new statistical measures for document ranking.
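
For readers unfamiliar with the baseline that the abstract above builds on, here is a minimal sketch of a common textbook form of Okapi BM25 with the three parameters $k_1$, $k_3$, and $b$. It is not StatBM25: no virtual documents or sampling are involved. The default parameter values, the non-negative IDF variant, and the toy corpus are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_score(query, doc, corpus, k1=1.2, b=0.75, k3=8.0):
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    doc_tf = Counter(doc)
    query_tf = Counter(query)
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        df = sum(1 for d in corpus if term in d)
        # Non-negative IDF variant (an assumption; other variants exist).
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        # b controls how strongly document length is normalized.
        length_norm = 1.0 - b + b * len(doc) / avg_len
        # k1 saturates document term frequency; k3 does so for the query.
        doc_part = (k1 + 1.0) * tf / (k1 * length_norm + tf)
        query_part = (k3 + 1.0) * qtf / (k3 + qtf)
        score += idf * doc_part * query_part
    return score


# Toy tokenized corpus, purely illustrative.
corpus = [
    ["weak", "supervision", "ranking"],
    ["neural", "ranking", "models", "ranking"],
    ["graph", "learning"],
]
query = ["ranking", "models"]
scores = [bm25_score(query, d, corpus) for d in corpus]
```

In this toy corpus the second document matches both query terms and scores highest, while the third matches neither and scores zero.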

Citations: 0

Large-scale Machine Learning over Graphs

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3240462

Yiming Yang

Abstract: Graphs provide powerful representations for statistical modeling of interrelated variables (observed or latent) in a broad range of machine learning applications. Examples include learning and inference based on the dependency structures among words, documents, topics, users, items, web sites, and more. How to best leverage such dependency structures from multiple graphs with massive and heterogeneous types of nodes and relations has posed grand challenges to machine learning theory and algorithms. This talk presents our recent work in this direction focusing on three significant tasks, including 1) a novel framework for fusing multiple heterogeneous graphs into a unified product graph to enable semi-supervised multi-relational learning, 2) the first algorithmic solution for imposing analogical structures in graph-based entity/relation embedding, and 3) a new formulation of neural architecture search as a graph topology optimization problem, with simple yet powerful algorithms that automatically discover high-performing convolutional neural architectures on image recognition benchmarks, and reduce the computational cost over state-of-the-art non-differentiable techniques by several orders of magnitude.

Citations: 0

Attentive Contextual Denoising Autoencoder for Recommendation

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234956

Yogesh Jhamb, Travis Ebesu, Yi Fang

Abstract: Personalized recommendation has become increasingly pervasive. Users receive recommendations on products, movies, points of interest, and other online services. Traditional collaborative filtering techniques have demonstrated effectiveness in a wide range of recommendation tasks, but they are unable to capture complex relationships between users and items. There is a surge of interest in applying deep learning to recommender systems due to its nonlinear modeling capacity and recent success in other domains such as computer vision and speech recognition. However, prior work does not incorporate contextual information, which is widely available in many recommendation tasks. In this paper, we propose a deep learning based model for contextual recommendation. Specifically, the model consists of a denoising autoencoder neural network architecture augmented with a context-driven attention mechanism, referred to as Attentive Contextual Denoising Autoencoder (ACDA). The attention mechanism is utilized to encode the contextual attributes into the hidden representation of the user's preference, which associates personalized context with each user's preference to provide recommendations targeted to that specific user. Experiments conducted on multiple real-world datasets from Meetup and MovieLens on event and movie recommendations demonstrate the effectiveness of the proposed model over state-of-the-art recommenders.

Citations: 37

The Broad View of Task Type Using Path Analysis

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234951

M. Mitsui, C. Shah

Citations: 6

Entity Set Expansion from Twitter

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234966

He Zhao, Chong Feng, Zhunchen Luo, Changhai Tian

Abstract: Online social media yields large-scale corpora that are fairly informative and sometimes include many up-to-date entities. The challenge in expanding entity sets on social media text is to extract more uncommon entities using only a few seeds already in hand. In this paper, we present an approach that is able to find novel entities by expanding a small initial seed set on Twitter text. Our method first generates candidate sets on the basis of a semantic similarity feature. It then jointly utilizes 2 text-based features and 12 others that carry social-media-specific information. With the scores on those features, a ranking model is learned by a supervised algorithm to score each candidate term, and the final ranked list is taken as the target expanded set. We run experiments with 24 entity classes on a Twitter corpus, and the expanded sets contain many novel entities that previous research did not fully detect. The experimental results on datasets from different years are consistent with the expectation that fresh entities change over time.
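
The first stage described in the abstract above, candidate generation from a semantic similarity feature, can be sketched as follows. The abstract does not say how similarity is computed, so this example assumes cosine similarity between term embeddings and ranks non-seed terms by their mean similarity to the seeds. The embedding values and function names are hypothetical, and the later 14-feature supervised ranker is not covered.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_seeds(seeds, embeddings, top_k=2):
    """Rank non-seed terms by mean cosine similarity to the seed terms."""
    scored = []
    for term, vec in embeddings.items():
        if term in seeds:
            continue  # seeds are already in the entity set
        sim = sum(cosine(vec, embeddings[s]) for s in seeds) / len(seeds)
        scored.append((sim, term))
    scored.sort(reverse=True)
    return [term for _, term in scored[:top_k]]


# Made-up 3-dimensional embeddings for illustration only.
embeddings = {
    "paris": [0.9, 0.1, 0.0],
    "london": [0.8, 0.2, 0.1],
    "berlin": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.1, 0.9],
    "tokyo": [0.7, 0.3, 0.1],
}
candidates = expand_seeds({"paris", "london"}, embeddings, top_k=2)
```

With the city seeds above, the two city-like terms are returned as candidates and the unrelated term is left out.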

Citations: 5

A Word is Worth a Thousand Ratings: Augmenting Ratings using Reviews for Collaborative Filtering

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234953

Oren Sar Shalom, Guy Uziel, Alexandros Karatzoglou, Amir Kantor

Abstract: In order to provide personalized recommendations, collaborative filtering algorithms take into account several kinds of feedback from the user. A common kind of feedback, which was largely neglected by the academic community until recently, is textual reviews written by the users. Reviews may reveal a great deal about both the users and the items, and indeed, in recent years, several algorithms that make use of textual reviews have been proposed. However, it is not entirely clear how this signal should be combined with traditional methods that address other kinds of feedback (such as an explicit numeric rating). In this paper, we introduce a novel algorithm, named Collaborative Filtering using Compatibility Vectors (CFCV), which builds upon recent advances in natural language understanding and uses a neural network to provide a meaningful representation of the reviews. This makes it possible to enhance collaborative filtering (particularly factor methods) with this new kind of information, in a way that is both natural and effective. We validate our algorithm by conducting experiments on several benchmark datasets, showing that it outperforms the existing methods. Moreover, underlying our solution is a general architecture that may be further explored.

Citations: 4

Measuring the Effectiveness of Selective Search Index Partitions without Supervision

Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval Pub Date : 2018-09-10 DOI: 10.1145/3234944.3234952

Yubin Kim, Jamie Callan

Citations: 0