Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval: latest papers, page 7

Query by document via a decomposition-based two-level retrieval approach

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009985

Linkai Weng, Zhiwei Li, Rui Cai, Yaoxue Zhang, Yuezhi Zhou, L. Yang, Lei Zhang

Abstract: Retrieving similar documents from a large-scale text corpus according to a given document is a fundamental technique for many applications. However, most existing indexing techniques have difficulty addressing this problem because of the special properties of a document query, e.g. high dimensionality, sparse representation, and semantic concerns. To address this problem, we propose a two-level retrieval solution based on a document decomposition idea. A document is decomposed into a compact vector and a few document-specific keywords by a dimension reduction approach. The compact vector embodies the major semantics of a document, and the document-specific keywords compensate for the discriminative power lost in the dimension reduction process. We adopt locality sensitive hashing (LSH) to index the compact vectors, which guarantees that a set of related documents can be found quickly according to the vector of a query document. Then we re-rank the documents in this set by their document-specific keywords. In experiments, we obtained promising results on various datasets in terms of both accuracy and performance. We demonstrated that this solution is able to index a large-scale corpus for efficient similarity-based document retrieval.

Citations: 30
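
The two-level idea in the abstract above (hash compact vectors to find candidates, then re-rank candidates by document-specific keywords) can be illustrated with a minimal sketch. This is not the authors' implementation: the random-hyperplane LSH family, the single hash table, the Jaccard keyword score, and every name here (`TwoLevelIndex`, `lsh_signature`, and so on) are assumptions chosen for illustration, and the compact vectors are supplied by hand rather than produced by a dimension reduction step.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw random Gaussian hyperplanes (assumed LSH family: sign of projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


class TwoLevelIndex:
    """Hypothetical index: level 1 is an LSH bucket lookup, level 2 a keyword re-rank."""

    def __init__(self, dim, num_bits=8, seed=0):
        self.hyperplanes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(list)  # signature -> list of doc ids
        self.keywords = {}                # doc id -> set of keywords

    def add(self, doc_id, compact_vector, keywords):
        signature = lsh_signature(compact_vector, self.hyperplanes)
        self.buckets[signature].append(doc_id)
        self.keywords[doc_id] = set(keywords)

    def query(self, compact_vector, keywords, top_k=3):
        # Level 1: candidates are the documents sharing the query's bucket.
        signature = lsh_signature(compact_vector, self.hyperplanes)
        candidates = self.buckets.get(signature, [])
        query_keywords = set(keywords)

        # Level 2: re-rank candidates by Jaccard overlap of keyword sets.
        def keyword_score(doc_id):
            doc_keywords = self.keywords[doc_id]
            union = query_keywords | doc_keywords
            return len(query_keywords & doc_keywords) / len(union) if union else 0.0

        return sorted(candidates, key=keyword_score, reverse=True)[:top_k]
```

A practical system would typically use several hash tables to raise recall; the single table here keeps the sketch short.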

A unified framework for recommendations based on quaternary semantic analysis

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010052

Wei Chen, W. Hsu, M. Lee

Citations: 18

Parallel learning to rank for information retrieval

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010060

Shuaiqiang Wang, Byron J. Gao, Ke Wang, Hady W. Lauw

Citations: 9

Graph-cut based tag enrichment

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010074

Xueming Qian, Xiansheng Hua

Citations: 6

Sample selection for dictionary-based corpus compression

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010087

C. Hoobin, S. Puglisi, J. Zobel

Citations: 8

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010123

Suleyman Cetintas, Monica Rogati, Luo Si, Yi Fang

Citations: 26

Learning search tasks in queries and web pages via graph regularization

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009928

Ming Ji, Jun Yan, Siyu Gu, Jiawei Han, Xiaofei He, Wei Vivian Zhang, Zheng Chen

Abstract: As the Internet grows explosively, search engines play an increasingly important role in helping users access online information effectively. Recently, it has been recognized that a query is often triggered by a search task that the user wants to accomplish. Similarly, many web pages are specifically designed to help accomplish a certain task. Therefore, learning the hidden tasks behind queries and web pages can help search engines return the most useful web pages to users by task matching. For instance, the search task that triggers the query "thinkpad T410 broken" is to maintain a computer, and it is desirable for a search engine to return the Lenovo troubleshooting page at the top of the list. However, existing search engine technologies mainly focus on topic detection or relevance ranking, and are not able to predict the task that triggers a query or the task a web page can accomplish. In this paper, we propose to simultaneously classify queries and web pages into the popular search tasks by exploiting their content together with click-through logs. Specifically, we construct a task-oriented heterogeneous graph among queries and web pages. Each pair of objects in the graph is linked together as long as they potentially share similar search tasks. A novel graph-based regularization algorithm is designed for search task prediction by leveraging the graph. Extensive experiments on real search log data demonstrate the effectiveness of our method over state-of-the-art classifiers, and the search performance can be significantly improved by using the task prediction results as additional information.

Citations: 32
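
As a rough illustration of graph-based regularization for task prediction, the sketch below runs generic label propagation over a small graph of queries and pages. It is not the paper's algorithm, which the abstract does not specify: it substitutes the standard "local and global consistency" iteration F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2), and the function name, the edge list, the seed labels, and the parameter values are all hypothetical.

```python
def propagate_task_labels(num_nodes, edges, seed_labels, num_tasks,
                          alpha=0.5, iterations=50):
    """Spread task labels from a few seed nodes over an undirected weighted graph.

    edges: iterable of (i, j, weight); seed_labels: {node: task_id}.
    Returns the predicted task id for every node.
    """
    # Symmetric weight matrix W over queries and pages (nodes are just indices).
    weights = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j, w in edges:
        weights[i][j] = w
        weights[j][i] = w

    # Symmetrically normalized weights S = D^(-1/2) W D^(-1/2).
    degree = [sum(row) for row in weights]
    smooth = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(num_nodes):
            if weights[i][j] > 0.0:
                smooth[i][j] = weights[i][j] / (degree[i] * degree[j]) ** 0.5

    # Y: one-hot rows for labeled seeds, zero rows for unlabeled nodes.
    prior = [[0.0] * num_tasks for _ in range(num_nodes)]
    for node, task in seed_labels.items():
        prior[node][task] = 1.0

    # Iterate F <- alpha * S F + (1 - alpha) * Y: neighbors pull scores
    # together while seeds stay anchored to their known task.
    scores = [row[:] for row in prior]
    for _ in range(iterations):
        scores = [
            [alpha * sum(smooth[i][k] * scores[k][t] for k in range(num_nodes))
             + (1.0 - alpha) * prior[i][t]
             for t in range(num_tasks)]
            for i in range(num_nodes)
        ]

    # Each node takes the task with the highest propagated score.
    return [max(range(num_tasks), key=row.__getitem__) for row in scores]
```

In this toy setting, an unlabeled node strongly linked to a seed (for example through a click-through edge) inherits that seed's task, while a weak link to a differently labeled cluster has little effect.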

Predicting web searcher satisfaction with existing community-based answers

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009974

Qiaoling Liu, Eugene Agichtein, G. Dror, E. Gabrilovich, Y. Maarek, D. Pelleg, Idan Szpektor

Abstract: Community-based Question Answering (CQA) sites, such as Yahoo! Answers, Baidu Knows, Naver, and Quora, have been rapidly growing in popularity. The resulting archives of posted answers to questions, in Yahoo! Answers alone, already exceed 1 billion in size, and are aggressively indexed by web search engines. In fact, a large number of search engine users benefit from these archives by finding existing answers that address their own queries. This scenario poses new challenges and opportunities for both search engines and CQA sites. To this end, we formulate a new problem of predicting the satisfaction of web searchers with CQA answers. We analyze a large number of web searches that result in a visit to a popular CQA site, and identify unique characteristics of searcher satisfaction in this setting, namely, the effects of query clarity, query-to-question match, and answer quality. We then propose and evaluate several approaches to predicting searcher satisfaction that exploit these characteristics. To the best of our knowledge, this is the first attempt to predict and validate the usefulness of CQA archives for external searchers, rather than for the original askers. Our results suggest promising directions for improving and exploiting community question answering services in pursuit of satisfying even more Web search queries.

Citations: 90

Recommending ephemeral items at web scale

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010051

Ye Chen, J. Canny

Abstract: We describe an innovative and scalable recommendation system successfully deployed at eBay. Building recommenders for long-tail marketplaces requires projecting volatile items into a persistent space of latent products. We first present a generative clustering model for collections of unstructured, heterogeneous, and ephemeral item data, under the assumption that items are generated from latent products. An item is represented as a vector of independently and distinctly distributed variables, while a latent product is characterized as a vector of probability distributions. The probability distributions are chosen as natural stochastic models for different types of data. The learning objective is to maximize the total intra-cluster coherence, measured by the sum of log likelihoods of items under such a generative process. In the space of latent products, robust recommendations can then be derived from historical transactional data using naive Bayes for ranking. Item-based recommendations are achieved by inferring latent products from unseen items. In particular, we develop a probabilistic scoring function for recommended items, which takes into account item-product membership, product purchase probability, and the important auction-end-time factor. With this holistic probabilistic measure of a prospective item purchase, one can further maximize the expected revenue as well as the more subjective user satisfaction. We evaluated the latent product clustering and recommendation ranking models using real-world e-commerce data from eBay, in both offline simulation and online A/B testing. In the recent production launch, our system yielded a 3-5 fold improvement over the existing production system in click-through, purchase-through, and gross merchandising value, and it now drives 100% of related-recommendation traffic with billions of items at eBay. We believe that this work provides a practical yet principled framework for recommendation in domains with affluent user self-input data.

Citations: 31

Energy-price-driven query processing in multi-center web search engines

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010047

Enver Kayaaslan, B. B. Cambazoglu, Roi Blanco, F. Junqueira, C. Aykanat

Citations: 43