Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval最新文献_第4页

Efficient manifold ranking for image retrieval 用于图像检索的高效流形排序

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009988

Bin Xu, Jiajun Bu, Chun Chen, Deng Cai, Xiaofei He, W. Liu, Jiebo Luo

{"title":"Efficient manifold ranking for image retrieval","authors":"Bin Xu, Jiajun Bu, Chun Chen, Deng Cai, Xiaofei He, W. Liu, Jiebo Luo","doi":"10.1145/2009916.2009988","DOIUrl":"https://doi.org/10.1145/2009916.2009988","url":null,"abstract":"Manifold Ranking (MR), a graph-based ranking algorithm, has been widely applied in information retrieval and shown to have excellent performance and feasibility on a variety of data types. Particularly, it has been successfully applied to content-based image retrieval, because of its outstanding ability to discover underlying geometrical structure of the given image database. However, manifold ranking is computationally very expensive, both in graph construction and ranking computation stages, which significantly limits its applicability to very large data sets. In this paper, we extend the original manifold ranking algorithm and propose a new framework named Efficient Manifold Ranking (EMR). We aim to address the shortcomings of MR from two perspectives: scalable graph construction and efficient computation. Specifically, we build an anchor graph on the data set instead of the traditional k-nearest neighbor graph, and design a new form of adjacency matrix utilized to speed up the ranking computation. The experimental results on a real world image database demonstrate the effectiveness and efficiency of our proposed method. With a comparable performance to the original manifold ranking, our method significantly reduces the computational time, makes it a promising method to large scale real world retrieval problems.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130984027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 139

Cognitive coordinating behaviors in multitasking web search 多任务网络搜索中的认知协调行为

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010077

J. Du

引用次数: 1

User behavior in zero-recall ecommerce queries 零召回电子商务查询中的用户行为

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009930

Gyanit Singh, Nish Parikh, Neel Sundaresan

引用次数: 13

Posting list intersection on multicore architectures 在多核架构上发布列表交集

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010045

S. Tatikonda, B. B. Cambazoglu, F. Junqueira

{"title":"Posting list intersection on multicore architectures","authors":"S. Tatikonda, B. B. Cambazoglu, F. Junqueira","doi":"10.1145/2009916.2010045","DOIUrl":"https://doi.org/10.1145/2009916.2010045","url":null,"abstract":"In current commercial Web search engines, queries are processed in the conjunctive mode, which requires the search engine to compute the intersection of a number of posting lists to determine the documents matching all query terms. In practice, the intersection operation takes a significant fraction of the query processing time, for some queries dominating the total query latency. Hence, efficient posting list intersection is critical for achieving short query latencies. In this work, we focus on improving the performance of posting list intersection by leveraging the compute capabilities of recent multicore systems. To this end, we consider various coarse-grained and fine-grained parallelization models for list intersection. Specifically, we present an algorithm that partitions the work associated with a given query into a number of small and independent tasks that are subsequently processed in parallel. Through a detailed empirical analysis of these alternative models, we demonstrate that exploiting parallelism at the finest-level of granularity is critical to achieve the best performance on multicore systems. On an eight-core system, the fine-grained parallelization method is able to achieve more than five times reduction in average query processing time while still exploiting the parallelism for high query throughput.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"259 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127655166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 46

Temporal latent semantic analysis for collaboratively generated content: preliminary results 协作生成内容的时间潜在语义分析:初步结果

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010091

Yu Wang, Eugene Agichtein

引用次数: 7

Link formation analysis in microblogs 微博中的链接形成分析

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010136

Dawei Yin, Liangjie Hong, Xiong Xiong, Brian D. Davison

引用次数: 44

Forecasting counts of user visits for online display advertising with probabilistic latent class models 基于概率潜在类模型的在线展示广告用户访问量预测

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010127

Suleyman Cetintas, Datong Chen, Luo Si, Bin Shen, Z. Datbayev

引用次数: 6

Find it if you can: a game for modeling different types of web search success using interaction data 如果你能找到它:一款利用互动数据模拟不同类型网页搜索成功的游戏

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009965

Mikhail S. Ageev, Qi Guo, Dmitry Lagun, Eugene Agichtein

{"title":"Find it if you can: a game for modeling different types of web search success using interaction data","authors":"Mikhail S. Ageev, Qi Guo, Dmitry Lagun, Eugene Agichtein","doi":"10.1145/2009916.2009965","DOIUrl":"https://doi.org/10.1145/2009916.2009965","url":null,"abstract":"A better understanding of strategies and behavior of successful searchers is crucial for improving the experience of all searchers. However, research of search behavior has been struggling with the tension between the relatively small-scale, but controlled lab studies, and the large-scale log-based studies where the searcher intent and many other important factors have to be inferred. We present our solution for performing controlled, yet realistic, scalable, and reproducible studies of searcher behavior. We focus on difficult informational tasks, which tend to frustrate many users of the current web search technology. First, we propose a principled formalization of different types of \"success\" for informational search, which encapsulate and sharpen previously proposed models. Second, we present a scalable game-like infrastructure for crowdsourcing search behavior studies, specifically targeted towards capturing and evaluating successful search strategies on informational tasks with known intent. Third, we report our analysis of search success using these data, which confirm and extends previous findings. Finally, we demonstrate that our model can predict search success more effectively than the existing state-of-the-art methods, on both our data and on a different set of log data collected from regular search engine sessions. Together, our search success models, the data collection infrastructure, and the associated behavior analysis techniques, significantly advance the study of success in web search.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115716871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 135

Evaluating diversified search results using per-intent graded relevance 使用每个意图分级相关性评估多样化的搜索结果

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010055

T. Sakai, Ruihua Song

{"title":"Evaluating diversified search results using per-intent graded relevance","authors":"T. Sakai, Ruihua Song","doi":"10.1145/2009916.2010055","DOIUrl":"https://doi.org/10.1145/2009916.2010055","url":null,"abstract":"Search queries are often ambiguous and/or underspecified. To accomodate different user needs, search result diversification has received attention in the past few years. Accordingly, several new metrics for evaluating diversification have been proposed, but their properties are little understood. We compare the properties of existing metrics given the premises that (1) queries may have multiple intents; (2) the likelihood of each intent given a query is available; and (3) graded relevance assessments are available for each intent. We compare a wide range of traditional and diversified IR metrics after adding graded relevance assessments to the TREC 2009 Web track diversity task test collection which originally had binary relevance assessments. Our primary criterion is discriminative power, which represents the reliability of a metric in an experiment. Our results show that diversified IR experiments with a given number of topics can be as reliable as traditional IR experiments with the same number of topics, provided that the right metrics are used. Moreover, we compare the intuitiveness of diversified IR metrics by closely examining the actual ranked lists from TREC. We show that a family of metrics called D#-measures have several advantages over other metrics such as α-nDCG and Intent-Aware metrics.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114359985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 163

A novel corpus-based stemming algorithm using co-occurrence statistics 基于共现统计的语料库词干提取算法

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010031

Jiaul H. Paik, Dipasree Pal, S. Parui

{"title":"A novel corpus-based stemming algorithm using co-occurrence statistics","authors":"Jiaul H. Paik, Dipasree Pal, S. Parui","doi":"10.1145/2009916.2010031","DOIUrl":"https://doi.org/10.1145/2009916.2010031","url":null,"abstract":"We present a stemming algorithm for text retrieval. The algorithm uses the statistics collected on the basis of certain corpus analysis based on the co-occurrence between two word variants. We use a very simple co-occurrence measure that reflects how often a pair of word variants occurs in a document as well as in the whole corpus. A graph is formed where the word variants are the nodes and two word variants form an edge if they co-occur. On the basis of the co-occurrence measure, a certain edge strength is defined for each of the edges. Finally, on the basis of the edge strengths, we propose a partition algorithm that groups the word variants based on their strongest neighbors, that is, the neighbors with largest strengths. Our stemming algorithm has two static parameters and does not use any other information except the co-occurrence statistics from the corpus. The experiments on TREC, CLEF and FIRE data consisting of four European and two Asian languages show a significant improvement over no-stem strategy on all the languages. Also, the proposed algorithm significantly outperforms a number of strong stemmers including the rule-based ones on a number of languages. For highly inflectional languages, a relative improvement of about 50% is obtained compared to un-normalized words and a relative improvement ranging from 5% to 16% is obtained compared to the rule based stemmer for the concerned language.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114553503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 43