Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval: latest papers

On theme location discovery for travelogue services

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009980

Mao Ye, Rong Xiao, Wang-Chien Lee, Xing Xie

Abstract: In this paper, we aim to develop a travelogue service that discovers and conveys various travelogue digests, in the form of theme locations, geographical scope, traveling trajectory and location snippets, to users. In this service, theme locations in a travelogue are the core information to discover. Thus we address the problem of theme location discovery to enable the above travelogue services. Due to the inherent ambiguity of location relevance, we perform location relevance mining (LRM) from two complementary angles, relevance classification and relevance ranking, to provide a comprehensive understanding of locations. Furthermore, we explore the textual (e.g., surrounding words) and geographical (e.g., geographical relationships among locations) features of locations to develop a co-training model that enhances classification performance. Built upon the results of LRM, we develop a series of techniques for provisioning the aforementioned travelogue digests in our travelogue system. Finally, we conduct comprehensive experiments on collected travelogues to evaluate the performance of our location relevance mining techniques and demonstrate the effectiveness of the travelogue service.

Citations: 12
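The co-training idea in the abstract, where a textual view and a geographical view of each location mention take turns labelling unlabelled mentions for each other, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: each view is assumed to be already encoded as a numeric vector, the base learner is a hypothetical nearest-centroid classifier, and "confidence" is taken to be the distance margin between the two closest class centroids.

```python
import math


def centroid_classifier(examples):
    """Train a nearest-centroid classifier (hypothetical base learner).

    examples: list of (vector, label). Returns predict(vector) -> (label, confidence),
    where confidence is the gap between the two closest centroid distances.
    """
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(vec):
        ranked = sorted((math.dist(vec, c), lab) for lab, c in centroids.items())
        if len(ranked) == 1:
            return ranked[0][1], 0.0  # only one class seen: no margin available
        return ranked[0][1], ranked[1][0] - ranked[0][0]

    return predict


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Co-training over two views; each example x is (text_vec, geo_vec).

    In every round, each view's classifier labels the unlabelled examples it is
    most confident about and adds them to the shared labelled set.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            clf = centroid_classifier([(x[view], y) for x, y in labeled])
            # Most confident examples (largest margin) first.
            pool.sort(key=lambda x: clf(x[view])[1], reverse=True)
            for x in pool[:per_round]:
                labeled.append((x, clf(x[view])[0]))
            del pool[:per_round]
    text_model = centroid_classifier([(x[0], y) for x, y in labeled])
    geo_model = centroid_classifier([(x[1], y) for x, y in labeled])
    return text_model, geo_model


def predict(models, x):
    """Return the label proposed by whichever view is more confident."""
    (text_label, text_conf), (geo_label, geo_conf) = models[0](x[0]), models[1](x[1])
    return text_label if text_conf >= geo_conf else geo_label
```

The point of the two views is that a mention the textual view finds ambiguous may be easy to place geographically (or vice versa), so each view supplies extra training labels to the other; how the paper actually combines the views is not stated in the abstract.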

Mining tags using social endorsement networks

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009946

Theodoros Lappas, Kunal Punera, Tamás Sarlós

Abstract: Entities on social systems, such as users on Twitter and images on Flickr, are at the core of many interesting applications: they can be ranked in search results, recommended to users, or used in contextual advertising. Such applications assume knowledge of an entity's nature and characteristic attributes. An effective way to encode such knowledge is in the form of tags. An untagged entity is practically inaccessible, since it is hard to retrieve or interact with. To address this, some platforms allow users to manually tag entities. However, while such tags can be informative, they can oftentimes be inadequate, trivial, ambiguous, or even plain false. Numerous automated tagging methods have been proposed to address these issues. However, most of them require pre-existing high-quality tags or descriptive texts for every entity that needs to be tagged. In our work, we propose a method based on social endorsements that is free from such constraints. Virtually every major social networking platform allows users to endorse entities that they find appealing. Examples include "following" Twitter users or "favoriting" Flickr photos. These endorsements are abundant and directly capture the preferences of users. In this paper, we pose and solve the problem of using the underlying social endorsement network to extract useful tags for entities in a social system. Our work leverages techniques from topic modeling to capture the interests of users and then uses them to extract relevant and descriptive tags for the entities they endorse. We perform an extensive evaluation of our proposed approach on real large-scale datasets from both Twitter and Flickr, and show that it significantly outperforms meaningful and competitive baselines.

Citations: 22
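A much-simplified sketch of the endorsement idea: if a topic model has already produced an interest profile over tags for each user (assumed given here; the abstract does not specify the model), an entity can be described by averaging the interests of the users who endorse it. The names `user_interests`, `endorsements`, and `entity_tags` are hypothetical, and plain averaging is our simplification, not the paper's extraction method.

```python
from collections import defaultdict


def entity_tags(user_interests, endorsements, top_k=3):
    """Describe each endorsed entity by the averaged interests of its endorsers.

    user_interests: {user: {tag: weight}}, e.g. per-user tag weights produced
    beforehand by a topic model (assumed given).
    endorsements: iterable of (user, entity) pairs, e.g. follows or favourites.
    Returns {entity: [(tag, score), ...]} with at most top_k tags per entity.
    Entities whose endorsers have no known interests are omitted.
    """
    totals = defaultdict(lambda: defaultdict(float))
    endorser_counts = defaultdict(int)
    for user, entity in endorsements:
        endorser_counts[entity] += 1
        for tag, weight in user_interests.get(user, {}).items():
            totals[entity][tag] += weight
    result = {}
    for entity, scores in totals.items():
        # Highest summed weight first; ties broken alphabetically for stability.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[entity] = [(tag, s / endorser_counts[entity]) for tag, s in ranked[:top_k]]
    return result
```

Because the scores come from endorsers rather than from the entity itself, this works even for entities with no text or manual tags, which is the constraint the abstract says most earlier methods could not avoid.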

Effect of different docid orderings on dynamic pruning retrieval strategies

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010108

N. Tonellotto, C. Macdonald, I. Ounis

Citations: 13

Image annotation based on recommendation model

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010067

Zijia Lin, Guiguang Ding, Jianmin Wang

Citations: 9

Disambiguating biomedical acronyms using EMIM

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010125

Nut Limsopatham, Rodrygo L. T. Santos, C. Macdonald, I. Ounis

Citations: 11

Quantifying test collection quality based on the consistency of relevance judgements

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010057

Falk Scholer, A. Turpin, M. Sanderson

Abstract: Relevance assessments are a key component of test collection-based evaluation of information retrieval systems. This paper reports on a feature of such collections that is used as a form of ground truth data to allow analysis of human assessment error. A wide range of test collections are retrospectively examined to determine how accurately assessors judge the relevance of documents. Our results demonstrate a high level of inconsistency across the collections studied. The level of irregularity is shown to vary across topics, with some showing a very high level of assessment error. We investigate possible influences on the error, and demonstrate that inconsistency in judging increases with time. While the level of detail in a topic specification does not appear to influence the errors that assessors make, judgements are significantly affected by the decisions made on previously seen similar documents. Assessors also display an assessment inertia. Alternative approaches to generating relevance judgements appear to reduce errors. A further investigation of the way that retrieval systems are ranked using sets of relevance judgements produced early and late in the judgement process reveals a consistent influence measured across the majority of examined test collections. We conclude that there is clear value in examining, and even inserting, ground truth data in test collections, and propose ways to help minimise the sources of inconsistency when creating future test collections.

Citations: 81
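The core measurement, how often judgements disagree on documents that should receive the same judgement, can be sketched per topic in a few lines. We assume the ground-truth signal is a list of (topic, document, document) pairs that deserve identical judgements; the abstract does not name the feature, so this pairing, the qrels layout, and the function name are all hypothetical.

```python
from collections import defaultdict


def inconsistency_by_topic(qrels, equivalent_pairs):
    """Fraction of judged equivalent-document pairs whose judgements disagree.

    qrels: {(topic, doc): grade}, a hypothetical in-memory relevance file.
    equivalent_pairs: iterable of (topic, doc_a, doc_b) that should be judged alike.
    Returns {topic: disagreement rate} over pairs where both documents are judged.
    """
    counts = defaultdict(lambda: [0, 0])  # topic -> [judged pairs, disagreeing pairs]
    for topic, doc_a, doc_b in equivalent_pairs:
        grade_a, grade_b = qrels.get((topic, doc_a)), qrels.get((topic, doc_b))
        if grade_a is None or grade_b is None:
            continue  # skip pairs where either document is unjudged
        counts[topic][0] += 1
        counts[topic][1] += grade_a != grade_b
    return {topic: bad / judged for topic, (judged, bad) in counts.items()}
```

Reporting the rate per topic rather than as one collection-wide number matches the abstract's observation that the level of error varies across topics.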

Handling data sparsity in collaborative filtering using emotion and semantic based features

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010001

Yashar Moshfeghi, Benjamin Piwowarski, J. Jose

Citations: 109

A tool for comparative IR evaluation on component level

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010165

Thomas Wilhelm, Jens Kürsten, Maximilian Eibl

Abstract (1. Motivation): Experimental information retrieval (IR) evaluation is an important instrument for measuring the effectiveness of novel methods. Although IR system complexity has grown over the years, the general framework for evaluation has remained unchanged since its first implementation in the 1960s. Test collections have grown from thousands to millions of documents. Regular reuse resulted in larger topic sets for evaluation. New business models for information access required novel interpretations of effectiveness measures. Nevertheless, most experimental evaluations still rely on a paradigm that is over 50 years old. Participants of a SIGIR workshop in 2009 [1] discussed the implementation of new methodological standards for evaluation, but at the same time they worried about practicable ways to implement them. A review of recent publications containing experimental evaluations supports this concern [2]. The study also presented a web-based platform for longitudinal evaluation. In a similar way, data from the past decade of CLEF evaluations have been released through the DIRECT system. While the operators of the latter system reported about 50 new users since the release of the data [3], no further contributions were recorded on the web platform introduced in [2]. From our point of view, archiving evaluation data for longitudinal analysis is an important first step. A next step is to develop a methodology that supports researchers in choosing appropriate baselines for comparison. This can be achieved by reporting evaluation results at the component level [4] rather than at the system level. An exemplary study was presented in [2], where the Indri system was tested with several components switched on or off. Following this idea, an approach to assessing novel methods could be to compare them against related components only. This would require the community to formally record details of system configurations in connection with experimental results. We suppose that transparent descriptions of the system components used in experiments could help researchers choose appropriate baselines.

Citations: 6
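Recording configurations at the component level, as the abstract proposes, makes it mechanical to find baselines that differ from a new run in exactly one component. The sketch below assumes a hypothetical `Run` record (a run id, a component-name-to-setting map, and a score) and a hypothetical `comparable_baselines` helper; it illustrates the idea and is not the authors' tool.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One archived experiment: a hypothetical record of components and score."""
    run_id: str
    components: dict = field(default_factory=dict)  # e.g. {"stemmer": "porter"}
    score: float = 0.0  # e.g. MAP on a shared test collection


def comparable_baselines(candidate, archive):
    """Archived runs whose configuration differs from the candidate in exactly
    one component, returned as (run_id, differing component, score) tuples."""
    matches = []
    for run in archive:
        keys = candidate.components.keys() | run.components.keys()
        differing = [k for k in keys
                     if candidate.components.get(k) != run.components.get(k)]
        if len(differing) == 1:
            matches.append((run.run_id, differing[0], run.score))
    return sorted(matches)
```

A component missing from one record is treated as differing, so runs described with different levels of detail are not silently matched.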

Bagging gradient-boosted trees for high precision, low variance ranking models

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009932

Y. Ganjisaffar, R. Caruana, C. Lopes

Abstract: Recent studies have shown that boosting provides excellent predictive performance across a wide variety of tasks. In learning to rank, boosted models such as RankBoost and LambdaMART have been shown to be among the best-performing learning methods based on evaluations on public data sets. In this paper, we show how the combination of bagging as a variance reduction technique and boosting as a bias reduction technique can result in very high precision and low variance ranking models. We perform thousands of parameter tuning experiments for LambdaMART to achieve a high precision boosting model. Then we show that a bagged ensemble of such LambdaMART boosted models results in higher accuracy ranking models while also reducing variance by as much as 50%. We report our results on three public learning-to-rank data sets using four metrics. Bagged LambdaMART outperforms all previously reported results on ten of the twelve comparisons, and bagged LambdaMART outperforms non-bagged LambdaMART on all twelve comparisons. For example, wrapping bagging around LambdaMART increases NDCG@1 from 0.4137 to 0.4200 on the MQ2007 data set; the best prior result in the literature for this data set is 0.4134, by RankBoost.

Citations: 197
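The bagging-around-boosting recipe can be sketched as: fit one model per bootstrap resample of the training data, average the models' scores at ranking time, and evaluate the resulting ranking with NDCG@k. Several assumptions are made for illustration: a one-split regression stump stands in for a tuned LambdaMART model, rows are resampled individually (a ranking setup would more likely resample whole queries), and NDCG uses the common (2^rel - 1) / log2(rank + 1) gain.

```python
import math
import random
from statistics import mean


def fit_stump(rows):
    """Fit a one-split regression stump to (feature, label) rows.

    Hypothetical stand-in for a boosted LambdaMART model: any function that
    maps training rows to a scoring callable can be bagged the same way.
    """
    overall = mean(y for _, y in rows)
    best_sse, threshold, left_mean, right_mean = math.inf, None, overall, overall
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        if not right:
            continue  # the largest value cannot split the data
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best_sse:
            best_sse, threshold, left_mean, right_mean = sse, t, lm, rm
    if threshold is None:
        return lambda x: overall  # no split possible: predict the mean label
    return lambda x: left_mean if x <= threshold else right_mean


def bag(rows, fit, n_bags=10, seed=0):
    """Fit one model per bootstrap resample and average their scores."""
    rng = random.Random(seed)
    models = [fit([rng.choice(rows) for _ in rows]) for _ in range(n_bags)]
    return lambda x: mean(m(x) for m in models)


def ndcg_at_k(ranked_labels, k):
    """NDCG@k for graded labels listed in the order the model ranked them."""
    def dcg(labels):
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0
```

Averaging leaves each model's bias roughly unchanged but smooths out the variation caused by the particular training sample, which is why the abstract pairs bagging (variance reduction) with boosting (bias reduction).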

Formulating effective questions for community-based question answering

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010149

Saori Suzuki, Shin-ichi Nakayama, Hideo Joho

Citations: 10