Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval: latest articles, page 2

Tackling class imbalance and data scarcity in literature-based gene function annotation

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010080

Mathieu Blondel, Kazuhiro Seki, K. Uehara

Citations: 5

Collective topic modeling for heterogeneous networks

Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010073

Hongbo Deng, Bo Zhao, Jiawei Han

Citations: 17

QuickView: advanced search of tweets

Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010157

Xiaohua Liu, Long Jiang, Furu Wei, M. Zhou

Citations: 4

From one tree to a forest: a unified solution for structured web data extraction

Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010020

Qiang Hao, Rui Cai, Yanwei Pang, Lei Zhang

Abstract: Structured data, in the form of entities and associated attributes, has been a rich web resource for search engines and knowledge databases. To efficiently extract structured data from enormous websites in various verticals (e.g., books, restaurants), much research effort has been attracted, but most existing approaches either require considerable human effort or rely on strong features that lack flexibility. We consider an ambitious scenario: can we build a system that (1) is general enough to handle any vertical without re-implementation and (2) requires only one labeled example site from each vertical for training to automatically deal with other sites in the same vertical? In this paper, we propose a unified solution to demonstrate the feasibility of this scenario. Specifically, we design a set of weak but general features to characterize vertical knowledge (including attribute-specific semantics and inter-attribute layout relationships). Such features can be adopted in various verticals without redesign; meanwhile, they are weak enough to avoid overfitting of the learnt knowledge to seed sites. Given a new unseen site, the learnt knowledge is first applied to identify page-level candidate attribute values, which inevitably include false positives. To remove noise, site-level information of the new site is then exploited to boost the true values. The site-level information is derived in an unsupervised manner, without harm to the applicability of the solution. Promising experimental performance on 80 websites in 8 distinct verticals demonstrated the feasibility and flexibility of the proposed solution.

Citations: 92
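
The two-stage idea in the abstract above (noisy page-level candidates, then site-level evidence to promote the true values) can be illustrated with a minimal sketch. This is not the authors' method: the DOM-path aggregation rule, the function names `pick_site_level_path` and `extract`, and the sample scores are all assumptions made for illustration.

```python
from collections import defaultdict

def pick_site_level_path(pages):
    """Return the DOM path with the highest summed candidate score across pages.

    pages: list of pages, each a list of (dom_path, value, score) candidates.
    Assumption: pages of one site share a template, so the true attribute
    tends to sit at the same DOM path on every page.
    """
    totals = defaultdict(float)
    for page in pages:
        for dom_path, _value, score in page:
            totals[dom_path] += score
    if not totals:
        return None
    return max(totals, key=totals.get)

def extract(pages):
    """For each page, return the value found at the site-level best path (or None)."""
    best = pick_site_level_path(pages)
    return [next((v for p, v, _ in page if p == best), None) for page in pages]

# Hypothetical page-level candidates for a "title" attribute on a book site.
pages = [
    [("/html/body/h1", "Dune", 0.9), ("/html/body/div/span", "Free shipping", 0.6)],
    [("/html/body/h1", "Emma", 0.4), ("/html/body/div/span", "Free shipping", 0.7)],
    [("/html/body/h1", "Ulysses", 0.8)],
]
titles = extract(pages)
```

On the second page the page-level scores alone would favour the false positive "Free shipping"; summing evidence over the whole site selects the heading path instead.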

SEJoin: an optimized algorithm towards efficient approximate string searches

Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010143

Junfeng Zhou, Ziyang Chen, Jingrong Zhang

Citations: 0

Ad hoc IR: not much room for improvement

Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010066

A. Trotman, David Keeler

Citations: 8

Bag-of-visual-words vs global image descriptors on two-stage multimodal retrieval

Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010144

S. Chatzichristofis, Konstantinos Zagoris, A. Arampatzis

Citations: 13

Understanding and using contextual information in recommender systems

Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010184

Licai Wang

Abstract: With the rapid development of information technology, the availability of huge amounts of online information makes retrieval a hard task for the average user. Recommender systems (RS) have been employed across several domains to ease this so-called "information overload" problem since the mid-1990s. Recently, context-aware recommender systems (CARS), which aim to further improve accuracy and user satisfaction by fully incorporating contextual information (such as time, location, mood and company) into RS, have become one of the hottest topics [1]. Although some progress has been made, CARS still faces many challenges. This thesis investigates some key problems in CARS and then proposes some tested and untested approaches to mine the latent relationship among users, contextual information and items (such as movies, web pages and mobile services). In this thesis, the first task is how to elicit contextual user preferences implicitly. All of the existing CARS are based on the assumption that explicit contextual user ratings are available (e.g., "Sam×Avatar×Morning×Home3"). However, it is hard to obtain sufficient contextual user preferences in practice. This thesis proposes a MAUT (multi-attribute utility theory)-based approach to implicitly elicit contextual user preferences by analyzing contextual user behaviors. It considers every type of context as an attribute of items, elicits every unidimensional contextual user preference based on a new context-based IF-IDF formula, and finally elicits multidimensional contextual user preferences after identifying different weights of different contexts. We design a personalized mobile services-oriented prototype system as a test bed to elicit contextual user preferences as well as generate contextual recommendations. I perform experimental comparison of this approach against the other baseline approaches, attaining significant improvements. Secondly, how to alleviate the sparsity problem in CARS is a key challenge. Data sparsity exists in any traditional RS. When contextual information is incorporated, the sparsity problem in CARS becomes even more serious. I propose a HOSVD-based contextual recommendation approach, called TensorCARS [2]. It first constructs an N-order tensor to represent multidimensional contextual user preferences and decomposes it into (N-2) 3-order tensors according to different contexts, then uses the HOSVD technique to predict unknown unidimensional contextual user preferences, then calculates a contextual influence coefficient for each context factor's effect on user preferences, and finally constructs a new N-order tensor using a weighted linearization method. I perform experimental comparison using the prototype system, showing TensorCARS can help alleviate the sparsity problem and increase the prediction accuracy. Thirdly, I consider mood as an important context and design two mood-based hybrid collaborative filtering approaches. ACM CAM

Citations: 4
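
The last step of the MAUT-based approach in the abstract above, combining unidimensional contextual preferences using per-context weights, can be sketched as a weighted additive utility. The additive form, the function name `multidimensional_preference`, and all numbers below are assumptions for illustration; the abstract does not state the exact combination rule.

```python
def multidimensional_preference(unidimensional, weights):
    """Combine per-context preference scores into one score.

    unidimensional: context name -> preference score for one (user, item) pair.
    weights: context name -> non-negative importance weight.
    Assumption: a normalized weighted sum, the simplest additive MAUT form.
    """
    total_weight = sum(weights[c] for c in unidimensional)
    if total_weight == 0:
        raise ValueError("context weights must not all be zero")
    weighted = sum(weights[c] * unidimensional[c] for c in unidimensional)
    return weighted / total_weight

# Hypothetical scores for one user and one item under three context types.
prefs = {"time": 0.8, "location": 0.5, "mood": 0.2}
weights = {"time": 2.0, "location": 1.0, "mood": 1.0}
score = multidimensional_preference(prefs, weights)
```

Normalizing by the total weight keeps the combined score on the same scale as the inputs, so items scored under different sets of contexts stay comparable.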

An event-centric model for multilingual document similarity

Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010043

Jannik Strotgen, Michael Gertz, Conny Junghans

Abstract: Document similarity measures play an important role in many document retrieval and exploration tasks. Over the past decades, several models and techniques have been developed to determine a ranked list of documents similar to a given query document. Interestingly, the proposed approaches typically rely on extensions to the vector space model and are rarely suited for multilingual corpora. In this paper, we present a novel document similarity measure that is based on events extracted from documents. An event is solely described by nearby occurrences of temporal and geographic expressions in a document's text. Thus, a document is modeled as a set of events that can be compared and ranked using temporal and geographic hierarchies. A key feature of our model is that it is term- and language-independent, as temporal and geographic expressions mentioned in texts are normalized to a standard format. This also makes it possible to determine similar documents across languages, an important feature in the context of document exploration. Our approach proves to be quite effective, including the discovery of new similarities, as our experiments using different (multilingual) corpora demonstrate.

Citations: 28
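
The model in the abstract above treats a document as a set of (time, place) events compared through temporal and geographic hierarchies. A minimal sketch of that idea follows. The prefix-overlap scoring, the best-match averaging, and the sample events are assumptions for illustration, not the measure defined in the paper.

```python
def prefix_overlap(a, b):
    """Fraction of hierarchy levels two paths share, counted from the top."""
    if not a or not b:
        return 0.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))

def event_similarity(e1, e2):
    """Events are (time_path, place_path) pairs, e.g. (year, month, day) and
    (continent, country, city). Both dimensions must agree to score highly."""
    (t1, g1), (t2, g2) = e1, e2
    return prefix_overlap(t1, t2) * prefix_overlap(g1, g2)

def document_similarity(doc_a, doc_b):
    """Symmetric average of each event's best match in the other document."""
    if not doc_a or not doc_b:
        return 0.0

    def directed(src, dst):
        return sum(max(event_similarity(e, f) for f in dst) for e in src) / len(src)

    return (directed(doc_a, doc_b) + directed(doc_b, doc_a)) / 2

# Hypothetical normalized events from an English and a German document.
doc_en = [((2011, 7, 24), ("Asia", "China", "Beijing"))]
doc_de = [
    ((2011, 7, 24), ("Asia", "China", "Beijing")),
    ((2011, 7, 1), ("Europe", "Germany", "Heidelberg")),
]
similarity = document_similarity(doc_en, doc_de)
```

Because the events are already normalized tuples, no terms from either language enter the comparison, which is the property the abstract highlights.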

A novel hybrid index structure for efficient text retrieval

Pub Date: 2011-07-24 DOI: 10.1145/2009916.2010106

Andreas Broschart, Ralf Schenkel

Citations: 1