Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval: latest papers

Tackling class imbalance and data scarcity in literature-based gene function annotation
Mathieu Blondel, Kazuhiro Seki, K. Uehara
{"title":"Tackling class imbalance and data scarcity in literature-based gene function annotation","authors":"Mathieu Blondel, Kazuhiro Seki, K. Uehara","doi":"10.1145/2009916.2010080","DOIUrl":"https://doi.org/10.1145/2009916.2010080","url":null,"abstract":"In recent years, a number of machine learning approaches to literature-based gene function annotation have been proposed. However, due to issues such as lack of labeled data, class imbalance and computational cost, they have usually been unable to surpass simpler approaches based on string-matching. In this paper, we propose a principled machine learning approach based on kernel classifiers. We show that kernels can address the task's inherent data scarcity by embedding additional knowledge and we propose a simple yet effective solution to deal with class imbalance. From experiments on the TREC Genomics Track data, our approach achieves better F1-score than two state-of-the-art approaches based on string-matching and cross-species information.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124057787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Collective topic modeling for heterogeneous networks
Hongbo Deng, Bo Zhao, Jiawei Han
{"title":"Collective topic modeling for heterogeneous networks","authors":"Hongbo Deng, Bo Zhao, Jiawei Han","doi":"10.1145/2009916.2010073","DOIUrl":"https://doi.org/10.1145/2009916.2010073","url":null,"abstract":"In this paper, we propose a joint probabilistic topic model for simultaneously modeling the contents of multi-typed objects of a heterogeneous information network. The intuition behind our model is that different objects of the heterogeneous network share a common set of latent topics so as to adjust the multinomial distributions over topics for different objects collectively. Experimental results demonstrate the effectiveness of our approach for the tasks of topic modeling and object clustering.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128059990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
QuickView: advanced search of tweets
Xiaohua Liu, Long Jiang, Furu Wei, M. Zhou
{"title":"QuickView: advanced search of tweets","authors":"Xiaohua Liu, Long Jiang, Furu Wei, M. Zhou","doi":"10.1145/2009916.2010157","DOIUrl":"https://doi.org/10.1145/2009916.2010157","url":null,"abstract":"Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform to tackle this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, tweet classification, to extract useful information, i.e., named entities, events, opinions, etc., from a large volume of tweets. Then, non-noisy tweets, together with the mined information, are indexed, on top of which two brand new scenarios are enabled, i.e., categorized browsing and advanced search, allowing users to effectively access either the tweets or fine-grained information they are interested in.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125443803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
From one tree to a forest: a unified solution for structured web data extraction
Qiang Hao, Rui Cai, Yanwei Pang, Lei Zhang
{"title":"From one tree to a forest: a unified solution for structured web data extraction","authors":"Qiang Hao, Rui Cai, Yanwei Pang, Lei Zhang","doi":"10.1145/2009916.2010020","DOIUrl":"https://doi.org/10.1145/2009916.2010020","url":null,"abstract":"Structured data, in the form of entities and associated attributes, has been a rich web resource for search engines and knowledge databases. To efficiently extract structured data from enormous websites in various verticals (e.g., books, restaurants), much research effort has been invested, but most existing approaches either require considerable human effort or rely on strong features that lack flexibility. We consider an ambitious scenario -- can we build a system that (1) is general enough to handle any vertical without re-implementation and (2) requires only one labeled example site from each vertical for training to automatically deal with other sites in the same vertical? In this paper, we propose a unified solution to demonstrate the feasibility of this scenario. Specifically, we design a set of weak but general features to characterize vertical knowledge (including attribute-specific semantics and inter-attribute layout relationships). Such features can be adopted in various verticals without redesign; meanwhile, they are weak enough to avoid overfitting of the learnt knowledge to seed sites. Given a new unseen site, the learnt knowledge is first applied to identify page-level candidate attribute values, which inevitably include false positives. To remove noise, site-level information of the new site is then exploited to boost the true values. The site-level information is derived in an unsupervised manner, without harm to the applicability of the solution. Promising experimental performance on 80 websites in 8 distinct verticals demonstrated the feasibility and flexibility of the proposed solution.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125650878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 92
SEJoin: an optimized algorithm towards efficient approximate string searches
Junfeng Zhou, Ziyang Chen, Jingrong Zhang
{"title":"SEJoin: an optimized algorithm towards efficient approximate string searches","authors":"Junfeng Zhou, Ziyang Chen, Jingrong Zhang","doi":"10.1145/2009916.2010143","DOIUrl":"https://doi.org/10.1145/2009916.2010143","url":null,"abstract":"We investigated the problem of finding from a collection of strings those similar to a given query string based on edit distance, for which the critical operation is merging inverted lists of grams generated from the collection of strings. We present an efficient algorithm to accelerate the merging operation.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132218011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Ad hoc IR: not much room for improvement
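The SEJoin abstract above refers to merging inverted lists of grams as the critical step in edit-distance search. The following is a minimal sketch of that general setting, not of the SEJoin algorithm itself, whose details the abstract does not give. It assumes a plain q-gram index with the standard count filter followed by edit-distance verification; the function names, the padding characters and the default q=2 are all illustrative choices.

```python
from collections import Counter, defaultdict


def qgrams(s, q=2):
    """Return the padded q-grams of s (a string of length n yields n + q - 1 grams)."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def build_index(strings, q=2):
    """Map each gram to {string id: number of occurrences in that string}."""
    index = defaultdict(Counter)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g][sid] += 1
    return index


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def search(query, strings, index, k=1, q=2):
    """Return the strings within edit distance k of query, in collection order."""
    query_grams = Counter(qgrams(query, q))
    # Count filter: one edit destroys at most q grams, so any match must
    # share at least this many grams (as multisets) with the query.
    threshold = sum(query_grams.values()) - k * q
    if threshold <= 0:
        candidates = range(len(strings))  # the filter prunes nothing
    else:
        shared = Counter()
        for g, cq in query_grams.items():  # merge the postings of each gram
            for sid, cs in index.get(g, {}).items():
                shared[sid] += min(cq, cs)
        candidates = sorted(sid for sid, c in shared.items() if c >= threshold)
    return [strings[sid] for sid in candidates
            if edit_distance(query, strings[sid]) <= k]
```

Here the merge is a simple per-gram accumulation of counts; an optimized method would aim to speed up exactly this loop.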
A. Trotman, David Keeler
{"title":"Ad hoc IR: not much room for improvement","authors":"A. Trotman, David Keeler","doi":"10.1145/2009916.2010066","DOIUrl":"https://doi.org/10.1145/2009916.2010066","url":null,"abstract":"Ranking function performance reached a plateau in 1994. The reason for this is investigated. First the performance of BM25 is measured as the proportion of queries satisfied on the first page of 10 results -- it performs well. The performance is then compared to human performance. They perform comparably. The conclusion is there isn't much room for ranking function improvement.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132475555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Bag-of-visual-words vs global image descriptors on two-stage multimodal retrieval
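The Trotman and Keeler abstract above measures BM25 by the proportion of queries satisfied on the first page of 10 results. The sketch below illustrates that kind of measurement under stated assumptions: it uses one common BM25 formulation with a non-negative IDF, the frequently used defaults k1=1.2 and b=0.75, and whitespace tokenization. The abstract does not say which variant or parameters the authors used, and the function names are hypothetical.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return document ids ordered by descending BM25 score (ties by id)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))
    scored = []
    for doc_id, d in enumerate(tokenized):
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if tf[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


def success_at_k(rankings, relevant, k=10):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(1 for ranking, rel in zip(rankings, relevant)
               if any(doc_id in rel for doc_id in ranking[:k]))
    return hits / len(rankings)
```

With one ranking per query and a set of relevant document ids per query, success_at_k(rankings, relevant, k=10) gives the first-page satisfaction rate.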
S. Chatzichristofis, Konstantinos Zagoris, A. Arampatzis
{"title":"Bag-of-visual-words vs global image descriptors on two-stage multimodal retrieval","authors":"S. Chatzichristofis, Konstantinos Zagoris, A. Arampatzis","doi":"10.1145/2009916.2010144","DOIUrl":"https://doi.org/10.1145/2009916.2010144","url":null,"abstract":"The Bag-Of-Visual-Words (BOVW) paradigm is fast becoming a popular image representation for Content-Based Image Retrieval (CBIR), mainly because of its better retrieval effectiveness over global feature representations on collections with images being near-duplicate to queries. In this experimental study we demonstrate that this advantage of BOVW is diminished when visual diversity is enhanced by using a secondary modality, such as text, to pre-filter images. The TOP-SURF descriptor is evaluated against Compact Composite Descriptors on a two-stage image retrieval setup, which first uses a text modality to rank the collection and then performs CBIR only on the top-K items.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134268325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Understanding and using contextual information in recommender systems
Licai Wang
{"title":"Understanding and using contextual information in recommender systems","authors":"Licai Wang","doi":"10.1145/2009916.2010184","DOIUrl":"https://doi.org/10.1145/2009916.2010184","url":null,"abstract":"With the rapid development of information technology, the availability of huge amounts of online information makes retrieval a hard task for the average user. Recommender systems (RS) have been employed across several domains to ease this so-called “information overload” problem since the mid-1990s. Recently, context-aware recommender systems (CARS), aiming to further improve the performance accuracy and user satisfaction by fully utilizing contextual information (such as time, location, mood and company) in RS, have become one of the hottest topics [1]. Although some progress has been made, CARS still face many challenges. This thesis investigates some key problems in CARS and then proposes some tested and untested approaches to mine the latent relationship among users, contextual information and items (such as movies, web pages and mobile services). In this thesis, the first task is how to elicit contextual user preferences implicitly. All of the existing CARS are based on the assumption that explicit contextual user ratings are available (e.g., “Sam×Avatar×Morning×Home3”). However, it is hard to obtain sufficient contextual user preferences in practice. This thesis proposes a MAUT (multi attribute utility theory)-based approach to implicitly elicit contextual user preferences through analyzing contextual user behaviors. It considers every type of context as an attribute of items, elicits every unidimensional contextual user preference based on a new context-based IF-IDF formula, and finally elicits multidimensional contextual user preferences after identifying different weights of different contexts. We design a personalized mobile services-oriented prototype system as a test bed to elicit contextual user preferences as well as generate contextual recommendations. I perform an experimental comparison of this approach against the other baseline approaches, attaining significant improvements. Secondly, how to alleviate the sparsity problem in CARS is a key challenge. Data sparsity exists in any traditional RS. When contextual information is incorporated, the sparsity problem in CARS becomes even more serious. I propose a HOSVD-based contextual recommendation approach, called TensorCARS [2]. It first constructs an N-order tensor to represent multidimensional contextual user preferences and decomposes it into (N-2) 3-order tensors according to different contexts, then uses the HOSVD technique to predict unknown unidimensional contextual user preferences, then calculates the coefficient with which each context factor influences user preferences, and finally constructs a new N-order tensor using a weighted linearization method. I perform an experimental comparison using the prototype system, showing that TensorCARS can help alleviate the sparsity problem and increase the prediction accuracy. Thirdly, I consider mood as an important context and design two mood-based hybrid collaborative filtering approaches.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133895828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
An event-centric model for multilingual document similarity
Jannik Strotgen, Michael Gertz, Conny Junghans
{"title":"An event-centric model for multilingual document similarity","authors":"Jannik Strotgen, Michael Gertz, Conny Junghans","doi":"10.1145/2009916.2010043","DOIUrl":"https://doi.org/10.1145/2009916.2010043","url":null,"abstract":"Document similarity measures play an important role in many document retrieval and exploration tasks. Over the past decades, several models and techniques have been developed to determine a ranked list of documents similar to a given query document. Interestingly, the proposed approaches typically rely on extensions to the vector space model and are rarely suited for multilingual corpora. In this paper, we present a novel document similarity measure that is based on events extracted from documents. An event is solely described by nearby occurrences of temporal and geographic expressions in a document's text. Thus, a document is modeled as a set of events that can be compared and ranked using temporal and geographic hierarchies. A key feature of our model is that it is term- and language-independent as temporal and geographic expressions mentioned in texts are normalized to a standard format. This also makes it possible to determine similar documents across languages, an important feature in the context of document exploration. Our approach proves to be quite effective, including the discovery of new similarities, as our experiments using different (multilingual) corpora demonstrate.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131622149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
A novel hybrid index structure for efficient text retrieval
Andreas Broschart, Ralf Schenkel
{"title":"A novel hybrid index structure for efficient text retrieval","authors":"Andreas Broschart, Ralf Schenkel","doi":"10.1145/2009916.2010106","DOIUrl":"https://doi.org/10.1145/2009916.2010106","url":null,"abstract":"Query processing with precomputed term pair lists can improve efficiency for some queries, but suffers from the quadratic number of index lists that need to be read. We present a novel hybrid index structure that aims at decreasing the number of index lists retrieved at query processing time, trading off a reduced number of index lists for an increased number of bytes to read. Our experiments demonstrate significant cold-cache performance gains of almost 25% on standard benchmark queries.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130763792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1