Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval: latest papers, page 10

Detecting outlier sections in US congressional legislation

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009951

Elif Aktolga, Irene Ros, Yannick Assogba

Abstract: Reading congressional legislation, also known as bills, is often tedious because bills tend to be long and written in complex language. In IBM Many Bills, an interactive web-based visualization of legislation, users of different backgrounds can browse bills and quickly explore parts that are of interest to them. One task users have is to be able to locate sections that don't seem to fit with the overall topic of the bill. In this paper, we present novel techniques to determine which sections within a bill are likely to be outliers by employing approaches from information retrieval. The most promising techniques first detect the most topically relevant parts of a bill by ranking its sections, followed by a comparison between these topically relevant parts and the remaining sections in the bill. To compare sections we use various dissimilarity metrics based on Kullback-Leibler Divergence. The results indicate that these techniques are more successful than a classification based approach. Finally, we analyze how the dissimilarity metrics succeed in discriminating between sections that are strong outliers versus those that are 'milder' outliers.
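
The comparison step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, additively smoothed unigram models, and a plain KL divergence from each remaining section to the pooled "topically relevant" sections. The function names and the smoothing constant `mu` are hypothetical, and the ranking step that selects the reference sections is assumed to have been done already.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, mu=1.0):
    """Additively smoothed unigram distribution over vocab (mu is an assumed constant)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def outlier_scores(sections, reference_ids):
    """Score every non-reference section by its divergence from the reference sections."""
    tokenized = [s.lower().split() for s in sections]
    vocab = sorted({w for toks in tokenized for w in toks})
    # Pool the tokens of the sections assumed to carry the bill's main topic.
    reference_tokens = [w for i in reference_ids for w in tokenized[i]]
    reference = unigram_model(reference_tokens, vocab)
    return {
        i: kl_divergence(unigram_model(toks, vocab), reference)
        for i, toks in enumerate(tokenized)
        if i not in reference_ids
    }

# Toy example: section 0 is taken as the topical reference.
sections = [
    "health care insurance coverage",
    "health insurance premiums coverage",
    "highway bridge construction funding",
]
scores = outlier_scores(sections, reference_ids={0})
```

In this toy input the highway section shares no vocabulary with the reference, so it receives a higher divergence than the second health section and would be flagged first as a candidate outlier.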

Citations: 15

Probabilistic factor models for web site recommendation

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2009955

Hao Ma, Chao Liu, Irwin King, Michael R. Lyu

Citations: 80

System effectiveness, user models, and user utility: a conceptual framework for investigation

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010037

Ben Carterette

Citations: 146

Learning for graphs with annotated edges

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010148

Fan Li

Citations: 0

Enhanced results for web search

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010014

Kevin Haas, P. Mika, P. Tarjan, Roi Blanco

Citations: 48

No Free Lunch: Brute Force vs. Locality-Sensitive Hashing for Cross-lingual Pairwise Similarity

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010042

Ferhan Ture, T. Elsayed, Jimmy J. Lin

Abstract: This work explores the problem of cross-lingual pairwise similarity, where the task is to extract similar pairs of documents across two different languages. Solutions to this problem are of general interest for text mining in the multilingual context and have specific applications in statistical machine translation. Our approach takes advantage of cross-language information retrieval (CLIR) techniques to project feature vectors from one language into another, and then uses locality-sensitive hashing (LSH) to extract similar pairs. We show that effective cross-lingual pairwise similarity requires working with similarity thresholds that are much lower than in typical monolingual applications, making the problem quite challenging. We present a parallel, scalable MapReduce implementation of the sort-based sliding window algorithm, which is compared to a brute-force approach on German and English Wikipedia collections. Our central finding can be summarized as "no free lunch": there is no single optimal solution. Instead, we characterize effectiveness-efficiency tradeoffs in the solution space, which can guide the developer to locate a desirable operating point based on application- and resource-specific constraints.
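
To make the LSH step in this abstract concrete, the following is a single-machine sketch of a sort-based sliding window over bit signatures. It is not the paper's MapReduce implementation: it assumes random-hyperplane signatures, assumes the document vectors have already been projected into one language, and uses hypothetical names and parameter values (`n_bits`, `n_perms`, `window`, `max_dist`) chosen only for illustration.

```python
import random

def signature(vec, planes):
    """One bit per random hyperplane: 1 if the dot product is non-negative."""
    return tuple(
        1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )

def hamming(a, b):
    """Number of positions where two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))

def sliding_window_pairs(vectors, n_bits=64, n_perms=10, window=2, max_dist=16, seed=0):
    """Return candidate similar pairs found within a window of sorted, permuted signatures."""
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    sigs = {doc_id: signature(vec, planes) for doc_id, vec in vectors.items()}
    pairs = set()
    for _ in range(n_perms):
        # Permute the bit positions, then sort so similar signatures tend to be neighbours.
        perm = list(range(n_bits))
        rng.shuffle(perm)
        ordered = sorted(
            (tuple(sig[i] for i in perm), doc_id) for doc_id, sig in sigs.items()
        )
        for i, (_, a) in enumerate(ordered):
            # Compare each signature only with the next `window` entries in sorted order.
            for _, b in ordered[i + 1 : i + 1 + window]:
                if hamming(sigs[a], sigs[b]) <= max_dist:
                    pairs.add(tuple(sorted((a, b))))
    return pairs

# Toy example: "a" and "b" are identical, "c" points in the opposite direction.
vectors = {"a": [1.0, 0.0, 0.0], "b": [1.0, 0.0, 0.0], "c": [-1.0, 0.0, 0.0]}
pairs = sliding_window_pairs(vectors)
```

A brute-force baseline would instead compare every pair of documents directly. The window size and number of permutations are the knobs that trade recall for running time, which is the kind of effectiveness-efficiency tradeoff the abstract refers to.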

Citations: 46

Evaluating multi-query sessions

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010056

E. Kanoulas, Ben Carterette, Paul D. Clough, M. Sanderson

Citations: 88

Learning to rank under tight budget constraints

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010105

Christian Pölitz, Ralf Schenkel

Citations: 3

Modeling subset distributions for verbose queries

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010085

Xiaobing Xue, W. Bruce Croft

Citations: 11

Rating prediction using feature words extracted from customer reviews

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI: 10.1145/2009916.2010121

Masanao Ochi, Makoto Okabe, R. Onai

Citations: 7