{"title":"Improving QA retrieval using document priors","authors":"J. Mayfield, Paul McNamee","doi":"10.1145/1148170.1148313","DOIUrl":"https://doi.org/10.1145/1148170.1148313","url":null,"abstract":"We present a simple way to improve document retrieval for question answering systems. The method biases the retrieval system toward documents that contain words that have appeared in other documents containing answers to the same type of question. The method works with virtually any retrieval system, and exhibits a statistically significant performance improvement over a strong baseline.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125098247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using historical data to enhance rank aggregation","authors":"Miriam Fernández, D. Vallet, P. Castells","doi":"10.1145/1148170.1148296","DOIUrl":"https://doi.org/10.1145/1148170.1148296","url":null,"abstract":"Rank aggregation is a pervading operation in IR technology. We hypothesize that the performance of score-based aggregation may be affected by artificial, usually meaningless deviations consistently occurring in the input score distributions, which distort the combined result when the individual biases differ from each other. We propose a score-based rank aggregation model where the source scores are normalized to a common distribution before being combined. Early experiments on available data from several TREC collections are shown to support our proposal.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125112083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing topic tracking with temporal information","authors":"Baoli Li, Wenjie Li, Q. Lu","doi":"10.1145/1148170.1148308","DOIUrl":"https://doi.org/10.1145/1148170.1148308","url":null,"abstract":"In this paper, we propose a new strategy with time granularity reasoning for utilizing temporal information in topic tracking. Compared with previous ones, our work has four distinguished characteristics. Firstly, we try to determine a set of topic times for a target topic from the given on-topic stories. It helps to avoid the negative influence from other irrelevant times. Secondly, we take into account time granularity variance when deciding whether a coreference relationship exists between two times. Thirdly, both publication time and times presented in texts are considered. Finally, as time is only one attribute of a topic, we increase the similarity between a story and a target topic only when they are related not only temporally but also semantically. Experiments on two TDT corpora show that our method makes good use of temporal information in news stories.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116186474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic term matching in axiomatic approaches to information retrieval","authors":"Hui Fang, ChengXiang Zhai","doi":"10.1145/1148170.1148193","DOIUrl":"https://doi.org/10.1145/1148170.1148193","url":null,"abstract":"A common limitation of many retrieval models, including the recently proposed axiomatic approaches, is that retrieval scores are solely based on exact (i.e., syntactic) matching of terms in the queries and documents, without allowing distinct but semantically related terms to match each other and contribute to the retrieval score. In this paper, we show that semantic term matching can be naturally incorporated into the axiomatic retrieval model through defining the primitive weighting function based on a semantic similarity function of terms. We define several desirable retrieval constraints for semantic term matching and use such constraints to extend the axiomatic model to directly support semantic term matching based on the mutual information of terms computed on some document set. We show that such extension can be efficiently implemented as query expansion. Experiment results on several representative data sets show that, with mutual information computed over the documents in either the target collection for retrieval or an external collection such as the Web, our semantic expansion consistently and substantially improves retrieval accuracy over the baseline axiomatic retrieval model. As a pseudo feedback method, our method also outperforms a state-of-the-art language modeling feedback method.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128511114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Argamon, G. Agam, O. Frieder, D. Grossman, D. Lewis, G. Sohn, E. Voorhees
{"title":"A complex document information processing prototype","authors":"S. Argamon, G. Agam, O. Frieder, D. Grossman, D. Lewis, G. Sohn, E. Voorhees","doi":"10.1145/1148170.1148274","DOIUrl":"https://doi.org/10.1145/1148170.1148274","url":null,"abstract":"We developed a prototype for integrated retrieval and aggregation of diverse information contained in scanned paper documents. Such complex document information processing combines several forms of image processing together with textual/linguistic processing to enable effective analysis of complex document collections, a necessity for a wide range of applications. This is the first system to attempt integrated retrieval from complex documents; we report its current capabilities.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123332015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Irina Matveeva, C. Burges, Timo Burkard, Andy Laucius, Leon Wong
{"title":"High accuracy retrieval with multiple nested ranker","authors":"Irina Matveeva, C. Burges, Timo Burkard, Andy Laucius, Leon Wong","doi":"10.1145/1148170.1148246","DOIUrl":"https://doi.org/10.1145/1148170.1148246","url":null,"abstract":"High precision at the top ranks has become a new focus of research in information retrieval. This paper presents the multiple nested ranker approach that improves the accuracy at the top ranks by iteratively re-ranking the top scoring documents. At each iteration, this approach uses the RankNet learning algorithm to re-rank a subset of the results. This splits the problem into smaller and easier tasks and generates a new distribution of the results to be learned by the algorithm. We evaluate this approach using different settings on a data set labeled with several degrees of relevance. We use the normalized discounted cumulative gain (NDCG) to measure the performance because it depends not only on the position but also on the relevance score of the document in the ranked list. Our experiments show that making the learning algorithm concentrate on the top scoring results improves precision at the top ten documents in terms of the NDCG score.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"213 Suppl 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114372079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, J. Heard
{"title":"Building a test collection for complex document information processing","authors":"D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, J. Heard","doi":"10.1145/1148170.1148307","DOIUrl":"https://doi.org/10.1145/1148170.1148307","url":null,"abstract":"Research and development of information access technology for scanned paper documents has been hampered by the lack of public test collections of realistic scope and complexity. As part of a project to create a prototype system for search and mining of masses of document images, we are assembling a 1.5 terabyte dataset to support evaluation of both end-to-end complex document information processing (CDIP) tasks (e.g., text retrieval and data mining) as well as component technologies such as optical character recognition (OCR), document structure analysis, signature matching, and authorship attribution.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125724499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effect of OCR errors on stylistic text classification","authors":"S. Stein, S. Argamon, O. Frieder","doi":"10.1145/1148170.1148325","DOIUrl":"https://doi.org/10.1145/1148170.1148325","url":null,"abstract":"Recently, interest is growing in non-topical text classification tasks such as genre classification, sentiment analysis, and authorship profiling. We study to what extent OCR errors affect stylistic text classification from scanned documents. We find that even a relatively high level of errors in the OCRed documents does not substantially affect stylistic classification accuracy.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132355377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jennifer Chu-Carroll, J. Prager, Krzysztof Czuba, D. Ferrucci, Pablo Duboue
{"title":"Semantic search via XML fragments: a high-precision approach to IR","authors":"Jennifer Chu-Carroll, J. Prager, Krzysztof Czuba, D. Ferrucci, Pablo Duboue","doi":"10.1145/1148170.1148247","DOIUrl":"https://doi.org/10.1145/1148170.1148247","url":null,"abstract":"In some IR applications, it is desirable to adopt a high precision search strategy to return a small set of documents that are highly focused and relevant to the user's information need. With these applications in mind, we investigate semantic search using the XML Fragments query language on text corpora automatically pre-processed to encode semantic information useful for retrieval. We identify three XML Fragment operations that can be applied to a query to conceptualize, restrict, or relate terms in the query. We demonstrate how these operations can be used to address four different query-time semantic needs: to specify target information type, to disambiguate keywords, to specify search term context, or to relate select terms in the query. We demonstrate the effectiveness of our semantic search technology through a series of experiments using the two applications in which we embed this technology and show that it yields significant improvement in precision in the search results.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130050622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning user interaction models for predicting web search result preferences","authors":"Eugene Agichtein, Eric Brill, S. Dumais, R. Ragno","doi":"10.1145/1148170.1148175","DOIUrl":"https://doi.org/10.1145/1148170.1148175","url":null,"abstract":"Evaluating user preferences of web search results is crucial for search engine development, deployment, and maintenance. We present a real-world study of modeling the behavior of web search users to predict web search result preferences. Accurate modeling and interpretation of user behavior has important applications to ranking, click spam detection, web search personalization, and other tasks. Our key insight to improving robustness of interpreting implicit feedback is to model query-dependent deviations from the expected \"noisy\" user behavior. We show that our model of clickthrough interpretation improves prediction accuracy over state-of-the-art clickthrough methods. We generalize our approach to model user behavior beyond clickthrough, which results in higher preference prediction accuracy than models based on clickthrough information alone. We report results of a large-scale experimental evaluation that show substantial improvements over published implicit feedback interpretation methods.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134007456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}