{"title":"Session details: Applications II","authors":"David D. Lewis","doi":"10.1145/3254389","DOIUrl":"https://doi.org/10.1145/3254389","url":null,"abstract":"","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123927447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query recovery of short user queries: on query expansion with stopwords","authors":"Johannes Leveling, G. Jones","doi":"10.1145/1835449.1835589","DOIUrl":"https://doi.org/10.1145/1835449.1835589","url":null,"abstract":"User queries to search engines are observed to predominantly contain inflected content words but lack stopwords and capitalization. Thus, they often resemble natural language queries after case folding and stopword removal. Query recovery aims to generate a linguistically well-formed query from a given user query as input to provide natural language processing tasks and cross-language information retrieval (CLIR). The evaluation of query translation shows that translation scores (NIST and BLEU) decrease after case folding, stopword removal, and stemming. A baseline method for query recovery reconstructs capitalization and stopwords, which considerably increases translation scores and significantly increases mean average precision for a standard CLIR task.","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"54 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116333452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proximity-based opinion retrieval","authors":"Shima Gerani, Mark James Carman, F. Crestani","doi":"10.1145/1835449.1835517","DOIUrl":"https://doi.org/10.1145/1835449.1835517","url":null,"abstract":"Blog post opinion retrieval aims at finding blog posts that are relevant and opinionated about a user's query. In this paper we propose a simple probabilistic model for assigning relevant opinion scores to documents. The key problem is how to capture opinion expressions in the document, that are related to the query topic. Current solutions enrich general opinion lexicons by finding query-specific opinion lexicons using pseudo-relevance feedback on external corpora or the collection itself. In this paper we use a general opinion lexicon and propose using proximity information in order to capture opinion term relatedness to the query. We propose a proximity-based opinion propagation method to calculate the opinion density at each point in a document. The opinion density at the position of a query term in the document can then be considered as the probability of opinion about the query term at that position. The effect of different kernels for capturing the proximity is also discussed. Experimental results on the BLOG06 dataset show that the proposed method provides significant improvement over standard TREC baselines and achieves a 2.5% increase in MAP over the best performing run in the TREC 2008 blog track.","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125760938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User centered story tracking","authors":"Ilija Subasic","doi":"10.1145/1835449.1835693","DOIUrl":"https://doi.org/10.1145/1835449.1835693","url":null,"abstract":"Using data collections available on the Internet has for many people became the main medium for staying informed about the world. Many of these collections are in nature dynamic, evolving as the subjects they describe change. The goal of different research areas is to identify and highlight these changes to better enable readers to track stories. In this work we restrict ourselves to news collections and investigate \"real-life\" effectiveness and usability of temporal text mining (TTM) story tracking methods. We propose a new story tracking method and build a tool to support it. Additionally, we investigate the effectiveness and usability of story tracking methods and define a new frameworks for automatic and user oriented evaluation. We built methods and tools which allow for understanding, discovery, and search through user interaction. Although there are many TTM methods developed there is a lack of common evaluation procedure. Therefore, we propose an evaluation framework for measuring how different TTM methods discover novel \"facts\". Apart from the automatic evaluation we are interested in how can users interact with pattens and learn about the underlying subjects of the story they track. For this purpose we propose a user testing environment that measures speed and accuracy in which users can use story tracking methods to discover predefined sets of ground-truth sentences.","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122239503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Clustering II","authors":"Omar Alonso","doi":"10.1145/3254371","DOIUrl":"https://doi.org/10.1145/3254371","url":null,"abstract":"","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134598810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dzung Hong, Luo Si, Paul J. Bracke, M. Witt, Timothy C Juchcinski
{"title":"A joint probabilistic classification model for resource selection","authors":"Dzung Hong, Luo Si, Paul J. Bracke, M. Witt, Timothy C Juchcinski","doi":"10.1145/1835449.1835468","DOIUrl":"https://doi.org/10.1145/1835449.1835468","url":null,"abstract":"Resource selection is an important task in Federated Search to select a small number of most relevant information sources. Current resource selection algorithms such as GlOSS, CORI, ReDDE, Geometric Average and the recent classification-based method focus on the evidence of individual information sources to determine the relevance of available sources. Current algorithms do not model the important relationship information among individual sources. For example, an information source tends to be relevant to a user query if it is similar to another source with high probability of being relevant. This paper proposes a joint probabilistic classification model for resource selection. The model estimates the probability of relevance of information sources in a joint manner by considering both the evidence of individual sources and their relationship. An extensive set of experiments have been conducted on several datasets to demonstrate the advantage of the proposed model.","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"29 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131894862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust audio identification for MP3 popular music","authors":"Wei Li, Yaduo Liu, X. Xue","doi":"10.1145/1835449.1835554","DOIUrl":"https://doi.org/10.1145/1835449.1835554","url":null,"abstract":"Audio identification via fingerprint has been an active research field with wide applications for years. Many technical papers were published and commercial software systems were also employed. However, most of these previously reported methods work on the raw audio format in spite of the fact that nowadays compressed format audio, especially MP3 music, has grown into the dominant way to store on personal computers and transmit on the Internet. It would be interesting if a compressed unknown audio fragment is able to be directly recognized from the database without the fussy and time-consuming decompression-identification-recompression procedure. So far, very few algorithms run directly in the compressed domain for music information retrieval, and most of them take advantage of MDCT coefficients or derived energy type of features. As a first attempt, we propose in this paper utilizing compressed-domain spectral entropy as the audio feature to implement a novel audio fingerprinting algorithm. The compressed songs stored in a music database and the possibly distorted compressed query excerpts are first partially decompressed to obtain the MDCT coefficients as the intermediate result. Then by grouping granules into longer blocks, remapping the MDCT coefficients into 192 new frequency lines to unify the frequency distribution of long and short windows, and defining 9 new subbands which cover the main frequency bandwidth of popular songs in accordance with the scale-factor bands of short windows, we calculate the spectral entropy of all consecutive blocks and come to the final fingerprint sequence by means of magnitude relationship modeling. Experiments show that such fingerprints exhibit strong robustness against various audio signal distortions like recompression, noise interference, echo addition, equalization, band-pass filtering, pitch shifting, and slight time-scale modification etc. For 5s-long query examples which might be severely degraded, an average top-five retrieval precision rate of more than 90% can be obtained in our test data set composed of 1822 popular songs.","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132776262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuanzhe Cai, Miao Zhang, C. Ding, Sharma Chakravarthy
{"title":"Closed form solution of similarity algorithms","authors":"Yuanzhe Cai, Miao Zhang, C. Ding, Sharma Chakravarthy","doi":"10.1145/1835449.1835577","DOIUrl":"https://doi.org/10.1145/1835449.1835577","url":null,"abstract":"Algorithms defining similarities between objects of an information network are important of many IR tasks. SimRank algorithm and its variations are popularly used in many applications. Many fast algorithms are also developed. In this note, we first reformulate them as random walks on the network and express them using forward and backward transition probably in a matrix form. Second, we show that P-Rank (SimRank is only the special case of P-Rank) has a unique solution of eeT when decay factor c is equal to 1. We also show that SimFusion algorithm is a special case of P-Rank algorithm and prove that the similarity matrix of SimFusion is the product of PageRank vector. Our experiments on the web datasets show that for P-Rank the decay factor c doesn't seriously affect the similarity accuracy and accuracy of P-Rank is also higher than SimFusion and SimRank.","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133258316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Niranjan Balasubramanian, G. Kumaran, Vitor R. Carvalho
{"title":"Exploring reductions for long web queries","authors":"Niranjan Balasubramanian, G. Kumaran, Vitor R. Carvalho","doi":"10.1145/1835449.1835545","DOIUrl":"https://doi.org/10.1145/1835449.1835545","url":null,"abstract":"Long queries form a difficult, but increasingly important segment for web search engines. Query reduction, a technique for dropping unnecessary query terms from long queries, improves performance of ad-hoc retrieval on TREC collections. Also, it has great potential for improving long web queries (upto 25% improvement in NDCG@5). However, query reduction on the web is hampered by the lack of accurate query performance predictors and the constraints imposed by search engine architectures and ranking algorithms. In this paper, we present query reduction techniques for long web queries that leverage effective and efficient query performance predictors. We propose three learning formulations that combine these predictors to perform automatic query reduction. These formulations enable trading of average improvements for the number of queries impacted, and enable easy integration into the search engine's architecture for rank-time query reduction. Experiments on a large collection of long queries issued to a commercial search engine show that the proposed techniques significantly outperform baselines, with more than 12% improvement in NDCG@5 in the impacted set of queries. Extension to the formulations such as result interleaving further improves results. We find that the proposed techniques deliver consistent retrieval gains where it matters most: poorly performing long web queries.","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"374 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115990497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Applications I","authors":"Luo Si","doi":"10.1145/3254367","DOIUrl":"https://doi.org/10.1145/3254367","url":null,"abstract":"","PeriodicalId":378368,"journal":{"name":"Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116852398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}