{"title":"ReCoM: reinforcement clustering of multi-type interrelated data objects","authors":"Jidong Wang, Hua-Jun Zeng, Zheng Chen, Hongjun Lu, Li Tao, Wei-Ying Ma","doi":"10.1145/860435.860486","DOIUrl":"https://doi.org/10.1145/860435.860486","url":null,"abstract":"Most existing clustering algorithms cluster highly related data objects such as Web pages and Web users separately. The interrelation among different types of data objects is either not considered, or represented by a static feature space and treated in the same way as other attributes of the objects. In this paper, we propose a novel clustering approach for clustering multi-type interrelated data objects, ReCoM (Reinforcement Clustering of Multi-type Interrelated data objects). Under this approach, relationships among data objects are used to improve the cluster quality of interrelated data objects through an iterative reinforcement clustering process. At the same time, the link structure derived from relationships of the interrelated data objects is used to differentiate the importance of objects and the learned importance is also used in the clustering process to further improve the clustering results. Experimental results show that the proposed approach not only effectively overcomes the problem of data sparseness caused by the high dimensional relationship space but also significantly improves the clustering accuracy.","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126063527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building and applying a concept hierarchy representation of a user profile","authors":"Nikolaos Nanas, V. Uren, A. Roeck","doi":"10.1145/860435.860473","DOIUrl":"https://doi.org/10.1145/860435.860473","url":null,"abstract":"Term dependence is a natural consequence of language use. Its successful representation has been a long-standing goal for Information Retrieval research. We present a methodology for the construction of a concept hierarchy that takes into account the three basic dimensions of term dependence. We also introduce a document evaluation function that allows the use of the concept hierarchy as a user profile for Information Filtering. Initial experimental results indicate that this is a promising approach for incorporating term dependence in the way documents are filtered.","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126400827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimental result analysis for a generative probabilistic image retrieval model","authors":"T. Westerveld, A. D. Vries","doi":"10.1145/860435.860461","DOIUrl":"https://doi.org/10.1145/860435.860461","url":null,"abstract":"The main conclusion from the metrics-based evaluation of video retrieval systems at TREC's video track is that non-interactive image retrieval from general collections using visual information only is not yet feasible. We show how a detailed analysis of retrieval results -- looking beyond mean average precision (MAP) scores on topical relevance -- gives significant insight into the main problems with the visual part of the retrieval model under study. Such an analytical approach proves an important addition to standard evaluation measures.","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125228120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Music modeling with random fields","authors":"V. Lavrenko, Jeremy Pickens","doi":"10.1145/860435.860515","DOIUrl":"https://doi.org/10.1145/860435.860515","url":null,"abstract":"Recent interest in the area of music information retrieval is exploding. However, very few of the existing music retrieval techniques take advantage of recent developments in statistical modeling. In this report we discuss an application of Random Fields to the problem of statistical modeling of polyphonic music. With such models in hand, the challenges of developing effective searching, browsing, and organization techniques for the growing bodies of music collections may be successfully met. Polyphonic music can be thought of as a two-dimensional stochastic process. Unlike text, the musical vocabulary is relatively small, containing at most several hundred discrete note symbols. What makes music so fascinating and expressive is the very rich structure inherent in musical pieces. Whereas text samples can be reasonably modeled using simple unigram or bi-gram language models, polyphonic music is characterized by numerous periodic symmetries, repetitions, and overlapping short- and long-term interactions that are beyond the capabilities of simple Markov chains. Random Fields are a generalization of Markov chains to multidimensional spatial processes. They are incredibly flexible, allowing us to model arbitrary interactions between elements of data. Recently random fields have found applications in large-vocabulary tasks, such as language modeling and information extraction. One of the most influential works in the area is the 1997 publication of Della Pietra et al. [2], which outlined the algorithms used in parts of this paper. Berger et al. [1] were the first to suggest the use of maximum entropy models for natural language processing. While our work was inspired by applications of random fields to language processing, it bears more similarity to the use of the framework by the researchers in computer vision. In most natural language applications authors start with a reasonable set of features (which are usually single words, or hand-crafted expressions), and the main challenge is to optimize the weights corresponding to these features. This works well in natural language, where words bear significant semantic content. In our case, induction of the random field is the crucial step. We will use the techniques suggested by [2] to automatically induce new high-level, salient features, such as chords and melodic progressions.","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"148 Pt 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126319246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the effectiveness of evaluating retrieval systems in the absence of relevance judgments","authors":"J. Aslam, R. Savell","doi":"10.1145/860435.860501","DOIUrl":"https://doi.org/10.1145/860435.860501","url":null,"abstract":"Soboroff, Nicholas and Cahan recently proposed a method for evaluating the performance of retrieval systems without relevance judgments. They demonstrated that the system evaluations produced by their methodology are correlated with actual evaluations using relevance judgments in the TREC competition. In this work, we propose an explanation for this phenomenon. We devise a simple measure for quantifying the similarity of retrieval systems by assessing the similarity of their retrieved results. Then, given a collection of retrieval systems and their retrieved results, we use this measure to assess the average similarity of a system to the other systems in the collection. We demonstrate that evaluating retrieval systems according to average similarity yields results quite similar to the methodology proposed by Soboroff et al., and we further demonstrate that these two techniques are in fact highly correlated. Thus, the techniques are effectively evaluating and ranking retrieval systems by \"popularity\" as opposed to \"performance.\"","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116628499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting query history for document ranking in interactive information retrieval","authors":"Xuehua Shen, ChengXiang Zhai","doi":"10.1145/860435.860509","DOIUrl":"https://doi.org/10.1145/860435.860509","url":null,"abstract":"In this poster, we incorporate user query history, as context information, to improve the retrieval performance in interactive retrieval. Experiments using the TREC data show that incorporating such context information indeed consistently improves the retrieval performance in both average precision and precision at 20 documents.","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129497273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the relationship between language model perplexity and IR precision-recall measures","authors":"L. Azzopardi, M. Girolami, Keith van Rijsbergen","doi":"10.1145/860435.860505","DOIUrl":"https://doi.org/10.1145/860435.860505","url":null,"abstract":"An empirical study has been conducted investigating the relationship between the performance of an aspect-based language model in terms of perplexity and the corresponding information retrieval performance obtained. It is observed, on the corpora considered, that the perplexity of the language model has a systematic relationship with the achievable precision-recall performance, though it is not statistically significant.","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133508659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora","authors":"F. Sadat, Masatoshi Yoshikawa, Shunsuke Uemura","doi":"10.1145/860435.860519","DOIUrl":"https://doi.org/10.1145/860435.860519","url":null,"abstract":"This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluations on Cross-Language Information Retrieval. We explore a bi-directional extraction of bilingual terminology primarily from comparable corpora. A combined statistics-based and linguistics-based model for selecting the best translation candidates for phrasal translation is proposed. Evaluations using a large test collection for Japanese-English revealed the proposed combination of bi-directional comparable corpora, bilingual dictionaries and transliteration, augmented with linguistics-based pruning, to be highly effective in Cross-Language Information Retrieval.","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129476996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error analysis of difficult TREC topics","authors":"Xiao Hu, S. Bandhakavi, ChengXiang Zhai","doi":"10.1145/860435.860524","DOIUrl":"https://doi.org/10.1145/860435.860524","url":null,"abstract":"Given the experimental nature of information retrieval, progress critically depends on analyzing the errors made by existing retrieval approaches and understanding their limitations. Our research explores various hypothesized reasons for hard topics in the TREC-8 ad hoc task, and shows that the bad performance is partially due to the existence of highly distracting sub-collections that can dominate the overall performance.","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129821078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using terminological feedback for web search refinement: a log-based study","authors":"Peter G. Anick","doi":"10.1145/860435.860453","DOIUrl":"https://doi.org/10.1145/860435.860453","url":null,"abstract":"Although interactive query reformulation has been actively studied in the laboratory, little is known about the actual behavior of web searchers who are offered terminological feedback along with their search results. We analyze log sessions for two groups of users interacting with variants of the AltaVista search engine - a baseline group given no terminological feedback and a feedback group to whom twelve refinement terms are offered along with the search results. We examine uptake, refinement effectiveness, conditions of use, and refinement type preferences. Although our measure of overall session \"success\" shows no difference between outcomes for the two groups, we find evidence that a subset of those users presented with terminological feedback do make effective use of it on a continuing basis.","PeriodicalId":209809,"journal":{"name":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130059926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}