{"title":"Sentence length bias in TREC novelty track judgements","authors":"L. L. Bando, Falk Scholer, A. Turpin","doi":"10.1145/2407085.2407093","DOIUrl":"https://doi.org/10.1145/2407085.2407093","url":null,"abstract":"The Cranfield methodology for comparing document ranking systems has also been applied recently to comparing sentence ranking methods, which are used as pre-processors for summary generation methods. In particular, the TREC Novelty track data has been used to assess whether one sentence ranking system is better than another. This paper demonstrates that there is a strong bias in the Novelty track data for relevant sentences to also be longer sentences. Thus, systems that simply choose the longest sentences will often appear to perform better in terms of identifying \"relevant\" sentences than systems that use other methods. We demonstrate, by example, how this can lead to misleading conclusions about the comparative effectiveness of sentence ranking systems. We then demonstrate that if the Novelty track data is split into subcollections based on sentence length, comparing systems on each of the subcollections leads to conclusions that avoid the bias.","PeriodicalId":402985,"journal":{"name":"Australasian Document Computing Symposium","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131258941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pairwise similarity of TopSig document signatures","authors":"R. D. Vries, S. Geva","doi":"10.1145/2407085.2407103","DOIUrl":"https://doi.org/10.1145/2407085.2407103","url":null,"abstract":"This paper analyses the pairwise distances of signatures produced by the TopSig retrieval model on two document collections. The distribution of the distances are compared to purely random signatures. It explains why TopSig is only competitive with state of the art retrieval models at early precision. Only the local neighbourhood of the signatures is interpretable. We suggest this is a common property of vector space models.","PeriodicalId":402985,"journal":{"name":"Australasian Document Computing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124402458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explaining difficulty navigating a website using page view data","authors":"Paul Thomas","doi":"10.1145/2407085.2407090","DOIUrl":"https://doi.org/10.1145/2407085.2407090","url":null,"abstract":"A user's behaviour on a web site can tell us something about that user's experience. In particular, we believe there are simple signals---including circling back to previous pages, and swapping out to a search engine---that indicate difficulty navigating a site.\u0000 Simple page view patterns from web server logs correlate with these signals and may explain them. Extracting these patterns can help web authors understand where, and why, their sites are confusing or hard to navigate.\u0000 We illustrate these ideas with data from almost a million sessions on a government website. In this case a small number of page view patterns are present in almost a third of difficult sessions, suggesting possible improvements to website language or design. We also introduce a tool for web authors, which makes this analysis available in the context of the site itself.","PeriodicalId":402985,"journal":{"name":"Australasian Document Computing Symposium","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130939554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study in language identification","authors":"Rachel Mary Milne, Richard A. O'Keefe, A. Trotman","doi":"10.1145/2407085.2407097","DOIUrl":"https://doi.org/10.1145/2407085.2407097","url":null,"abstract":"Language identification is automatically determining the language that a previously unseen document was written in. We compared several prior methods on samples from the Wikipedia and the EuroParl collections. Most of these methods work well. But we identify that these (and presumably other document) collections are heterogeneous in size, and short documents are systematically different from large ones. That techniques that work well on long documents are different from those that work well on short ones. We believe that improvement in algorithms will be seen if length is taken into account.","PeriodicalId":402985,"journal":{"name":"Australasian Document Computing Symposium","volume":"289 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133915395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mike Symonds, P. Bruza, G. Zuccon, Laurianne Sitbon, I. Turner
{"title":"Is the unigram relevance model term independent?: classifying term dependencies in query expansion","authors":"Mike Symonds, P. Bruza, G. Zuccon, Laurianne Sitbon, I. Turner","doi":"10.1145/2407085.2407102","DOIUrl":"https://doi.org/10.1145/2407085.2407102","url":null,"abstract":"This paper develops a framework for classifying term dependencies in query expansion with respect to the role terms play in structural linguistic associations. The framework is used to classify and compare the query expansion terms produced by the unigram and positional relevance models. As the unigram relevance model does not explicitly model term dependencies in its estimation process it is often thought to ignore dependencies that exist between words in natural language.\u0000 The framework presented in this paper is underpinned by two types of linguistic association, namely syntagmatic and paradigmatic associations. It was found that syntagmatic associations were a more prevalent form of linguistic association used in query expansion. Paradoxically, it was the unigram model that exhibited this association more than the positional relevance model. This surprising finding has two potential implications for information retrieval models: (1) if linguistic associations underpin query expansion, then a probabilistic term dependence assumption based on position is inadequate for capturing them; (2) the unigram relevance model captures more term dependency information than its underlying theoretical model suggests, so its normative position as a baseline that ignores term dependencies should perhaps be reviewed.","PeriodicalId":402985,"journal":{"name":"Australasian Document Computing Symposium","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131432934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-aspect group formation using facility location analysis","authors":"Mahmood Neshati, H. Beigy, D. Hiemstra","doi":"10.1145/2407085.2407094","DOIUrl":"https://doi.org/10.1145/2407085.2407094","url":null,"abstract":"In this paper, we propose an optimization framework to retrieve an optimal group of experts to perform a given multi-aspect task/project. Each task needs a diverse set of skills and the group of assigned experts should be able to collectively cover all required aspects of the task. We consider three types of multi-aspect team formation problems and propose a unified framework to solve these problems accurately and efficiently. Our proposed framework is based on Facility Location Analysis (FLA) which is a well known branch of the Operation Research (OR). Our experiments on a real dataset show significant improvement in comparison with the state-of-the art approaches for the team formation problem.","PeriodicalId":402985,"journal":{"name":"Australasian Document Computing Symposium","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126672318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Khamsum Kinley, D. Tjondronegoro, Helen Partridge, Sylvia Lauretta Edwards
{"title":"Relationship between the nature of the search task types and query reformulation behaviour","authors":"Khamsum Kinley, D. Tjondronegoro, Helen Partridge, Sylvia Lauretta Edwards","doi":"10.1145/2407085.2407091","DOIUrl":"https://doi.org/10.1145/2407085.2407091","url":null,"abstract":"Success of query reformulation and relevant information retrieval depends on many factors, such as users' prior knowledge, age, gender, and cognitive styles. One of the important factors that affect a user's query reformulation behaviour is that of the nature of the search tasks. Limited studies have examined the impact of the search task types on query reformulation behaviour while performing Web searches. This paper examines how the nature of the search tasks affects users' query reformulation behaviour during information searching. The paper reports empirical results from a user study in which 50 participants performed a set of three Web search tasks -- exploratory, factorial and abstract. Users' interactions with search engines were logged by using a monitoring program. 872 unique search queries were classified into five query types -- New, Add, Remove, Replace and Repeat. Users submitted fewer queries for the factual task, which accounted for 26%. They completed a higher number of queries (40% of the total queries) while carrying out the exploratory task. A one-way MANOVA test indicated a significant effect of search task types on users' query reformulation behaviour. In particular, the search task types influenced the manner in which users reformulated the New and Repeat queries.","PeriodicalId":402985,"journal":{"name":"Australasian Document Computing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133047557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}