Australasian Document Computing Symposium: latest papers, page 2

Classifying microblogs for disasters

Australasian Document Computing Symposium Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537737

Sarvnaz Karimi, Jie Yin, Cécile Paris

Citations: 44

Australasian Document Computing Symposium Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537746

Matt Crane, A. Trotman, Richard A. O'Keefe

Citations: 1

ADCS reaches adulthood: an analysis of the conference and its community over the last eighteen years

Australasian Document Computing Symposium Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537741

B. Koopman, G. Zuccon, Lance De Vine, Aneesha Bakharia, P. Bruza, Laurianne Sitbon, Andrew Gibson

Citations: 0

Integrated instance- and class-based generative modeling for text classification

Australasian Document Computing Symposium Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537751

Antti Puurula, Sung-Hyon Myaeng

Abstract: Statistical methods for text classification are predominantly based on the paradigm of class-based learning that associates class variables with features, discarding the instances of data after model training. This results in efficient models, but neglects the fine-grained information present in individual documents. Instance-based learning uses this information, but suffers from data sparsity with text data. In this paper, we propose a generative model called Tied Document Mixture (TDM) for extending Multinomial Naive Bayes (MNB) with mixtures of hierarchically smoothed models for documents. Alternatively, TDM can be viewed as a Kernel Density Classifier using class-smoothed Multinomial kernels. TDM is evaluated for classification accuracy on 14 different datasets for multi-label, multi-class and binary-class text classification tasks and compared to instance- and class-based learning baselines. The comparisons to MNB demonstrate a substantial improvement in accuracy as a function of available training documents per class, ranging up to average error reductions of over 26% in sentiment classification and 65% in spam classification. On average TDM is as accurate as the best discriminative classifiers, but retains the linear time complexities of instance-based learning methods, with exact algorithms for both model estimation and inference.
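The kernel density view described in the abstract can be illustrated with a small sketch. This is not the authors' TDM implementation: the interpolation weights `doc_weight` and `class_weight`, the add-one background smoothing, and the function names `fit` and `predict` are all assumptions made for illustration. Each training document acts as a multinomial kernel whose word probabilities are smoothed first by its class model and then by a collection-wide background model, and a class is scored by its prior times the average kernel likelihood.

```python
import math
from collections import Counter, defaultdict


def fit(docs, labels):
    """Keep per-document counts (instance level) plus class and collection counts."""
    doc_counts = [Counter(d) for d in docs]
    class_counts = defaultdict(Counter)
    background = Counter()
    for counts, label in zip(doc_counts, labels):
        class_counts[label].update(counts)
        background.update(counts)
    return doc_counts, list(labels), dict(class_counts), background


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def predict(model, doc, doc_weight=0.5, class_weight=0.4):
    """Score each class as prior * mean over its documents of a smoothed
    multinomial likelihood. The weights are illustrative assumptions.
    Training documents are assumed to be non-empty."""
    doc_counts, labels, class_counts, background = model
    vocab = len(background) + 1          # +1 reserves mass for unseen words
    bg_total = sum(background.values())
    bg_weight = 1.0 - doc_weight - class_weight
    query = Counter(doc)
    scores = {}
    for c, c_counts in class_counts.items():
        c_total = sum(c_counts.values())
        kernels = []
        for counts, label in zip(doc_counts, labels):
            if label != c:
                continue
            d_total = sum(counts.values())
            log_p = 0.0
            for w, n in query.items():
                p_bg = (background[w] + 1) / (bg_total + vocab)
                p_w = (doc_weight * counts[w] / d_total
                       + class_weight * c_counts[w] / c_total
                       + bg_weight * p_bg)
                log_p += n * math.log(p_w)
            kernels.append(log_p)
        log_prior = math.log(len(kernels) / len(labels))
        scores[c] = log_prior + _logsumexp(kernels) - math.log(len(kernels))
    return max(scores, key=scores.get)
```

Setting `doc_weight` to 0 collapses every kernel in a class to the same class-level multinomial, which is the sense in which such a model extends a class-based classifier like MNB.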

Citations: 10

Merging algorithms for enterprise search

Australasian Document Computing Symposium Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537750

Pengfei Li, Paul Thomas, D. Hawking

Citations: 9

Efficient top-k retrieval with signatures

Australasian Document Computing Symposium Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537742

Timothy Chappell, S. Geva, Anthony N. Nguyen, G. Zuccon

Citations: 10

Choices in batch information retrieval evaluation

Australasian Document Computing Symposium Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537745

Falk Scholer, Alistair Moffat, Paul Thomas

Abstract: Web search tools are used on a daily basis by billions of people. The commercial providers of these services spend large amounts of money measuring their own effectiveness and benchmarking against their competitors; nothing less than their corporate survival is at stake. Techniques for offline or "batch" evaluation of search quality have received considerable attention, spanning ways of constructing relevance judgments; ways of using them to generate numeric scores; and ways of inferring system "superiority" from sets of such scores.

Our purpose in this paper is to consider these mechanisms as a chain of inter-dependent activities, in order to explore some of the ramifications of alternative components. By disaggregating the different activities, and asking what the ultimate objective of the measurement process is, we provide new insights into evaluation approaches, and are able to suggest new combinations that might prove fruitful avenues for exploration. Our observations are examined with reference to data collected from a user study covering 34 users undertaking a total of six search tasks each, using two systems of markedly different quality.

We hope to encourage broader awareness of the many factors that go into an evaluation of search effectiveness, and of the implications of these choices, and encourage researchers to carefully report all aspects of the evaluation process when describing their system performance experiments.

Citations: 3

Economic models of search

Australasian Document Computing Symposium Pub Date : 2013-12-05 DOI: 10.1145/2537734.2537735

L. Azzopardi

Abstract: Searching is inherently an interactive process, usually requiring a number of queries to be submitted and a number of documents to be assessed in order to find the desired amount of relevant information. While numerous models of search have been proposed, they have been largely conceptual in nature, providing a descriptive account of the search process. For example, Bates' Berry Picking metaphor aptly describes how information seekers forage for relevant information [4]. However, it lacks any predictive or explanatory power. In this talk, I will outline how microeconomic theory can be applied to interactive information retrieval, where the search process can be viewed as a combination of inputs (i.e. queries and assessments) which are used to "produce" output (i.e. relevance). Under this view, it is possible to build models that not only describe the relationship between interaction, cost and gain, but also explain and predict behaviour. During the talk, I will run through a number of examples of how economics can explain different behaviours. For example, why PhD students should search more than their supervisors (using an economic model developed by Cooper [6]), why queries are short [1], why Boolean searchers need to explore more results, and why it is okay to look at the first few results when searching the web [2]. I shall then describe how the cost of different interactions affects search behaviour [3], before extending the current theory to include other variables (such as the time spent on the search result page, the interaction with snippets, etc.) to create more sophisticated and realistic models. Essentially, I will argue that by using such models we can:

1. theorise and predict how users will behave when interacting with systems,
2. ascertain how the costs of different interactions will influence search behaviour,
3. understand why particular interaction styles, strategies, techniques are or are not adopted by users, and,
4. determine what interactions and functionalities are worthwhile based on their expected gain and associated costs.
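The idea of treating queries and assessments as inputs that "produce" relevance can be made concrete with a toy model. The abstract does not give a functional form, so everything here is an assumption for illustration: a Cobb-Douglas style gain function with made-up exponents, a linear cost per query and per assessed document, and the hypothetical helper `cheapest_strategy`, which searches for the lowest-cost mix of queries and assessment depth that reaches a target gain.

```python
def gain(queries, depth, k=1.0, alpha=0.6, beta=0.4):
    """Assumed production function: gain from `queries` queries, each
    assessed to `depth` documents. The exponents are illustrative only."""
    return k * queries ** alpha * depth ** beta


def cost(queries, depth, query_cost, assess_cost):
    """Assumed linear cost: each query costs `query_cost`, and each of the
    `depth` documents assessed per query costs `assess_cost`."""
    return queries * (query_cost + depth * assess_cost)


def cheapest_strategy(target_gain, query_cost, assess_cost,
                      max_queries=200, max_depth=200):
    """Return (cost, queries, depth) for the cheapest integer strategy that
    reaches `target_gain`, or None if no strategy in range does."""
    best = None
    for q in range(1, max_queries + 1):
        for d in range(1, max_depth + 1):
            if gain(q, d) >= target_gain:
                c = cost(q, d, query_cost, assess_cost)
                if best is None or c < best[0]:
                    best = (c, q, d)
                break  # deeper assessment only adds cost for this q
    return best


# Compare the predicted strategy when queries are cheap versus expensive.
cheap = cheapest_strategy(10, query_cost=1, assess_cost=1)
expensive = cheapest_strategy(10, query_cost=20, assess_cost=1)
print("cheap queries:    ", cheap)
print("expensive queries:", expensive)
```

Under these assumed parameters the model predicts fewer, more deeply assessed queries once querying becomes expensive, which is the kind of behavioural prediction the talk argues such models make possible.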

Citations: 7

Graph-based concept weighting for medical information retrieval

Australasian Document Computing Symposium Pub Date : 2012-12-05 DOI: 10.1145/2407085.2407096

B. Koopman, G. Zuccon, P. Bruza, Laurianne Sitbon, Michael Lawley

Abstract: This paper presents a graph-based method to weight medical concepts in documents for the purposes of information retrieval. Medical concepts are extracted from free-text documents using a state-of-the-art technique that maps n-grams to concepts from the SNOMED CT medical ontology. In our graph-based concept representation, concepts are vertices in a graph built from a document, and edges represent associations between concepts. This representation naturally captures dependencies between concepts, an important requirement for interpreting medical text, and a feature lacking in bag-of-words representations.

We apply existing graph-based term weighting methods to weight medical concepts. Using concepts rather than terms addresses vocabulary mismatch and encapsulates terms belonging to a single medical entity into a single concept. In addition, we further extend previous graph-based approaches by injecting domain knowledge that estimates the importance of a concept within the global medical domain.

Retrieval experiments on the TREC Medical Records collection show our method outperforms both term and concept baselines. More generally, this work provides a means of integrating background knowledge contained in medical ontologies into data-driven information retrieval approaches.
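A minimal sketch of graph-based weighting may help picture the representation. It assumes concepts have already been extracted into a sequence, links concepts that co-occur within a sliding window, and scores vertices with a PageRank-style iteration. The window-based edges, the damping value, and the function names are assumptions for illustration; the abstract does not specify which weighting method or edge definition the paper uses, and the domain-knowledge extension is not modelled here.

```python
from collections import defaultdict


def build_graph(concepts, window=3):
    """Undirected co-occurrence graph: two concepts are linked if they appear
    within `window` consecutive positions (window counts the current one)."""
    graph = defaultdict(set)
    for i, concept in enumerate(concepts):
        for other in concepts[i + 1:i + window]:
            if other != concept:
                graph[concept].add(other)
                graph[other].add(concept)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an undirected graph in which
    every vertex has at least one neighbour."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            incoming = sum(score[m] / len(graph[m]) for m in graph[n])
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        score = updated
    return score


# Placeholder concept labels; a real pipeline would use ontology identifiers.
doc_concepts = ["fracture", "femur", "fracture", "surgery", "fracture", "pain"]
weights = pagerank(build_graph(doc_concepts, window=2))
for concept, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {weight:.3f}")
```

A concept that is connected to many other concepts in the document receives a higher weight than one that appears in isolation, which is the dependency information a bag-of-words count cannot express.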

Citations: 26

Reordering an index to speed query processing without loss of effectiveness

Australasian Document Computing Symposium Pub Date : 2012-12-05 DOI: 10.1145/2407085.2407088

D. Hawking, Timothy Jones

Abstract: Following Long and Suel, we empirically investigate the importance of document order in search engines which rank documents using a combination of dynamic (query-dependent) and static (query-independent) scores, and use document-at-a-time (DAAT) processing. When inverted file postings are in collection order, assigning document numbers in order of descending static score supports lossless early termination while maintaining good compression.

Since static scores may not be available until all documents have been gathered and indexed, we build a tool for reordering an existing index and show that it operates in less than 20% of the original indexing time. We note that this additional cost is easily recouped by savings at query processing time. We compare best early-termination points for several different index orders on three enterprise search collections (a whole-of-government index with two very different query sets, and a collection from a UK university). We also present results for the same orders for ClueWeb09-CatB. Our evaluation focuses on finding results likely to be clicked on by users of Web or website search engines, that is, Nav and Key results in the TREC 2011 Web Track judging scheme.

The orderings tested are Original, Reverse, Random, and QIE (descending order of static score). For three enterprise search test sets we find that QIE order can achieve close-to-maximal search effectiveness with much lower computational cost than for other orderings. Additionally, reordering has negligible impact on compressed index size for indexes that contain position information. Our results for an artificial query set against the TREC ClueWeb09 Category B collection are much more equivocal and we canvass possible explanations for future investigation.
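The reason descending static-score order permits lossless early termination can be sketched in a few lines. This is a simplified illustration, not the paper's system: it assumes the final score is the sum of a static and a dynamic component, that an upper bound `max_dynamic` on the dynamic component is known, and that every document is a candidate (a real DAAT engine would only visit documents present in the query's postings lists). The function name `top_k` and its parameters are hypothetical.

```python
import heapq


def top_k(static_scores, dynamic_score, max_dynamic, k):
    """Return (results, examined) for the k best documents (k >= 1).

    `static_scores[i]` is the static score of document i, and document ids
    are assumed to be assigned in descending static-score order.
    `dynamic_score(i)` returns the query-dependent score, never above
    `max_dynamic`.
    """
    heap = []      # min-heap of (total_score, doc_id); heap[0] is the k-th best
    examined = 0
    for doc_id, static in enumerate(static_scores):
        # No later document can beat the current k-th best: its static score
        # is no higher than this one's, and its dynamic score is bounded.
        if len(heap) == k and static + max_dynamic <= heap[0][0]:
            break
        examined += 1
        total = static + dynamic_score(doc_id)
        if len(heap) < k:
            heapq.heappush(heap, (total, doc_id))
        elif total > heap[0][0]:
            heapq.heapreplace(heap, (total, doc_id))
    return sorted(heap, reverse=True), examined


# Toy collection of six documents with integer scores.
static = [9, 8, 7, 3, 2, 1]
dynamic = {0: 1, 1: 5, 2: 0, 3: 4, 4: 5, 5: 5}
results, examined = top_k(static, dynamic.__getitem__, max_dynamic=5, k=2)
print(results, "examined", examined, "of", len(static))
```

Because the stopping test only fires when no unseen document could enter the top k, the truncated scan returns the same documents as an exhaustive one (up to ties at the k-th score); with a random or reversed document order the bound would rarely hold early.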

Citations: 9