{"title":"Boosting Titles does not Generally Improve Retrieval Effectiveness","authors":"Jimmy, G. Zuccon, B. Koopman","doi":"10.1145/3015022.3015028","DOIUrl":"https://doi.org/10.1145/3015022.3015028","url":null,"abstract":"The fields that compose structured documents such as web pages have been exploited to improve the effectiveness of information retrieval systems. Field-based retrieval methods assign different levels of importance (weights) to different fields, e.g., by boosting the score of a document when query terms are found in a specific field. An important question is how to decide which field should be boosted. It has been speculated that the title field should receive a higher weight. In this paper, we investigate whether boosting the title field of structured documents actually does improve retrieval effectiveness. Our results show that, on average, boosting titles does not improve retrieval effectiveness for field-based retrieval; this is both for ad-hoc web search and exploratory-based web search tasks. However, we do find that the boosting of titles does generally improve retrieval effectiveness for navigational queries and a small subset of ad-hoc queries. This result advocates for adaptive methods that selectively adjust boosting of specific fields based on the query.","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125484041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive Trip Planning Using Activity Trajectories","authors":"Sheng Wang, Z. Bao, J. Culpepper, T. Sellis, M. Sanderson, Munkh-Erdene Yadamjav","doi":"10.1145/3015022.3015030","DOIUrl":"https://doi.org/10.1145/3015022.3015030","url":null,"abstract":"We present an interactive trip planning system called @FINDER which uses an exemplar trajectory query to find the most related top-k spatial-textual trajectories. @FINDER is implemented to support various degrees of user information needs for trip planning. For users with zero knowledge about places to travel, @FINDER provides a heatmap of popular points of interest (POIs) as well as popular activities from a trajectory database. The system helps users quickly explore the places, and helps formulate an exemplar trajectory query, which specifies preferred places to go and activities of interest. Then @FINDER provides efficient query processing of the top-k related spatial-textual trajectories using a new approach to spatial-textual trajectory indexing recently developed at RMIT University. For each of the top-k results found in the form of a set of POIs and activities, @FINDER further computes the optimal route (in terms of the travel time) covering all of the POIs, and returns an album to the user. Lastly, users can further interact with @FINDER by adding or deleting POIs/activities in the original exemplar query, and the system will update the results in a timely manner.","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"288 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133678259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective User Relevance Feedback for Image Retrieval with Image Signatures","authors":"Dinesha Chathurani Nanayakkara Wasam Uluwitige, S. Geva, G. Zuccon, V. Chandran, Timothy Chappell","doi":"10.1145/3015022.3015034","DOIUrl":"https://doi.org/10.1145/3015022.3015034","url":null,"abstract":"Content-based image retrieval (CBIR) has attracted much attention due to the exponential growth of digital image collections that have become available in recent years. Relevance feedback (RF) in the context of search engines is a query expansion technique, which is based on relevance judgments about the top results that are initially returned for a given query. RF can be obtained directly from end users, inferred indirectly from user interactions with a result list, or even assumed (aka pseudo relevance feedback). RF information is used to generate a new query, aiming to re-focus the query towards more relevant results. This paper presents a methodology for use of signature based image retrieval with a user in the loop to improve retrieval performance. The significance of this study is twofold. First, it shows how to effectively use explicit RF with signature based image retrieval to improve retrieval quality and efficiency. Second, this approach provides a mechanism for end users to refine their image queries. This is an important contribution because, to date, there is no effective way to reformulate an image query; our approach provides a solution to this problem. Empirical experiments have been carried out to study the behaviour and optimal parameter settings of this approach. Empirical evaluations based on standard benchmarks demonstrate the effectiveness of the proposed approach in improving the performance of CBIR in terms of recall, precision, speed and scalability.","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128501703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Semantic and Context Features for Answer Summary Extraction","authors":"E. Yulianti, Ruey-Cheng Chen, Falk Scholer, M. Sanderson","doi":"10.1145/3015022.3015031","DOIUrl":"https://doi.org/10.1145/3015022.3015031","url":null,"abstract":"We investigate the effectiveness of using semantic and context features for extracting document summaries that are designed to contain answers for non-factoid queries. The summarization methods are compared against state-of-the-art factoid question answering and query-biased summarization techniques. The accuracy of generated answer summaries is evaluated using ROUGE as well as sentence ranking measures, and the relationship between these measures is further analyzed. The results show that semantic and context features give significant improvements over the state-of-the-art techniques.","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"297 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114441771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In Vacuo and In Situ Evaluation of SIMD Codecs","authors":"A. Trotman, Jimmy J. Lin","doi":"10.1145/3015022.3015023","DOIUrl":"https://doi.org/10.1145/3015022.3015023","url":null,"abstract":"The size of a search engine index and the time to search are inextricably related through the compression codec. This investigation examines this tradeoff using several relatively unexplored SIMD-based codecs including QMX, TurboPackV, and TurboPFor. It uses (the non-SIMD) OPTPFor as a baseline. Four new variants of QMX are introduced and also compared. Those variants include optimizations for space and for time. Experiments were conducted on the TREC .gov2 collection using topics 701-850, in crawl order and in URL order. The results suggest that there is very little difference between these codecs, but that the reference implementation of QMX performs well.","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124456345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 21st Australasian Document Computing Symposium","authors":"Sarvnaz Karimi, Mark James Carman","doi":"10.1145/3015022","DOIUrl":"https://doi.org/10.1145/3015022","url":null,"abstract":"These proceedings contain the papers presented at ADCS 2016, the Twenty-First Australasian Document Computing Symposium, hosted by Monash University and held in Caulfield, VIC, Australia.","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"517 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123569730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Occupational Representativeness in Twitter","authors":"S. Kim, Stephen Wan, Cécile Paris","doi":"10.1145/3015022.3015036","DOIUrl":"https://doi.org/10.1145/3015022.3015036","url":null,"abstract":"This paper describes an approach to detect one particular demographic characteristic, occupation (or profession), in Twitter user profiles. In this paper, we show how effective the approach is for estimating occupational population statistics in Australian Twitter by correlating them with real-world population figures obtained from 2011 Australian census data. We also demonstrate that we can gain more reliable social media insights in the context of occupational representativeness in Twitter if a non-standard occupation name is mapped into a standard occupation name. To our knowledge, this is the first attempt to build a machine learning model that automatically identifies linguistically noisy or open-ended occupations in Twitter, resulting in more reliable occupational population statistics.","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115146364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query-Biased Summaries for Tabular Data","authors":"Vincent Au, Paul Thomas, Gaya K. Jayasinghe","doi":"10.1145/3015022.3015027","DOIUrl":"https://doi.org/10.1145/3015022.3015027","url":null,"abstract":"Government, research, and academic data portals publish a large amount of public data, but present tools make discovery difficult. In particular, search results do not support a user's decision whether or not to commit to a download of what might be a large data set. We describe a method for producing query-biased summaries of tabular data, which aims to support a user's download decision, or even to answer the question on the spot, with no further interaction. The method infers simple types in the data and query; automatically refines queries, where that makes sense; extracts relevant subsets of the complete table; and generates both graphical and tabular summaries of what remains. A small-scale user study suggests this both helps users identify useful results (fewer false negatives), and reduces wasted downloads (fewer false positives).","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127003821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty in Rank-Biased Precision","authors":"L. Park","doi":"10.1145/3015022.3015029","DOIUrl":"https://doi.org/10.1145/3015022.3015029","url":null,"abstract":"Information retrieval metrics that provide uncertainty intervals when faced with unjudged documents, such as Rank-Biased Precision (RBP), provide us with an indication of the upper and lower bound of the system score. Unfortunately, the uncertainty is disregarded when examining the mean over a set of queries. In this article, we examine the distribution of the uncertainty per query and averaged over all queries, under the assumption that each unjudged document has the same probability of being relevant. We also derive equations for the mean, variance, and distribution of Mean RBP uncertainty. Finally, the impact of our assumption is assessed using simulation. We find that by removing the assumption of equal probability of relevance, we obtain a scaled form of the previously defined mean and standard deviation for the distribution of Mean RBP uncertainty.","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128429652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Cutoff Prediction in Multi-Stage Retrieval Systems","authors":"J. Culpepper, C. Clarke, Jimmy J. Lin","doi":"10.1145/3015022.3015026","DOIUrl":"https://doi.org/10.1145/3015022.3015026","url":null,"abstract":"Modern multi-stage retrieval systems consist of a candidate generation stage followed by one or more reranking stages. In such an architecture, the quality of the final ranked list may not be sensitive to the quality of the initial candidate pool, especially in terms of early precision. This provides several opportunities to increase retrieval efficiency without significantly sacrificing effectiveness. In this paper, we explore a new approach to dynamically predicting the size of an initial result set in the candidate generation stage, which can directly affect the overall efficiency and effectiveness of the entire system. Previous work exploring this tradeoff has focused on global parameter settings that apply to all queries, even though optimal settings vary across queries. In contrast, we propose a technique that makes a parameter prediction to maximize efficiency within an effectiveness envelope on a per query basis, using only static pre-retrieval features. Experimental results show that substantial efficiency gains are achievable. In addition, our framework provides a versatile tool that can be used to estimate the effectiveness-efficiency tradeoffs that are possible before selecting and tuning algorithms to make machine-learned predictions.","PeriodicalId":334601,"journal":{"name":"Proceedings of the 21st Australasian Document Computing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130901652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}