Proceedings of the 26th Australasian Document Computing Symposium: Latest Papers

The Task: Distinguishing Tasks and Sessions in Legal Information Retrieval
Proceedings of the 26th Australasian Document Computing Symposium Pub Date : 2022-12-15 DOI: 10.1145/3572960.3572983
G. Wiggers, G. Zuccon
Abstract: Legal information retrieval (IR) is a form of professional search often associated with high recall. Information seeking in this context can consist of a single query with no clicks (known as updating behaviour), a literature review where a complex boolean query crafted over several iterations is performed and all documents returned are inspected, or a seeking task spanning days or weeks, consisting of multiple queries interleaved with other tasks. Analysis of query logs is paramount to the improvement of current legal IR systems, and in particular of the system we are associated with, the Dutch Legal Intelligence IR system. This analysis however requires the ability to automatically identify which queries of a user are related to the same search goal, or in other words, related to the same search task. The current practice of defining sessions (a set of user interactions with the IR system with no more than 30 minutes between user actions) and equating a session to representing a search task, might prove ineffective given the characteristics of this user group. In this paper we provide an initial analysis of a sub-set of the query log from the Dutch Legal Intelligence IR system, comprising of 970 queries issued by 10 users within the space of 1 year. From this query log, we used the 30-minutes heuristic to define sessions, and extract 126 sessions, ranging from 1 to 71 sessions per user. We then independently annotate the query log to manually identify search tasks: this activity leads to the identification of 55 tasks, ranging from 1 to 21 tasks per user. In doing this, we highlight how the currently employed heuristic is not adequate to extract search queries from a user that are related to the same search task. We also show why tasks are more informative than sessions with regards to legal information retrieval. We further describe the potential of using characteristics such as Levenshtein distance, common words and string matching for automated task classification.
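The 30-minute session heuristic described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_sessions`, the sample timestamps, and the choice to treat a gap of exactly 30 minutes as still belonging to the same session (following the abstract's wording "no more than 30 minutes") are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Threshold quoted in the abstract: no more than 30 minutes between actions.
SESSION_GAP = timedelta(minutes=30)


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group one user's action timestamps into sessions.

    An action joins the current session when it occurs no more than
    `gap` after the previous action; otherwise it starts a new session.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Hypothetical log for a single user (timestamps invented for illustration).
log = [
    datetime(2022, 3, 1, 9, 0),
    datetime(2022, 3, 1, 9, 20),
    datetime(2022, 3, 1, 9, 50),  # exactly 30 minutes after the previous action
    datetime(2022, 3, 1, 11, 0),  # 70 minutes later, so a new session starts
    datetime(2022, 3, 2, 9, 5),   # next day, so another new session starts
]
sessions = split_sessions(log)
print([len(s) for s in sessions])
```

A purely time-based split like this cannot tell whether the query at 11:00 continues the same search goal as the earlier ones, which is the limitation the abstract attributes to equating sessions with tasks.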
Citations: 0
Robustness of Neural Rankers to Typos: A Comparative Study
Proceedings of the 26th Australasian Document Computing Symposium Pub Date : 2022-12-15 DOI: 10.1145/3572960.3572981
Shengyao Zhuang, Xinyu Mao, G. Zuccon
Abstract: Recent advances in passage retrieval have seen the introduction of pre-trained language models (PLMs) based neural rankers. While generally very effective, little attention has been paid to the robustness of these rankers. In this paper, we study the effectiveness of state-of-the-art PLM rankers in presence of typos in queries, as an indication of the rankers’ robustness. As of PLM rankers, we consider the two most promising directions explored in previous work: dense retrievers vs. sparse retrievers. We find that both types of rankers are very sensitive to queries with typos. We then apply an existing augmentation-based typos-aware training technique with the aim of creating typo-robust dense and sparse retrievers. We find that this simple technique only works for dense retrievers, while it hurts effectiveness when used on sparse retrievers.
Citations: 1
Immediate-Access Indexing Using Space-Efficient Extensible Arrays
Proceedings of the 26th Australasian Document Computing Symposium Pub Date : 2022-12-15 DOI: 10.1145/3572960.3572984
Alistair Moffat
Abstract: The array is a fundamental data object in most programs. Its key functionality – storage of and access to a set of same-type elements in O(1) time per operation – is also widely employed in other more sophisticated data structures. In an extensible array the number of elements in the set is unknown at the time the program is initiated, and the array might continue to grow right through the program’s execution. In this paper we explore the use of extensible arrays in connection with the task of inverted index construction. We develop and test a space-efficient extensible array arrangement that has been previously described but not to our knowledge employed in practice, and show that it adds considerable flexibility to the index construction process while incurring only modest run-time overheads as a result of access indirections.
Citations: 2
Neural Rankers for Effective Screening Prioritisation in Medical Systematic Review Literature Search
Proceedings of the 26th Australasian Document Computing Symposium Pub Date : 2022-12-15 DOI: 10.1145/3572960.3572980
Shuai Wang, Harrisen Scells, B. Koopman, G. Zuccon
Abstract: Medical systematic reviews typically require assessing all the documents retrieved by a search. The reason is two-fold: the task aims for “total recall”; and documents retrieved using Boolean search are an unordered set, and thus it is unclear how an assessor could examine only a subset. Screening prioritisation is the process of ranking the (unordered) set of retrieved documents, allowing assessors to begin the downstream processes of the systematic review creation earlier, leading to earlier completion of the review, or even avoiding screening documents ranked least relevant. Screening prioritisation requires highly effective ranking methods. Pre-trained language models are state-of-the-art on many IR tasks but have yet to be applied to systematic review screening prioritisation. In this paper, we apply several pre-trained language models to the systematic review document ranking task, both directly and fine-tuned. An empirical analysis compares how effective neural methods compare to traditional methods for this task. We also investigate different types of document representations for neural methods and their impact on ranking performance. Our results show that BERT-based rankers outperform the current state-of-the-art screening prioritisation methods. However, BERT rankers and existing methods can actually be complementary, and thus, further improvements may be achieved if used in conjunction.
Citations: 7
Investigating Language Use by Polarised Groups on Twitter: A Case Study of the Bushfires
Proceedings of the 26th Australasian Document Computing Symposium Pub Date : 2022-12-15 DOI: 10.1145/3572960.3572979
Mehwish Nasim, Naeha Sharif, Pranav Bhandari, Derek Weber, Martin Wood, L. Falzon, Y. Kashima
Abstract: Online social media platforms have become an important forum for public discourse, and have often been implicated in exacerbating polarisation in public sphere. Yet the precise mechanisms by which polarisation is driven are not fully understood. The study of linguistic style and features has been shown to be useful in exploring various aspects of online group discussions and, in turn, the processes which could contribute to polarisation. We present a case study around the hashtag #ArsonEmergency, collected from Australian Twittersphere during the unprecedented bushfires of 2019/2020. The dataset consists of two polarised groups and one unaffiliated group. We examine the linguistic style, moral language, and happiness profiles of 1786 users active during this catastrophic event. Our results suggest that polarised groups pushed ‘affective polarisation’ on Twitter while discussing the Australian Bushfires.
Citations: 0
Pseudo-Relevance Feedback with Dense Retrievers in Pyserini
Proceedings of the 26th Australasian Document Computing Symposium Pub Date : 2022-12-15 DOI: 10.1145/3572960.3572982
Hang Li
Abstract: Transformer-based Dense Retrievers (DRs) are attracting extensive attention because of their effectiveness paired with high efficiency. In this context, few Pseudo-Relevance Feedback (PRF) methods applied to DRs have emerged. However, the absence of a general framework for performing PRF with DRs has made the empirical evaluation, comparison and reproduction of these methods challenging and time-consuming, especially across different DR models developed by different teams of researchers. To tackle this and speed up research into PRF methods for DRs, we showcase a new PRF framework that we implemented as a feature in Pyserini – an easy-to-use Python Information Retrieval toolkit. In particular, we leverage Pyserini’s DR framework and expand it with a PRF framework that abstracts the PRF process away from the specific DR model used. This new functionality in Pyserini allows to easily experiment with PRF methods across different DR models and datasets. Our framework comes with a number of recently proposed PRF methods built into it. Experiments within our framework show that this new PRF feature improves the effectiveness of the DR models currently available in Pyserini.
Citations: 1
Proceedings of the 26th Australasian Document Computing Symposium
Proceedings of the 26th Australasian Document Computing Symposium Pub Date : 1900-01-01 DOI: 10.1145/3572960
Citations: 0