The Factuality of Large Language Models in the Legal Domain
Rajaa El Hamdani, Thomas Bonald, Fragkiskos Malliaros, Nils Holzenberger, Fabian Suchanek
arXiv:2409.11798, arXiv - CS - Information Retrieval, published 2024-09-18
Citations: 0
Abstract
This paper investigates the factuality of large language models (LLMs) as knowledge bases in the legal domain, in a realistic usage scenario: we allow for acceptable variations in the answer, and let the model abstain from answering when uncertain. First, we design a dataset of diverse factual questions about case law and legislation. We then use the dataset to evaluate several LLMs under different evaluation methods, including exact, alias, and fuzzy matching. Our results show that the performance improves significantly under the alias and fuzzy matching methods. Further, we explore the impact of abstaining and in-context examples, finding that both strategies enhance precision. Finally, we demonstrate that additional pre-training on legal documents, as seen with SaulLM, further improves factual precision from 63% to 81%.
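To make the three evaluation methods and the abstention-aware precision concrete, here is a minimal sketch in Python. It is not the paper's released code: the normalization, the alias set, the fuzzy threshold of 0.8, and the abstention token are all illustrative assumptions, and the fuzzy score uses the standard library's SequenceMatcher rather than whatever matcher the authors used.

```python
# Illustrative sketch (not the authors' code) of exact, alias, and fuzzy
# matching, plus precision computed only over answered questions, which is
# how abstention can raise precision.
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(s.lower().split())


def exact_match(pred: str, gold: str) -> bool:
    """Strict string equality after normalization."""
    return normalize(pred) == normalize(gold)


def alias_match(pred: str, gold: str, aliases: set[str]) -> bool:
    """Accept the gold answer or any of its known aliases
    (e.g. 'ECJ' for 'European Court of Justice'); the alias set
    is assumed to be supplied with the dataset."""
    targets = {normalize(gold)} | {normalize(a) for a in aliases}
    return normalize(pred) in targets


def fuzzy_match(pred: str, gold: str, threshold: float = 0.8) -> bool:
    """Accept answers whose character-level similarity to the gold
    answer meets an assumed threshold of 0.8."""
    ratio = SequenceMatcher(None, normalize(pred), normalize(gold)).ratio()
    return ratio >= threshold


def precision_with_abstention(preds, golds, abstain_token="I don't know"):
    """Precision over answered questions only: abstentions are dropped
    from the denominator, so abstaining on uncertain questions can
    improve precision without adding correct answers."""
    answered = [
        (p, g) for p, g in zip(preds, golds)
        if normalize(p) != normalize(abstain_token)
    ]
    if not answered:
        return 0.0
    return sum(fuzzy_match(p, g) for p, g in answered) / len(answered)
```

Under this scoring, a prediction like "Court of Justice of the EU" can be counted correct by alias or fuzzy matching even though exact matching rejects it, which is consistent with the gap the abstract reports between exact and relaxed evaluation.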