Title: Improving Kullback-Leibler based legal document summarization using enhanced text representation
Authors: Deepali Jain, Malaya Dutta Borah, Anupam Biswas
DOI: 10.1109/SILCON55242.2022.10028887
Venue: 2022 IEEE Silchar Subsection Conference (SILCON)
Published: 2022-11-04
Citations: 4
Abstract
Text summarization is highly applicable in the legal domain owing to the complex nature of legal documents. Several classical algorithms show promising results on legal documents. One such algorithm is Kullback-Leibler based summarization (KLSumm), in which documents and candidate summaries are represented as unigram probability distributions and summary sentences are chosen by minimizing the KL-divergence between the document and candidate-summary distributions. The choice of probability distribution has a great impact on which sentences are selected from the document. In this work, two approaches are explored for improving the formation of these probability distributions: NgramKLSumm and BertKLSumm. The experimental results show that NgramKLSumm performs better on the BillSum dataset in terms of both the ROUGE and BERTScore metrics, whereas BertKLSumm performs better on the FIRE dataset in terms of ROUGE. On the BillSum dataset, the improvement in ROUGE is around 5-13% for BertKLSumm, while nearly a 10-16% improvement is seen for NgramKLSumm. In terms of BERTScore, BertKLSumm improves F1-score by around 2%, while NgramKLSumm improves it by 6%. On the FIRE dataset, a 2-5% improvement is seen in ROUGE, while no improvement is seen in BERTScore. These results make it clear that an enhanced representation of documents and candidate summaries can yield substantial improvements across different datasets and evaluation metrics, thereby improving on the baseline KLSumm approach for legal document summarization.
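To make the baseline concrete, the following is a minimal sketch of the greedy KLSumm selection loop the abstract describes: documents and candidate summaries are represented as smoothed unigram distributions, and at each step the sentence that most reduces KL(document || summary) is added. The function names, the smoothing constant, and the whitespace tokenization are illustrative assumptions, not the authors' implementation; the paper's NgramKLSumm and BertKLSumm variants replace the unigram representation below with richer ones.

```python
import math
from collections import Counter

def unigram_dist(tokens, vocab, alpha=1e-3):
    # Additively smoothed unigram distribution over a fixed vocabulary
    # (smoothing keeps KL finite when a word is absent from the summary).
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    # KL(p || q) for two distributions given as dicts over the same vocabulary.
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

def klsumm(sentences, max_sentences=3):
    # Greedy KLSumm: repeatedly add the sentence whose inclusion minimizes
    # KL(document distribution || summary distribution).
    doc_tokens = [w for s in sentences for w in s.lower().split()]
    vocab = set(doc_tokens)
    doc_dist = unigram_dist(doc_tokens, vocab)

    summary, summary_tokens = [], []
    remaining = list(sentences)
    while remaining and len(summary) < max_sentences:
        best, best_kl = None, float("inf")
        for s in remaining:
            candidate_tokens = summary_tokens + s.lower().split()
            kl = kl_divergence(doc_dist, unigram_dist(candidate_tokens, vocab))
            if kl < best_kl:
                best, best_kl = s, kl
        summary.append(best)
        summary_tokens += best.lower().split()
        remaining.remove(best)
    return summary
```

In this formulation, the quality of the summary hinges entirely on how the two distributions are formed, which is exactly the component the paper's enhanced representations target.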