Title: Improving Kullback-Leibler based legal document summarization using enhanced text representation
Authors: Deepali Jain, Malaya Dutta Borah, Anupam Biswas
DOI: 10.1109/SILCON55242.2022.10028887
Venue: 2022 IEEE Silchar Subsection Conference (SILCON)
Published: 2022-11-04
Citations: 4
Abstract
Text summarization is highly applicable in the legal domain owing to the complex nature of legal documents. Several classical algorithms show promising results on legal documents. One such algorithm is Kullback-Leibler based summarization (KLSumm), in which documents and candidate summaries are represented as unigram probability distributions and summary sentences are chosen by minimizing the KL-divergence between the document and candidate-summary distributions. The choice of probability distribution has a great impact on which sentences are selected from the document. In this work, two approaches are explored for improving the formation of these probability distributions: NgramKLSumm and BertKLSumm. The experimental results show that NgramKLSumm performs better on the BillSum dataset in terms of both the ROUGE and BERTScore metrics, whereas BertKLSumm performs better on the FIRE dataset in terms of ROUGE. On the BillSum dataset, the improvement in ROUGE is around 5-13% for BertKLSumm, while nearly a 10-16% improvement is seen for NgramKLSumm. In terms of BERTScore, BertKLSumm improves F1-score by around 2%, while NgramKLSumm improves it by 6%. On the FIRE dataset, a 2-5% improvement is seen in ROUGE, while no improvement is seen in BERTScore. These results make it clear that an enhanced representation of documents and candidate summaries can yield substantial improvements across different datasets and evaluation metrics, thereby improving on the baseline KLSumm approach for legal document summarization.
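To make the baseline concrete, the following is a minimal sketch of the greedy KLSumm selection loop the abstract describes: documents and candidate summaries are represented as smoothed unigram distributions, and at each step the sentence that most reduces KL(document || summary) is added. The function names, the smoothing constant, and the whitespace tokenization are illustrative assumptions, not the authors' implementation; the paper's NgramKLSumm and BertKLSumm variants replace the unigram representation below with richer ones.

```python
import math
from collections import Counter

def unigram_dist(tokens, vocab, alpha=1e-3):
    # Additively smoothed unigram distribution over a fixed vocabulary
    # (smoothing keeps KL finite when a word is absent from the summary).
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    # KL(p || q) for two distributions given as dicts over the same vocabulary.
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

def klsumm(sentences, max_sentences=3):
    # Greedy KLSumm: repeatedly add the sentence whose inclusion minimizes
    # KL(document distribution || summary distribution).
    doc_tokens = [w for s in sentences for w in s.lower().split()]
    vocab = set(doc_tokens)
    doc_dist = unigram_dist(doc_tokens, vocab)

    summary, summary_tokens = [], []
    remaining = list(sentences)
    while remaining and len(summary) < max_sentences:
        best, best_kl = None, float("inf")
        for s in remaining:
            candidate_tokens = summary_tokens + s.lower().split()
            kl = kl_divergence(doc_dist, unigram_dist(candidate_tokens, vocab))
            if kl < best_kl:
                best, best_kl = s, kl
        summary.append(best)
        summary_tokens += best.lower().split()
        remaining.remove(best)
    return summary
```

In this formulation, the quality of the summary hinges entirely on how the two distributions are formed, which is exactly the component the paper's enhanced representations target.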