Fine-Tuning Textrank for Legal Document Summarization: A Bayesian Optimization Based Approach

Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation. Pub Date: 2020-12-16. DOI: 10.1145/3441501.3441502

Deepali Jain, M. Borah, A. Biswas


Citations: 21

Abstract

Automatic text summarization is highly applicable in the legal domain because legal documents are long and complex. Most classical text summarization algorithms used in this domain have hyperparameters that, if properly optimized, can further improve their output. The choice of these hyperparameters has a large effect on performance, yet hyperparameter tuning is often overlooked when the algorithms are applied in practice. This work proposes a Bayesian Optimization based approach to tune one classical summarization algorithm, Textrank, over this space of choices by optimizing an objective function based on a mixture of ROUGE scores. Fine-tuning and subsequent evaluation are performed on a publicly available dataset. In the experimental evaluation, the hyperparameter-tuned Textrank outperforms baseline one-hot vector based Textrank and word2vec based Textrank models on the ROUGE-1, ROUGE-2 and ROUGE-L metrics. The analysis suggests that, with proper hyperparameter tuning, even a simple algorithm like Textrank can perform well on the legal document summarization task.
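To make the tuning setup concrete, the following is a minimal sketch of a ROUGE-mixture objective evaluated on a Textrank-style summarizer. It is not the authors' implementation. The abstract does not say which hyperparameters are tuned or how the ROUGE scores are weighted, so the `damping` and `threshold` parameters, their ranges, the equal mixture weights, and every function name here are assumptions made for illustration. The search loop is a plain random search used as a stand-in; a Bayesian optimizer would replace it by fitting a surrogate model to past trials and proposing the next parameters from that model.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, threshold=0.0, iterations=50):
    """Score sentences with PageRank over a thresholded similarity graph.

    ASSUMPTION: damping and threshold are illustrative hyperparameters;
    the abstract does not list the ones actually tuned.
    """
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    # Edge weights: similarities at or below the threshold are dropped.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = cosine(bags[i], bags[j])
                w[i][j] = sim if sim > threshold else 0.0
    out = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k, **params):
    """Return the k top-scoring sentences in their original order."""
    scores = textrank(sentences, **params)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return " ".join(sentences[i] for i in sorted(top))


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 from clipped n-gram overlap."""
    cand, ref = ngrams(tokenize(candidate), n), ngrams(tokenize(reference), n)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / sum(cand.values()), overlap / sum(ref.values())
    return 2 * p * r / (p + r)


def rouge_l(candidate, reference):
    """ROUGE-L F1 from the longest common subsequence of tokens."""
    c, r = tokenize(candidate), tokenize(reference)
    prev = [0] * (len(r) + 1)
    for x in c:
        cur = [0]
        for j, y in enumerate(r, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return 2 * p * rec / (p + rec)


def rouge_mixture(candidate, reference, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mix of ROUGE-1, ROUGE-2 and ROUGE-L (equal weights assumed)."""
    w1, w2, wl = weights
    return (w1 * rouge_n(candidate, reference, 1)
            + w2 * rouge_n(candidate, reference, 2)
            + wl * rouge_l(candidate, reference))


def tune(dataset, k, trials=30, seed=0):
    """Random-search stand-in for the optimizer; maximizes the mean mixture score."""
    rng = random.Random(seed)
    best_params, best_score = None, -1.0
    for _ in range(trials):
        params = {"damping": rng.uniform(0.5, 0.95), "threshold": rng.uniform(0.0, 0.5)}
        score = sum(rouge_mixture(summarize(s, k, **params), ref) for s, ref in dataset) / len(dataset)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Here `dataset` is a list of `(sentences, reference_summary)` pairs, and `tune(dataset, k=2)` returns the best parameter dictionary found together with its mean mixture score. Keeping the objective separate from the search loop means the loop can be swapped for a Bayesian optimizer without touching the summarizer or the ROUGE code.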

