Fine-Tuning Textrank for Legal Document Summarization: A Bayesian Optimization Based Approach

Deepali Jain, M. Borah, A. Biswas
DOI: 10.1145/3441501.3441502 (https://doi.org/10.1145/3441501.3441502)
Published in: Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation
Publication date: 2020-12-16
Citations: 21

Abstract

Automatic text summarization is highly applicable in the legal domain because legal documents are long and complex. Most classical text summarization algorithms used in the legal domain have hyperparameters that, if properly optimized, can improve their performance. The choice of these hyperparameters strongly affects performance, yet hyperparameter tuning is often overlooked when these algorithms are applied in practice. In this work, a Bayesian Optimization based approach is proposed to tune one such classical algorithm, Textrank, over its hyperparameter space by optimizing an objective function based on a mixture of ROUGE scores. Fine-tuning and subsequent evaluation are performed on a publicly available dataset. The experimental evaluation shows that the hyperparameter-tuned Textrank outperforms the baseline one-hot vector based Textrank and word2vec based Textrank models with respect to the ROUGE-1, ROUGE-2 and ROUGE-L metrics. The analysis suggests that, with proper hyperparameter tuning, even a simple algorithm like Textrank can perform well on the legal document summarization task.
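To make the idea of a ROUGE score mixture based objective concrete, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not state how the ROUGE scores are combined, so the equal weights, the use of F1 variants, the whitespace tokenization, and all function names here (such as `rouge_mixture`) are assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """F1 score from an overlap count and the candidate/reference totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


def rouge_mixture(candidate, reference, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mixture of ROUGE-1, ROUGE-2 and ROUGE-L F1 scores.

    The equal default weights are an assumption, not taken from the paper.
    """
    cand, ref = candidate.lower().split(), reference.lower().split()
    scores = (rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    generated = "the court dismissed the appeal"
    gold = "the court allowed the appeal"
    print(round(rouge_mixture(generated, gold), 3))
```

In a tuning loop of the kind the abstract describes, a Bayesian optimizer would propose a Textrank hyperparameter configuration, the summarizer would produce summaries under that configuration, and a score like `rouge_mixture` averaged over the tuning documents would be the value to maximize.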