Fine-Tuning Textrank for Legal Document Summarization: A Bayesian Optimization Based Approach
Authors: Deepali Jain, M. Borah, A. Biswas
Published in: Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation
Publication date: 2020-12-16
DOI: 10.1145/3441501.3441502 (https://doi.org/10.1145/3441501.3441502)
Citations: 21
Abstract
Automatic text summarization is highly applicable in the legal domain because legal documents are long and complex. Most classical text summarization algorithms used in this domain have hyperparameters that, if tuned properly, can improve their performance. The choice of these hyperparameters strongly affects performance, yet hyperparameter tuning is often overlooked when these algorithms are applied in practice. In this work, a Bayesian Optimization based approach is proposed to tune one such classical algorithm, Textrank, over its hyperparameter space by optimizing an objective function based on a mixture of ROUGE scores. Fine-tuning and evaluation are performed on a publicly available dataset. In the experimental evaluation, the hyperparameter-tuned Textrank outperforms baseline one-hot vector based Textrank and word2vec based Textrank models on the ROUGE-1, ROUGE-2 and ROUGE-L metrics. The analysis suggests that, with proper hyperparameter tuning, even a simple algorithm like Textrank can perform well on the legal document summarization task.
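To make the idea of a ROUGE score mixture objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes two tunable Textrank hyperparameters (a damping factor and a similarity threshold), a word-overlap sentence similarity in place of the one-hot or word2vec representations mentioned above, simplified ROUGE F-scores, and equal mixture weights. The abstract does not specify the actual hyperparameter space or the weights, so all names and values here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def textrank(sentences, damping=0.85, threshold=0.0, iters=50):
    """Score sentences by PageRank over a word-overlap similarity graph."""
    toks = [set(tokenize(s)) for s in sentences]
    n = len(sentences)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and len(toks[i]) > 1 and len(toks[j]) > 1:
                norm = math.log(len(toks[i])) + math.log(len(toks[j]))
                sim = len(toks[i] & toks[j]) / norm
                w[i][j] = sim if sim > threshold else 0.0  # assumed hyperparameter
    out = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k, **params):
    """Keep the k highest-scoring sentences, in document order."""
    scores = textrank(sentences, **params)
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return " ".join(sentences[i] for i in top)


def f1(overlap, cand_total, ref_total):
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n(cand, ref, n):
    """Simplified ROUGE-N F-score from clipped n-gram overlap."""
    c = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    return f1(sum((c & r).values()), sum(c.values()), sum(r.values()))


def rouge_l(cand, ref):
    """Simplified ROUGE-L F-score from the longest common subsequence."""
    prev = [0] * (len(ref) + 1)
    for x in cand:
        cur = [0]
        for j, y in enumerate(ref):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return f1(prev[-1], len(cand), len(ref))


def rouge_mixture(summary, reference, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mix of ROUGE-1, ROUGE-2 and ROUGE-L (weights are assumed)."""
    c, r = tokenize(summary), tokenize(reference)
    parts = (rouge_n(c, r, 1), rouge_n(c, r, 2), rouge_l(c, r))
    return sum(wt * p for wt, p in zip(weights, parts))


def objective(params, docs, k=2):
    """Mean ROUGE mixture over (sentences, reference) pairs; to be maximized."""
    damping, threshold = params
    vals = [
        rouge_mixture(summarize(s, k, damping=damping, threshold=threshold), ref)
        for s, ref in docs
    ]
    return sum(vals) / len(vals)


# Toy data, invented purely for illustration.
docs = [(
    ["The court dismissed the appeal.",
     "The appeal was filed late.",
     "Costs were awarded to the respondent.",
     "The court found the appeal filed late without cause."],
    "The court dismissed the appeal because it was filed late.",
)]
print(objective((0.85, 0.1), docs))
```

The optimizer itself is left out of this sketch. A Bayesian optimizer, for example `gp_minimize` from scikit-optimize, could be given the negated `objective` together with bounds on each hyperparameter, so that each evaluation runs Textrank with one candidate setting and returns its ROUGE mixture on the tuning documents.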