Abstractive Summarization of Korean Legal Cases using Pre-trained Language Models
Jiyoung Yoon, Muhammad Junaid, Sajid Ali, Jongwuk Lee
2022 16th International Conference on Ubiquitous Information Management and Communication (IMCOM), published 2022-01-03. DOI: 10.1109/IMCOM53663.2022.9721808
AI technology in the legal domain has developed at a rapid pace around the world, but little research has been conducted in the Korean legal field due to language barriers and the high level of expertise required. We make a first attempt at abstractive summarization of Korean legal decision texts and publicly release our collected dataset. We utilize two pre-trained language models, BERT2BERT and BART, for this task. Both follow the encoder-decoder approach built on the Transformer architecture. While BERT2BERT uses BERT to initialize both the encoder and the decoder, BART combines a BERT-like encoder with a GPT-like decoder. We then evaluate the baseline models and show that, despite the difference in language style, high-quality summaries are generated by the applied models. We also show that pre-training with both an autoencoder and an autoregressive method yields better performance than using only a denoising autoencoder.
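The abstract does not include code, but the two baselines it describes can be assembled with the Hugging Face Transformers library. The sketch below is an illustrative assumption, not the authors' released code: the checkpoint names "klue/bert-base" and "gogamza/kobart-base-v2" are hypothetical choices for a Korean BERT and a Korean BART, and both models would still need fine-tuning on the legal decision dataset before producing useful summaries.

```python
# Minimal sketch (assumed setup, not the authors' code): building a BERT2BERT
# encoder-decoder and a BART model for abstractive summarization with
# Hugging Face Transformers. Checkpoint names are illustrative assumptions.
from transformers import (
    EncoderDecoderModel,
    BertTokenizerFast,
    BartForConditionalGeneration,
    PreTrainedTokenizerFast,
)

# BERT2BERT: BERT weights initialize both the encoder and the decoder;
# the cross-attention layers start from scratch and require fine-tuning.
bert_tok = BertTokenizerFast.from_pretrained("klue/bert-base")
bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "klue/bert-base", "klue/bert-base"
)
bert2bert.config.decoder_start_token_id = bert_tok.cls_token_id
bert2bert.config.pad_token_id = bert_tok.pad_token_id
bert2bert.config.eos_token_id = bert_tok.sep_token_id

# BART: denoising-autoencoder pre-training with a BERT-like encoder
# and a GPT-like autoregressive decoder.
bart_tok = PreTrainedTokenizerFast.from_pretrained("gogamza/kobart-base-v2")
bart = BartForConditionalGeneration.from_pretrained("gogamza/kobart-base-v2")

def summarize(model, tokenizer, decision_text: str) -> str:
    """Generate an abstractive summary of one legal decision text."""
    inputs = tokenizer(
        decision_text, max_length=512, truncation=True, return_tensors="pt"
    )
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=128,
        num_beams=4,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Example usage after fine-tuning on the collected legal dataset:
# print(summarize(bart, bart_tok, decision_text))
# print(summarize(bert2bert, bert_tok, decision_text))
```

Beam search with an n-gram repetition penalty is a common decoding choice for summarization; the specific generation parameters above are assumptions, as the abstract does not report them.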